It's all interesting, but categorization is hard. It's not so hard to get some results, but it's fairly hard to get quality results. It might work well as an adjunct to fts: you could throw the search terms into the categorization engine and offer suggestions for re-running the search with a tighter focus.
-scott

On 8/23/07, Cesar D. Rodas <[EMAIL PROTECTED]> wrote:
> On 23/08/07, Scott Hess <[EMAIL PROTECTED]> wrote:
> > On 8/20/07, Cesar D. Rodas <[EMAIL PROTECTED]> wrote:
> > > As far as I know (I could be wrong), SQLite Full Text Search only
> > > matches whole words, right? It could not be
> > > And also, as far as I know, no FT extension to a db is
> > > misspelling-tolerant.
> >
> > Yes, fts is matching exactly. There is some primitive support for
> > English stemming using the Porter stemmer, but, honestly, it's not
> > well-exercised.
> >
> > > And I've found this paper that talks about *Using Superimposed
> > > Coding Of N-Gram Lists For Efficient Inexact Matching*
> > >
> > > http://citeseer.ist.psu.edu/cache/papers/cs/22812/http:zSzzSzwww.novodynamics.comzSztrenklezSzpaperszSzatc92v.pdf/william92using.pdf
> > >
> > > I was reading it, and it is not so hard to implement. It costs extra
> > > storage space, but I think the benefits outweigh that.
> > >
> > > Also, following this paper, one could build a way to match fragments
> > > of words... what do you think of it?
> >
> > It's an interesting paper, and I must say that anything which involves
> > Bloom Filters automatically draws my attention :-).
>
> Yeah. I am doing some investigation into that, and I love it too. I
> noticed that with n-grams you get a filter for stop (common) words,
> and it could be used as a stemming-like algorithm that is independent
> of the language.
>
> I was thinking of implementing this
> http://www.mail-archive.com/sqlite-users%40sqlite.org/msg26923.html
> when I finish up some things. What do you think of it?
>
> > While I think spelling-suggestion might be valuable for fts in the
> > longer term, I'm not very enthusiastic about this particular model.
> > It seems much more useful in the standard indexing model of building
> > the index, manually tweaking it, and then doing a ton of queries
> > against it. fts is really fairly constrained, because many use-cases
> > are more along the lines of updating the index quite a bit and
> > querying it only a few times.
> >
> > Also, I think the concepts in the paper might have very significant
> > problems handling Unicode, because the bit vectors will get so very
> > large. I may be wrong; sometimes the overlapping-vector approach can
> > have surprising relevance depending on the frequency distribution of
> > the things in the vector. It would need some experimentation to
> > figure that out.
> >
> > Certainly something to bookmark, though.
> >
> > Thanks,
> > scott
> >
> > -----------------------------------------------------------------------------
> > To unsubscribe, send email to [EMAIL PROTECTED]
> > -----------------------------------------------------------------------------
>
> --
> Cesar D. Rodas
> http://www.cesarodas.com/
> Mobile Phone: 595 961 974165
> Phone: 595 21 645590
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
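For anyone following the thread, here is a minimal Python sketch of the superimposed-coding idea Cesar pointed to. It is an illustration under stated assumptions, not the paper's exact scheme and not anything in SQLite's fts. The trigram length, the 256-bit signature width, the three bit positions per n-gram, the use of BLAKE2b as the hash, and the helper names (ngrams, signature, score, suggest) are all hypothetical choices. Each word's character trigrams are hashed into a fixed-width bit vector and OR-ed together, Bloom-filter style. A possibly misspelled query is then scored by how many of its trigrams appear to be present in each word's signature.

```python
from __future__ import annotations

import hashlib

# All parameters below are illustrative assumptions, not values from the paper.
N = 3          # n-gram length (character trigrams)
M_BITS = 256   # width of each word's signature bit vector
K_HASHES = 3   # bit positions set per n-gram


def ngrams(word: str, n: int = N) -> set[str]:
    """Distinct character n-grams of a word, padded with spaces so the
    start and end of the word produce their own n-grams."""
    padded = f" {word.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def gram_mask(gram: str) -> int:
    """Hash one n-gram to K_HASHES bit positions and return them as a mask."""
    digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=2 * K_HASHES).digest()
    mask = 0
    for i in range(K_HASHES):
        pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % M_BITS
        mask |= 1 << pos
    return mask


def signature(word: str) -> int:
    """Superimpose (OR together) the masks of all of the word's n-grams."""
    sig = 0
    for gram in ngrams(word):
        sig |= gram_mask(gram)
    return sig


def score(query: str, word_sig: int) -> int:
    """Count query n-grams whose bits are all set in word_sig. Like any
    Bloom filter this may report false positives, never false negatives."""
    hits = 0
    for gram in ngrams(query):
        mask = gram_mask(gram)
        if (word_sig & mask) == mask:
            hits += 1
    return hits


def suggest(query: str, vocabulary: list[str], top: int = 3) -> list[str]:
    """Rank vocabulary words by score against the query, ties alphabetical.
    A real index would store signatures rather than recompute them."""
    sigs = {w: signature(w) for w in vocabulary}
    ranked = sorted(vocabulary, key=lambda w: (-score(query, sigs[w]), w))
    return ranked[:top]
```

With a toy vocabulary, suggest("serch", ["banana", "cherry", "search"]) should rank "search" first, since it shares the trigrams " se", "rch" and "ch " with the query. Hashing n-grams into a fixed-width vector keeps the signature size independent of the alphabet, which is one possible answer to the Unicode concern above, but a narrower vector means more false positives; as Scott says, whether that trade-off holds up would need experimentation.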