Would it not be more useful to first implement the Porter stemmer algorithm, and
then to implement n-grams (as I understand it, n-grams are for cross-column fuzzy
search)? What is the general game plan for FTS3 with regard to fuzzy search?
   
  Thanks in advance

"Cesar D. Rodas" <[EMAIL PROTECTED]> wrote:
  On 23/08/07, Scott Hess wrote:
> On 8/20/07, Cesar D. Rodas wrote:
> > As far as I know (I may be wrong), SQLite Full Text Search only matches whole
> > words, right? It could not be
> > Also, no FT extension to a db (as far as I know) is misspelling-tolerant,
>
> Yes, fts is matching exactly. There is some primitive support for
> English stemming using the Porter stemmer, but, honestly, it's not
> well-exercised.
>
> > And
> > I've found this paper that talks about *Using Superimposed Coding Of N-Gram
> > Lists For Efficient Inexact Matching*
>
> http://citeseer.ist.psu.edu/cache/papers/cs/22812/http:zSzzSzwww.novodynamics.comzSztrenklezSzpaperszSzatc92v.pdf/william92using.pdf
> >
> > From my reading it is not so hard to implement. It costs extra
> > storage space, but I think the benefits outweigh that.
> >
> > Also, following this paper, there could be a way to match fragments of
> > words... what do you think of it?
>
> It's an interesting paper, and I must say that anything which involves
> Bloom Filters automatically draws my attention :-).

Yeah. I am investigating that, and I love them too. I also noticed
that with n-grams you get a filter for common (stop) words, and that
they could be used as a stemming-like algorithm that is independent
of the language.
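To make the language-independent idea concrete, here is a minimal sketch in
Python. It is only an illustration, not anything that exists in fts: the
function names, the trigram size and the space padding are all assumptions.
Words that share a stem or differ by a typo end up sharing most of their
character trigrams, so a set-overlap score (the Dice coefficient here) ranks
them as close without any language-specific rules.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word.

    The word is lowercased and padded with spaces so that prefixes and
    suffixes produce their own n-grams (padding scheme is an assumption).
    """
    padded = " " * (n - 1) + word.lower() + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=3):
    """Dice coefficient of the two n-gram sets, between 0.0 and 1.0."""
    grams_a = char_ngrams(a, n)
    grams_b = char_ngrams(b, n)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # Inflected forms and a misspelling score higher than an unrelated word.
    for other in ("searched", "serching", "potter"):
        print(other, round(dice_similarity("searching", other), 2))
```

A threshold on the score would decide what counts as a match; picking that
threshold would need experimentation on real data.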

I was thinking of implementing this
http://www.mail-archive.com/sqlite-users%40sqlite.org/msg26923.html
once I finish up some other things. What do you think of it?

> While I think spelling-suggestion might be valuable for fts in the
> longer term, I'm not very enthusiastic about this particular model.
> It seems much more useful in the standard indexing model of building
> the index, manually tweaking it, and then doing a ton of queries
> against it. fts is really fairly constrained, because many use-cases
> are more along the lines of update the index quite a bit, and query it
> only a few times.
>
> Also, I think the concepts in the paper might have very significant
> problems handling Unicode, because the bit vectors will get so very
> large. I may be wrong, though: sometimes the overlapping-vector approach can
> have surprising relevance depending on the frequency distribution of
> the things in the vector. It would need some experimentation to
> figure that out.
>
> Certainly something to bookmark, though.
>
> Thanks,
> scott
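For experimenting with the bit-vector size question raised above, a rough
Bloom-filter-style sketch in Python follows. It is not the exact scheme from
the paper and not part of fts. The signature width, the number of bits per
n-gram and the use of SHA-256 are all assumptions chosen for illustration.
Each n-gram is hashed (as UTF-8 bytes) into a fixed-width signature, so the
vector width does not grow with the alphabet, at the cost of false positives.

```python
import hashlib

SIGNATURE_BITS = 256  # hypothetical signature width
BITS_PER_GRAM = 2     # hypothetical number of bit positions per n-gram


def char_ngrams(word, n=3):
    """Set of space-padded, lowercased character n-grams of a word."""
    padded = " " * (n - 1) + word.lower() + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def gram_bits(gram):
    """Map one n-gram to a bitmask with at most BITS_PER_GRAM bits set."""
    digest = hashlib.sha256(gram.encode("utf-8")).digest()
    mask = 0
    for k in range(BITS_PER_GRAM):
        chunk = digest[4 * k:4 * k + 4]
        mask |= 1 << (int.from_bytes(chunk, "big") % SIGNATURE_BITS)
    return mask


def signature(word):
    """Superimpose (bitwise OR) the bitmasks of all n-grams of a word."""
    sig = 0
    for gram in char_ngrams(word):
        sig |= gram_bits(gram)
    return sig


def match_fraction(query, word_signature):
    """Fraction of the query's n-grams whose bits all appear in the signature.

    An n-gram that is truly present always counts; an absent one may count
    by accident (a false positive), so this is an upper-bound estimate.
    """
    grams = char_ngrams(query)
    hits = sum(1 for g in grams if (gram_bits(g) & word_signature) == gram_bits(g))
    return hits / len(grams)


if __name__ == "__main__":
    sig = signature("stemmer")
    print(match_fraction("stemmer", sig))  # exact word: every n-gram is found
    print(match_fraction("stemer", sig))   # misspelling: most n-grams are found
```

How well this discriminates on real text depends on the frequency
distribution of the n-grams, which is the part that would need measuring.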



-- 
Cesar D. Rodas
http://www.cesarodas.com/
Mobile Phone: 595 961 974165
Phone: 595 21 645590
[EMAIL PROTECTED]
[EMAIL PROTECTED]

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

