
On 23/08/07, Russell Leighton <[EMAIL PROTECTED]> wrote:
>
>
> Could fts3 (the next fts) have the option to override the default
> 'match' function with one passed in (similar to the tokenizer)?
>
> The reason I ask is that the fts table could then be used as a smart
> index when the tokenizer is something like bigram, trigram, etc., and
> the 'match' function computes a similarity metric and returns the row
> if it is above a threshold.
>
> Postgres does this when you declare an index of type trigram, see:
>
>         http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
>
> Since SQLite does not allow 'plug-in' indexes, the idea would be to
> create an fts3 table with a key back to the main table and the string
> column you want to index.
> Indexing becomes a join through the fts3 table.
>
> You would probably want to allow the user to pass args to the 'match'
> function so a threshold could be set to non-default values, and maybe
> to tweak options specific to the matching and tokenization.
>
> Thoughts?
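
To make the proposal concrete, here is a rough sketch of the behaviour
being asked for. It does not use fts3 at all (fts3 has no such hook, as far
as this thread establishes); it only emulates a threshold 'match' with an
ordinary user-defined SQL function. The table name main_table, the function
names, the padding scheme, and the 0.3 default threshold are all
assumptions made for illustration.

```python
import sqlite3

def trigrams(text):
    """Return the set of character trigrams of a padded, lowercased string."""
    padded = "  " + text.lower() + " "   # padding scheme is an assumption
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the (hypothetical) name similarity().
conn.create_function("similarity", 2, similarity)
conn.execute("CREATE TABLE main_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO main_table (name) VALUES (?)",
                 [("sqlite",), ("postgres",), ("sqlight",)])

def fuzzy_match(query, threshold=0.3):
    """Return names whose trigram similarity to query meets the threshold."""
    rows = conn.execute(
        "SELECT name, similarity(name, ?) AS score FROM main_table "
        "WHERE similarity(name, ?) >= ? ORDER BY score DESC",
        (query, query, threshold))
    return [name for name, _ in rows]
```

Note that this scans every row, so it shows only the matching semantics
and the user-settable threshold, not the indexing benefit the proposal is
after.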


I think this idea is great. If fts3 had this option, I could rewrite the
match function myself. I like the idea of letting users train it with
data: the database is where most data is stored, and it is usually already
organized by categories, tags, or some other system.

My goal is to supply a set of data for it to learn from, and then have each
new insert assigned to the correct or closest category. Another feature I
want is for it to learn from its mistakes (human-assisted).

On Aug 23, 2007, at 4:56 PM, Scott Hess wrote:
>
> > On 8/20/07, Cesar D. Rodas <[EMAIL PROTECTED]> wrote:
> >> As far as I know (I may be wrong), SQLite Full Text Search only
> >> matches whole words, right? It could not be
> >> Also, no FT extension for a database (as far as I know) is
> >> misspelling tolerant,
> >
> > Yes, fts is matching exactly.  There is some primitive support for
> > English stemming using the Porter stemmer, but, honestly, it's not
> > well-exercised.
> >
> >> And I've found this paper that talks about *Using Superimposed
> >> Coding Of N-Gram Lists For Efficient Inexact Matching*
> >
> > http://citeseer.ist.psu.edu/cache/papers/cs/22812/http:zSzzSzwww.novodynamics.comzSztrenklezSzpaperszSzatc92v.pdf/william92using.pdf
> >>
> >> I was reading it, and it is not so hard to implement. It costs extra
> >> storage space, but I think the benefits outweigh that.
> >>
> >> Following this paper, there could also be a way to match fragments
> >> of words... what do you think of it?
> >
> > It's an interesting paper, and I must say that anything which involves
> > Bloom Filters automatically draws my attention :-).
> >
> > While I think spelling-suggestion might be valuable for fts in the
> > longer term, I'm not very enthusiastic about this particular model.
> > It seems much more useful in the standard indexing model of building
> > the index, manually tweaking it, and then doing a ton of queries
> > against it.  fts is really fairly constrained, because many use-cases
> > are more along the lines of update the index quite a bit, and query it
> > only a few times.
> >
> > Also, I think the concepts in the paper might have very significant
> > problems handling Unicode, because the bit vectors will get so very
> > large.  I may be wrong, sometimes the overlapping-vector approach can
> > have surprising relevance depending on the frequency distribution of
> > the things in the vector.  It would need some experimentation to
> > figure that out.
> >
> > Certainly something to bookmark, though.
> >
> > Thanks,
> > scott
> >
> > -----------------------------------------------------------------------
> > ------
> > To unsubscribe, send email to [EMAIL PROTECTED]
> > -----------------------------------------------------------------------
> > ------
> >
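
For reference, a loose sketch of the general superimposed-coding idea
discussed above: hash each n-gram of a word to one bit of a fixed-width
signature and OR the bits together. This is not the paper's exact scheme.
The 64-bit width, the use of bigrams, the CRC32 hash, and the scoring rule
are all assumptions chosen to keep the example short.

```python
import zlib

SIGNATURE_BITS = 64  # assumed signature width, not taken from the paper

def ngrams(word, n=2):
    """Return the set of character n-grams of a padded, lowercased word."""
    padded = " " + word.lower() + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def signature(word):
    """Superimpose (OR together) one hashed bit per n-gram."""
    sig = 0
    for gram in ngrams(word):
        # Hashing the UTF-8 bytes keeps the width fixed for any alphabet.
        bit = zlib.crc32(gram.encode("utf-8")) % SIGNATURE_BITS
        sig |= 1 << bit
    return sig

def score(query, candidate):
    """Fraction of the query's signature bits also set in the candidate."""
    q = signature(query)
    c = signature(candidate)
    return bin(q & c).count("1") / bin(q).count("1")
```

Because different n-grams can hash to the same bit, a high score is only a
hint that two words share n-grams, as with any Bloom-filter-style test.
Hashing rather than giving each possible n-gram its own bit keeps the
vector small for Unicode text, at the price of more such collisions.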


-- 
Cesar D. Rodas
http://www.cesarodas.com/
Mobile Phone: 595 961 974165
Phone: 595 21 645590
[EMAIL PROTECTED]
[EMAIL PROTECTED]
