Re: [sqlite] FTS2 suggestion

Scott Hess Wed, 29 Aug 2007 10:36:16 -0700

What was fts3 will now be fts4.  fts3 will now be
fts2-with-rowid-fixed.  fts3 is already in the tree, but with an
#error at the top to force people to not use it without reading a
comment.  I was planning to turn that off this week (what with the
SQLite 3.5 stuff going on, might as well!).


The next generation of fts has been 6 weeks out for ... the entire
year.  Sigh.  At this time it's my highest priority, though, and I'm
not really supposed to be working on anything else, so I'm hopeful
that there will be substantial code checked in by the end of
September.  Going by the fts2 experience, it will probably need 2 or 3
weeks beyond that to really settle into a usable state.

-scott


On 8/29/07, brian kruse <[EMAIL PROTECTED]> wrote:
> On 8/24/07, Scott Hess <[EMAIL PROTECTED]> wrote:
> >
> > My current focus for the next generation is international support
> > (this is more of a Google Gears project, but with focus on SQLite so
> > there is likely to be stuff checked in on the SQLite side), and more
> > scalable/manageable indexing.
>
> Thanks for the update Scott. Given that FTS3 will also presumably fix
> the VACUUM FTS1/2 bug, do you have a timeline for the FTS3 release?
>
> Even an estimate with a +/- 30 day granularity would be nice.
>
> Kind regards,
> -B
>
> > Not a lot of focus on things like
> > quality and recall, mostly because I'm not aware of any major users
> > with enough of an installed baseline to even generate decent metrics.
> > [Basically, solving concrete identified problems rather than looking
> > for ill-defined potential problems.]
> >
> > -scott
> >
> >
> > On 8/24/07, Uma Krishnan <[EMAIL PROTECTED]> wrote:
> > > Would it not be more useful to first implement the Porter stemmer
> > > algorithm, and then to implement n-gram (as I understand it, n-gram is
> > > for cross-column fuzzy search)? What is the general game plan for FTS3
> > > with regard to fuzzy search?
> > >
> > >   Thanks in advance
> > >
> > > "Cesar D. Rodas" <[EMAIL PROTECTED]> wrote:
> > >   On 23/08/07, Scott Hess wrote:
> > > > On 8/20/07, Cesar D. Rodas wrote:
> > > > > As far as I know (I may be wrong), SQLite Full Text Search only
> > > > > matches whole words, right? It could not be
> > > > > And also no FT extension to a db (as far as I know) is
> > > > > misspelling tolerant,
> > > >
> > > > Yes, fts is matching exactly. There is some primitive support for
> > > > English stemming using the Porter stemmer, but, honestly, it's not
> > > > well-exercised.
> > > >
> > > > > And
> > > > > I've found this Paper that talks about *Using Superimposed Coding Of 
> > > > > N-Gram
> > > > > Lists For Efficient Inexact Matching*
> > > > >
> > > > > http://citeseer.ist.psu.edu/cache/papers/cs/22812/http:zSzzSzwww.novodynamics.comzSztrenklezSzpaperszSzatc92v.pdf/william92using.pdf
> > > > >
> > > > > I was reading it, and it does not look hard to implement. It costs
> > > > > extra storage space, but I think the benefits outweigh that.
> > > > >
> > > > > Also, following this paper, there could be a way to match
> > > > > fragments of words... what do you think of it?
> > > >
> > > > It's an interesting paper, and I must say that anything which involves
> > > > Bloom Filters automatically draws my attention :-).
> > >
> > > Yeah. I am doing some investigation into that; I love that too. And
> > > I noticed that with n-grams you get a filter for common (stop)
> > > words, and that they could be used as a stemming-like algorithm
> > > that is independent of the language.
> > >
> > > I was thinking of implementing this
> > > http://www.mail-archive.com/sqlite-users%40sqlite.org/msg26923.html
> > > when I finish up some things. What do you think of it?
> > >
> > > > While I think spelling-suggestion might be valuable for fts in the
> > > > longer term, I'm not very enthusiastic about this particular model.
> > > > It seems much more useful in the standard indexing model of building
> > > > the index, manually tweaking it, and then doing a ton of queries
> > > > against it. fts is really fairly constrained, because many use-cases
> > > > are more along the lines of update the index quite a bit, and query it
> > > > only a few times.
> > > >
> > > > Also, I think the concepts in the paper might have very significant
> > > > problems handling Unicode, because the bit vectors will get so very
> > > > large. I may be wrong; sometimes the overlapping-vector approach can
> > > > have surprising relevance depending on the frequency distribution of
> > > > the things in the vector. It would need some experimentation to
> > > > figure that out.
> > > >
> > > > Certainly something to bookmark, though.
> > > >
> > > > Thanks,
> > > > scott
> > > >
> > > > -----------------------------------------------------------------------------
> > > > To unsubscribe, send email to [EMAIL PROTECTED]
> > > > -----------------------------------------------------------------------------
> > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Cesar D. Rodas
> > > http://www.cesarodas.com/
> > > Mobile Phone: 595 961 974165
> > > Phone: 595 21 645590
> > > [EMAIL PROTECTED]
> > > [EMAIL PROTECTED]
> > >

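For anyone following the n-gram discussion quoted above, here is a rough
sketch of the general idea: each word gets one fixed-width bit vector built
by OR-ing together ("superimposing") one hashed bit per character trigram,
Bloom-filter style, and a query is scored by how many of its own trigrams
find their bit set. This is not code from the paper or from fts. The
256-bit width, the '$' padding, the single CRC32 hash, the 0.5 threshold
and all of the function names are assumptions made up for illustration.

```python
import zlib

SIGNATURE_BITS = 256  # assumed width of each word's bit vector
NGRAM = 3             # character trigrams


def ngrams(word, n=NGRAM):
    """Return the set of character n-grams of a word, padded with '$'."""
    padded = "$" + word.lower() + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def bit_for(gram):
    """Map an n-gram to one bit position (a single hash, for simplicity)."""
    return zlib.crc32(gram.encode("utf-8")) % SIGNATURE_BITS


def signature(word):
    """Superimpose (OR together) the bits of all n-grams of the word."""
    sig = 0
    for gram in ngrams(word):
        sig |= 1 << bit_for(gram)
    return sig


def score(query, word_sig):
    """Fraction of the query's n-grams whose bit is set in word_sig."""
    grams = ngrams(query)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if (word_sig >> bit_for(g)) & 1)
    return hits / len(grams)


def fuzzy_lookup(query, index, threshold=0.5):
    """Return indexed words covering at least `threshold` of the query."""
    return sorted(w for w, sig in index.items() if score(query, sig) >= threshold)


# Build a tiny in-memory index: word -> signature.
vocabulary = ["search", "stemmer", "sqlite", "index"]
index = {w: signature(w) for w in vocabulary}

# A misspelled query still shares most of its trigrams with "search".
candidates = fuzzy_lookup("serch", index)
```

With this setup "search" is among the candidates for the misspelled query
"serch", since three of the query's five trigrams also occur in "search".
Hash collisions can add false positives, so a real implementation would
have to verify each candidate against the actual term.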