Re: [sqlite] FTS and postfix search

Olaf Schmidt Thu, 06 Aug 2009 11:01:32 -0700

"Hugh Sasse" <h...@dmu.ac.uk> schrieb im
Newsbeitrag
news:alpine.lfd.2.00.0908061712390.30...@trueman.cs.cse.dmu.ac.uk...


> I wonder if the idea of suffix arrays would belp.
> http://www.cs.dartmouth.edu/~doug/sarray/

I was about suggesting basically the same thing...

In case of Lukas' topics-table, one would have to
create an additional (search)table 'topics_suffixes',
where each word from within 'topics' is "expanded",
resulting in multiple record-entries.

In case the table 'topics' contains the word
'Motor' (with an ID of e.g. 12345) - the 'topics_suffixes'-
table should get the following insertions:
topic_id   |   word_suffixes
  12345   |   motor
  12345   |   otor
  12345   |   tor
  12345   |   or

In the above sequence the listing is stopped at a "maximum-
two-chars"-suffix (sparing us the storage or the last, singlechar),
to safe a bit of space, since single-char Like-queries, formulated
in "contains-format" (as e.g. ... word_suffixes Like '%t%' ...)
are probably not that interesting regarding their usual larger
recordcount-output (from the users point of view in such
"live-search-scenarios").

But all these "contains-queries", searching for wordparts,
larger than one single char can now be performed with
larger speed against the (agreed much larger) 'topics_suffixes'
table using:
Select Distinct topic_id Where word_suffixes Like 'somepart%'
(with a proper index on word_suffixes) - maybe combined in
a Join to table topics, depending on the Apps implementation.

The size of the topics_suffixes-table (and its index on
word_suffixes) dependent on the average-wordlenght
in topics of course - it's the usual tradeoff between
"used space" and speed.

Olaf




_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] FTS and postfix search

Reply via email to