Hi,
Thank you, everyone, for your replies and ideas on "FTS and postfix
search". I thought about it a lot and came to this conclusion: in general
it is not necessary for a fulltext system to find subwords. If it were,
then I would either need no index at all (and search through the whole
data) or put the subwords into the index too.
So if my documents were English I would be perfectly done, even
better with the Porter tokenizer.
But unfortunately the language is German, where many words are compounds
of other words (e.g. Telefonkabel = telephone cable). It is still a
requirement to find "Telefonkabel" when searching for "Kabel". Does
anybody have an idea what the best approach would be? In my opinion,
I have no choice but to split these words using a predefined dictionary
(e.g. {"Telefonkabel"} becomes {"telefon", "kabel", "telefonkabel"}).
Even this is a challenge (index generation should not take too long).
should not take too long). My idea now would be to extend the FTS in
some way to
a) Support splitting words with predefined dictonary
b) maybe support for non-english (german) versions of the Porter
Stemming algorithm.
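To make idea a) concrete, here is a minimal sketch in C of the
dictionary-based splitting I have in mind. The mini-dictionary and the
function names are invented for illustration; a real implementation
would load a large dictionary from a file and hook into the FTS
tokenizer, and compounds with more than two parts would need a
recursive split.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mini-dictionary; a real one would be loaded from a file. */
static const char *dict[] = { "telefon", "kabel", "haus", "tuer" };
static const int dict_len = sizeof(dict) / sizeof(dict[0]);

/* Return 1 if the first n bytes of s form a dictionary word. */
static int in_dict(const char *s, size_t n)
{
    for (int i = 0; i < dict_len; i++)
        if (strlen(dict[i]) == n && strncmp(dict[i], s, n) == 0)
            return 1;
    return 0;
}

/* Try every split point in word; if both halves are dictionary words,
 * copy them into parts[] and return 2. Return 0 if no split is found.
 * The indexer would then store both parts plus the full compound. */
static int split_compound(const char *word, char parts[2][64])
{
    size_t len = strlen(word);
    for (size_t i = 1; i < len; i++) {
        if (in_dict(word, i) && in_dict(word + i, len - i)) {
            snprintf(parts[0], sizeof(parts[0]), "%.*s", (int)i, word);
            snprintf(parts[1], sizeof(parts[1]), "%s", word + i);
            return 2;
        }
    }
    return 0;
}
```

For "telefonkabel" this yields "telefon" and "kabel", so the token
stream fed to the index would be {"telefon", "kabel", "telefonkabel"},
as described above.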
I have programming experience in C and C++ but no knowledge of SQLite
internals. Where should I begin? How hard would this be to implement,
and how much time would it take?
I also found [1]. That indexer seems to be more powerful than the
built-in FTS. However, I can't find support for word splitting there
either. Does anybody have experience with that indexer? Would it be
simpler to extend it instead? Maybe someone has already tested
both... Which one should I concentrate on, and which one is faster?
Thank you all again,
Luke
[1] http://ft3.sourceforge.net/
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users