One such approach would be a (generalized) suffix tree, which Ukkonen's
algorithm builds in linear time
(https://en.m.wikipedia.org/wiki/Ukkonen%27s_algorithm).
It allows you to search efficiently for substrings.
It would be possible to do some match weighting based on match distance within
words. But a general solution for a database is probably not trivial to
implement.
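
To make the idea concrete, here is a rough Python sketch. It is only an
illustration under my own assumptions: it builds a naive, uncompressed suffix
trie (quadratic time and memory per word) rather than using Ukkonen's
linear-time construction, and the names SuffixIndex, add and search are made
up for this example. The weighting is one possible reading of "match
distance": a match that starts closer to the beginning of a word scores
higher.

```python
class _Node:
    """One trie node; hits records which (word, offset) suffixes pass through it."""
    __slots__ = ("children", "hits")

    def __init__(self):
        self.children = {}
        self.hits = set()


class SuffixIndex:
    """Naive generalized suffix trie over a set of words (hypothetical helper).

    Every suffix of every word is inserted character by character, so this is
    NOT Ukkonen's algorithm; it only shows what such an index can answer.
    """

    def __init__(self):
        self.root = _Node()

    def add(self, word):
        word = word.lower()
        for offset in range(len(word)):
            node = self.root
            for ch in word[offset:]:
                node = node.children.setdefault(ch, _Node())
                node.hits.add((word, offset))

    def search(self, fragment):
        """Return (score, word, offset) tuples, best score first."""
        node = self.root
        for ch in fragment.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Assumed weighting: 1.0 for a match at the start of the word,
        # falling towards 0 as the match starts further into the word.
        scored = [(1.0 - offset / len(word), word, offset)
                  for word, offset in node.hits]
        return sorted(scored, reverse=True)


index = SuffixIndex()
for w in ("Verkehrsleitsystem", "Leitung"):
    index.add(w)

print([w for _, w, _ in index.search("leit")])  # ['leitung', 'verkehrsleitsystem']
print(index.search("leiten"))                   # []
```

Note that "leiten" still finds nothing, which is exactly the limitation
Matthias-Christian describes below: a substring index alone does not split
compounds or handle inflection.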

Ben

> On 2016-01-07 at 21:46, Matthias-Christian Ott <ott at mirix.org> wrote:
> 
>> On 2016-01-07 19:31, Mario M. Westphal wrote:
>> I hence wonder if this problem has been tackled already and if there is a
>> "standard" solution.
> 
> If I understand you correctly, it seems that you are looking for a
> compound splitting or decompounding algorithm. Unfortunately there is
> not a "standard solution" for this. There are many languages in the
> world, and for some of them usable compound splitting algorithms exist.
> There are also attempts to create statistical universal algorithms.
> 
> As you said, for English a simple sub-string search might suffice but
> for other languages it is more complex. I assume that you speak German. If
> you have a document that contains the term "Verkehrsleitsystem" and your
> search query is "Verkehr leiten", it's reasonable to assume that the
> document is relevant to the search query. Unfortunately a sub-string
> search could not find the document. Other languages are even more
> difficult (a textbook on linguistics will explain this better than I can).
> 
> Even if you have such an algorithm, it's not trivial to score the results
> and there are more aspects to consider to create a simple search
> algorithm. For example, in English you will also have to do some
> analysis of the phrase structure to identify open compounds.
> 
> Perhaps it helps to mention the languages you are interested in and the
> application you have in mind to evaluate whether the SQLite FTS5 could
> meet your requirements.
