Forgot to add. I got idea for last method from this[1] post. [1] http://sqlite.1065341.n5.nabble.com/fast-string-prefix-matching-td10777.html
On Thu, Aug 2, 2012 at 11:36 PM, Navaneeth.K.N <navaneet...@gmail.com>wrote: > Hello, > > I have a table which contains a set of words. These are stored in a table > called 'patterns_content' with pattern as unique key. There is a FTS4 table > named 'patterns' which has 'content=patterns_content' set. It will have > only ASCII characters. Something like, > > ab > abd > abcd > ..... > ..... > > Given a string, 'abcdef', I'd like to tokenize it based on the above > table. My tokenizer is greedy and always looks for longest prefix match. > Which means, > > Eg1: tokenize('abcdef') = 'abcd' + tokenize(ef) > Eg2: tokenize('xyzabdab') = tokenize(xyx) + 'abd' + 'ab' > > I have tried all these methods to find the longest prefix match. > > Method1 - select * from patterns where pattern match 'abcdef OR abcde OR > abcd OR abc OR ab OR a' order by length(pattern) desc limit 1 > > Method2 - select * from patterns_content where pattern = 'abcdef' OR > pattern = 'abcde' OR pattern = 'abcd' OR pattern = 'abc' OR pattern = 'ab' > OR pattern = 'a' order by length(pattern) desc limit 1 > > Method3: Start reading from left and read one character at a time. Each > run, execute select 1 from patterns_content as c where c.pattern >= 'a' and > c.pattern <= 'a' || 'z' limit 1. Next run, it will test for 'ab', then > 'abc' etc. > > All the above methods works well. Surprisingly, the last method > outperforms all other methods. I am wondering what optimization is making > the last query always execute fast? Is it reliable to assume that the query > will have it performance even with huge number of rows? > > I am also wondering if there are other ways to solve this problem in an > efficient way? Any help would be great. My table will have more than a > million of data. > > -- > -Navaneeth > -- -n _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users