Re: [sqlite] How is this query so fast?

2012-08-02 Thread Navaneeth.K.N
Forgot to add: I got the idea for the last method from this[1] post.

[1]
http://sqlite.1065341.n5.nabble.com/fast-string-prefix-matching-td10777.html


On Thu, Aug 2, 2012 at 11:36 PM, Navaneeth.K.N wrote:

> [original post quoted below]



-- 
-n
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] How is this query so fast?

2012-08-02 Thread Navaneeth.K.N
Hello,

I have a table containing a set of words. They are stored in a table
called 'patterns_content' with 'pattern' as the unique key. There is an
FTS4 table named 'patterns' which has 'content=patterns_content' set. The
table will contain only ASCII characters. Something like,

ab
abd
abcd
.
.

Given a string, 'abcdef', I'd like to tokenize it based on the above table.
My tokenizer is greedy and always looks for the longest prefix match. That
means,

Eg1: tokenize('abcdef') = 'abcd' + tokenize('ef')
Eg2: tokenize('xyzabdab') = tokenize('xyz') + 'abd' + 'ab'
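
For what it's worth, here is a minimal sketch of the greedy
longest-prefix tokenizer I mean, using a plain Python set in place of the
patterns table (the set contents are just the sample data above):

```python
# Sample pattern set standing in for the patterns_content table.
patterns = {"ab", "abd", "abcd"}

def tokenize(s):
    tokens = []
    i = 0
    while i < len(s):
        # Greedy: try the longest possible prefix first,
        # shrinking one character at a time.
        for j in range(len(s), i, -1):
            if s[i:j] in patterns:
                tokens.append(s[i:j])
                i = j
                break
        else:
            # No pattern matches at this position; emit the
            # single character as-is and move on.
            tokens.append(s[i])
            i += 1
    return tokens
```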

I have tried the following methods to find the longest prefix match.

Method 1: select * from patterns where pattern match 'abcdef OR abcde OR
abcd OR abc OR ab OR a' order by length(pattern) desc limit 1

Method 2: select * from patterns_content where pattern = 'abcdef' OR
pattern = 'abcde' OR pattern = 'abcd' OR pattern = 'abc' OR pattern = 'ab'
OR pattern = 'a' order by length(pattern) desc limit 1

Method 3: Start reading from the left, one character at a time. On each
run, execute: select 1 from patterns_content as c where c.pattern >= 'a'
and c.pattern <= 'a' || 'z' limit 1. The next run tests 'ab', then 'abc',
and so on.
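
To make Method 3 concrete, here is a sketch of the loop in Python against
an in-memory database, with the sample data above. Note that the
exact-match probe at the end of each iteration is my addition, to recover
the longest pattern itself rather than just testing whether any pattern
extends the prefix:

```python
import sqlite3

# Sample table standing in for the real patterns_content.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patterns_content (pattern TEXT UNIQUE)")
con.executemany("INSERT INTO patterns_content VALUES (?)",
                [("ab",), ("abd",), ("abcd",)])

def longest_prefix(s):
    match = None
    for i in range(1, len(s) + 1):
        prefix = s[:i]
        # Range probe: any stored pattern starting with `prefix` sorts
        # between `prefix` and `prefix || 'z'` (ASCII-only data).
        row = con.execute(
            "SELECT 1 FROM patterns_content AS c "
            "WHERE c.pattern >= ? AND c.pattern <= ? || 'z' LIMIT 1",
            (prefix, prefix)).fetchone()
        if row is None:
            break  # no pattern extends this prefix; stop growing
        # Record the prefix only if it is itself a stored pattern.
        if con.execute("SELECT 1 FROM patterns_content WHERE pattern = ?",
                       (prefix,)).fetchone():
            match = prefix
    return match
```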

All the above methods work well. Surprisingly, the last method outperforms
the others. I am wondering what optimization makes the last query
consistently fast. Is it safe to assume that the query will keep this
performance even with a huge number of rows?
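
My guess, which I have not verified against the real data, is that the
UNIQUE constraint on 'pattern' gives SQLite an index and the >=/<= pair
becomes a bounded range scan on it. The query plan seems to confirm this
on the sample data:

```python
import sqlite3

# Hypothetical check of the plan for the Method 3 probe. The UNIQUE
# constraint creates an implicit index on 'pattern'.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patterns_content (pattern TEXT UNIQUE)")
con.executemany("INSERT INTO patterns_content VALUES (?)",
                [("ab",), ("abd",), ("abcd",)])

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT 1 FROM patterns_content AS c "
    "WHERE c.pattern >= 'ab' AND c.pattern <= 'ab' || 'z' LIMIT 1"
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # expect a SEARCH ... USING ... INDEX step, not a full SCAN
```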

I am also wondering if there are other, more efficient ways to solve this
problem. Any help would be great. My table will have more than a million
rows.

-- 
-Navaneeth