Re: [sqlite] Bigger table and query optimization

2008-09-10 Thread Bruno Moreira Guedes
2008/9/9 Igor Tandetnik <[EMAIL PROTECTED]>:
> "Stephen Oberholtzer"
> <[EMAIL PROTECTED]> wrote in
> message
> news:[EMAIL PROTECTED]
>> Idea: Submit a patch that allows LIKE expressions that start with a
>> fixed
>> string (i.e. don't start with '%') to use the index to improve
>> performance. (SQL Server 2000 does this.)
>
> http://www.sqlite.org/optoverview.html
> 4.0 The LIKE optimization
>
> Igor Tandetnik
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

Stephen,

Good idea(idea2). In the next week I'll try to code some patch
including reverse() and so other functions I feel SQLite could have.
:-) And I'll try to submit it too.

About the LIKE indexes(first idea), reading Igor's link, I saw it's
already supported(with some restrictions).

Igor,

This docs is too useful. I take a read on it, and I feel it's very
interesting for me.

Thank you all people!!!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-10 Thread Stephen Oberholtzer
On Tue, Sep 9, 2008 at 10:18 PM, Igor Tandetnik <[EMAIL PROTECTED]> wrote:

> "Stephen Oberholtzer"
> <[EMAIL PROTECTED]> wrote in
> message
> news:[EMAIL PROTECTED]<[EMAIL PROTECTED]>
> > Idea: Submit a patch that allows LIKE expressions that start with a
> > fixed
> > string (i.e. don't start with '%') to use the index to improve
> > performance. (SQL Server 2000 does this.)
>
> http://www.sqlite.org/optoverview.html
> 4.0 The LIKE optimization
>
> Igor Tandetnik
>
>
"For the LIKE operator, if case_sensitive_like mode is enabled then the
column must use the default BINARY collating sequence, or if
case_sensitive_like mode is disabled then the column must use the built-in
NOCASE collating sequence."

There's a flaw in this design -- which explains why, when I actually tested
it, the index wasn't used:

CREATE TABLE foo (id integer primary key, name text, name_back text);
INSERT INTO "foo" VALUES(1,'one','eno');
CREATE INDEX name_back_IX2 on foo(name_back collate nocase);

pragma case_sensitive_like=off;
explain query plan select name_back from foo where name_back like 'e%';
order   fromdetail
--  --  --
0   0   TABLE foo


The index "name_back_IX2" is collated NOCASE, which is the same collation
that LIKE uses -- but the index doesn't get used!

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Igor Tandetnik
"Stephen Oberholtzer"
<[EMAIL PROTECTED]> wrote in
message
news:[EMAIL PROTECTED]
> Idea: Submit a patch that allows LIKE expressions that start with a
> fixed
> string (i.e. don't start with '%') to use the index to improve
> performance. (SQL Server 2000 does this.)

http://www.sqlite.org/optoverview.html
4.0 The LIKE optimization

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Stephen Oberholtzer
On Tue, Sep 9, 2008 at 9:04 AM, Bruno Moreira Guedes <[EMAIL PROTECTED]>wrote:

> 2008/9/9 P Kishor <[EMAIL PROTECTED]>:
> > On 9/9/08, Bruno Moreira Guedes <[EMAIL PROTECTED]> wrote:
> >> Hi people!!
> >>
> >>  I'm getting problems with a bigger table. I'm doing a query like this:
> >>
> >>  SELECT field FROM tables WHERE '%.' || somefield LIKE 'somestring'
> >
> > interesting variation on the syntax. Usually one would
> >
> > WHERE somefield LIKE '%.somestring'
> >
> >
>
> Yes... I need to get the rows where 'something' ENDS wifh
> '.[field-value]'. But I really think this solution isn't very well.


Store the field *twice* -- once normally, and once *backwards*.
Put an index on the backwards column, and when searching, do:
somefield_backwards >= 'gnitset' and somefield_backwards < 'gnitseu'
(Note that I advanced the last character from 't' to 'u' and used < instead
of <=)

This will enable SQLite to use the index on the backwards column to
efficiently find everything.

Idea: Submit a patch that allows LIKE expressions that start with a fixed
string (i.e. don't start with '%') to use the index to improve performance.
(SQL Server 2000 does this.)

Idea 2: Submit a patch adding a built-in 'reverse' function to assist in
this.

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Bruno Moreira Guedes
2008/9/9 Igor Tandetnik <[EMAIL PROTECTED]>:
> Bruno Moreira Guedes
> <[EMAIL PROTECTED]> wrote:
>> I can split my 'something' in st0, st1, st2, stN... I have a '.'
>> betwen this values. But the number of "st"s is undefined. I tried to
>> do "the necessary" number comparsions, appending tokens:
>>
>> SELECT fields FROM sometable WHERE field = 'st0.st1.st2.st3' OR field
>> = 'st1.st2.st3' OR field = 'st2.st3' OR field = 'st3';
>>
>> Unfortunately, even creating a INDEX, the performance was decreased
>> instead of being increased.
>
> Try formulating it this way:
>
> SELECT fields FROM sometable WHERE field IN (
>'st0.st1.st2.st3', 'st1.st2.st3', 'st2.st3', 'st3');
>
>
> As the number of pieces grows, it might be better to create and fill a
> temporary table with suffixes, then do
>
> SELECT fields FROM sometable WHERE field IN
>(select suffix from suffixes);
>
> Igor Tandetnik
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

Hi all!!

Dennis,

Thank you by your idea. I've been made some testing, and it seems to
be the better solution. Unfortunately, my table has three
fields(including the special rowid field), two integers and a varchar.
The amount of data(about 500.000 rows) recently added increased 32Mb
on database's file. So, I think maintaining a 'reverse copy' of
varchar field will increase too many the database's file size -
considering it'll grow about 10x in a few days.

Igor,

Thank you also. Using the 'IN' really solved the problem. The query
runs fast. It'll require some 'manual string processing', because my
application will have to generate the "list of something's tails", but
as there's no big number of tails, it isn't a problem, and solve the
needs by now.

Thank you all people. It's hard to develop database applications with
little database knowledge. Without you I couldn't make it works.

Bruno M Guedes
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Igor Tandetnik
Bruno Moreira Guedes
<[EMAIL PROTECTED]> wrote:
> I can split my 'something' in st0, st1, st2, stN... I have a '.'
> betwen this values. But the number of "st"s is undefined. I tried to
> do "the necessary" number comparsions, appending tokens:
>
> SELECT fields FROM sometable WHERE field = 'st0.st1.st2.st3' OR field
> = 'st1.st2.st3' OR field = 'st2.st3' OR field = 'st3';
>
> Unfortunately, even creating a INDEX, the performance was decreased
> instead of being increased.

Try formulating it this way:

SELECT fields FROM sometable WHERE field IN (
'st0.st1.st2.st3', 'st1.st2.st3', 'st2.st3', 'st3');


As the number of pieces grows, it might be better to create and fill a 
temporary table with suffixes, then do

SELECT fields FROM sometable WHERE field IN
(select suffix from suffixes);

Igor Tandetnik



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Bruno Moreira Guedes
2008/9/9 Dennis Cote <[EMAIL PROTECTED]>:
> Bruno Moreira Guedes wrote:
>>
>> I can split my 'something' in st0, st1, st2, stN... I have a '.'
>> betwen this values. But the number of "st"s is undefined. I tried to
>> do "the necessary" number comparsions, appending tokens:
>>
>> SELECT fields FROM sometable WHERE field = 'st0.st1.st2.st3' OR field
>> = 'st1.st2.st3' OR field = 'st2.st3' OR field = 'st3';
>>
>
> You could try this
>
>   select fields from sometable
>   where substr(field, -length(:somestring)) = :somestring;
>

I need to check if:
   '.' || field
is a substring of something, and if something

But it should be useful. Thank you!

> This will still require a full table scan and will not use an index, but
> the overhead of testing if the field ends with the appropriate string
> should be as small as possible.
>
> Your main problem is there is no way to use an index to match the end of
> a string. If this is a common operation for you database, you may want
> to add a field that stores the strings in reverse order. You can then
> add an index on that string. What used to be the end of the string is
> now the beginning of the reversed field, and can be searched quickly
> using an index.
>
> You will need to create a user defined function to reverse the
> characters of a string.
>
>   reverse('string') => 'gnirts'
>
> With this function you could add a new field to your database and create
> a suitable index on that field.
>
>   alter table sometable add column reversed text;
>   update sometable set reversed = reverse(field);
>   create index field_reversed_idx on sometable(reversed);
>
> Now you can use the same function to reverse the string you are trying
> to match and use a like comparison to locate the strings quickly using
> the index (since the search string is now the prefix of string).
>
>   select field from sometable
>   where reversed like reverse(:somestring) || '%';
>
> HTH
> Dennis Cote
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

Yes, it's the perfect solution!! I'll work to implement this. Thank you again!!

Thank you all,
Bruno
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Dennis Cote
Bruno Moreira Guedes wrote:
> 
> I can split my 'something' in st0, st1, st2, stN... I have a '.'
> betwen this values. But the number of "st"s is undefined. I tried to
> do "the necessary" number comparsions, appending tokens:
> 
> SELECT fields FROM sometable WHERE field = 'st0.st1.st2.st3' OR field
> = 'st1.st2.st3' OR field = 'st2.st3' OR field = 'st3';
> 

You could try this

   select fields from sometable
   where substr(field, -length(:somestring)) = :somestring;

This will still require a full table scan and will not use an index, but 
the overhead of testing if the field ends with the appropriate string 
should be as small as possible.

Your main problem is there is no way to use an index to match the end of 
a string. If this is a common operation for you database, you may want 
to add a field that stores the strings in reverse order. You can then 
add an index on that string. What used to be the end of the string is 
now the beginning of the reversed field, and can be searched quickly 
using an index.

You will need to create a user defined function to reverse the 
characters of a string.

   reverse('string') => 'gnirts'

With this function you could add a new field to your database and create 
a suitable index on that field.

   alter table sometable add column reversed text;
   update sometable set reversed = reverse(field);
   create index field_reversed_idx on sometable(reversed);

Now you can use the same function to reverse the string you are trying 
to match and use a like comparison to locate the strings quickly using 
the index (since the search string is now the prefix of string).

   select field from sometable
   where reversed like reverse(:somestring) || '%';

HTH
Dennis Cote



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Bruno Moreira Guedes
2008/9/9 Igor Tandetnik <[EMAIL PROTECTED]>:
> "Bruno Moreira Guedes"
> <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> Yes... I need to get the rows where 'something' ENDS wifh
>> '.[field-value]'. But I really think this solution isn't very well.
>
> Can field-value contain a period? If not, you can just extract the
> portion of 'something' after the last period, then run a straight
> comparison: WHERE somefield='tail-of-somestring' . Such a comparison can
> then be sped up with an index.
>
> Igor Tandetnik
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

Igor,

I can split my 'something' in st0, st1, st2, stN... I have a '.'
betwen this values. But the number of "st"s is undefined. I tried to
do "the necessary" number comparsions, appending tokens:

SELECT fields FROM sometable WHERE field = 'st0.st1.st2.st3' OR field
= 'st1.st2.st3' OR field = 'st2.st3' OR field = 'st3';

Unfortunately, even creating a INDEX, the performance was decreased
instead of being increased.

Now I'm thinking about using length() to filter where the strings will
be compared... After some testing, I increased a little of the
performance, but it still very bad!!

I still looking for hints!!

Thank you
Bruno M Guedes
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Igor Tandetnik
"Bruno Moreira Guedes"
<[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Yes... I need to get the rows where 'something' ENDS wifh
> '.[field-value]'. But I really think this solution isn't very well.

Can field-value contain a period? If not, you can just extract the 
portion of 'something' after the last period, then run a straight 
comparison: WHERE somefield='tail-of-somestring' . Such a comparison can 
then be sped up with an index.

Igor Tandetnik



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread Bruno Moreira Guedes
2008/9/9 P Kishor <[EMAIL PROTECTED]>:
> On 9/9/08, Bruno Moreira Guedes <[EMAIL PROTECTED]> wrote:
>> Hi people!!
>>
>>  I'm getting problems with a bigger table. I'm doing a query like this:
>>
>>  SELECT field FROM tables WHERE '%.' || somefield LIKE 'somestring'
>
> interesting variation on the syntax. Usually one would
>
> WHERE somefield LIKE '%.somestring'
>
>

Yes... I need to get the rows where 'something' ENDS wifh
'.[field-value]'. But I really think this solution isn't very well.

>>
>>  The table has about 500.000 registers. When the table is short, it's
>>  too fast, but when it's populated it gets slow. I've been imagined it
>>  before. What may I do to optimize this query?? Should an INDEX help
>>  me, if it's comparing concatenated strings??
>
> LIKE doesn't use INDEXes, so you are really out of luck with big
> tables. Try some other approach. On big tables it will be slow,
> although 500K "registers" (is a "register"  the same as "rows"?) is
> not much at all.

Yes, with "registers" I'm talking about "rows". 500K rows increased
about 34Mb on database's file size. And I fear it's just 'a little',
because I'll probably import more 'bigger amounts' of data(each
'amount' with about 500K rows).
>
> I would try FTS.
>
>
>
>>
>>  Thank you all,
>>  Bruno
>>  ___
>>  sqlite-users mailing list
>>  sqlite-users@sqlite.org
>>  http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
> --
> Puneet Kishor http://punkish.eidesis.org/
> Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
> Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

Thank you by the help, I'll read FTS docs right now!!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bigger table and query optimization

2008-09-09 Thread P Kishor
On 9/9/08, Bruno Moreira Guedes <[EMAIL PROTECTED]> wrote:
> Hi people!!
>
>  I'm getting problems with a bigger table. I'm doing a query like this:
>
>  SELECT field FROM tables WHERE '%.' || somefield LIKE 'somestring'

interesting variation on the syntax. Usually one would

WHERE somefield LIKE '%.somestring'


>
>  The table has about 500.000 registers. When the table is short, it's
>  too fast, but when it's populated it gets slow. I've been imagined it
>  before. What may I do to optimize this query?? Should an INDEX help
>  me, if it's comparing concatenated strings??

LIKE doesn't use INDEXes, so you are really out of luck with big
tables. Try some other approach. On big tables it will be slow,
although 500K "registers" (is a "register"  the same as "rows"?) is
not much at all.

I would try FTS.



>
>  Thank you all,
>  Bruno
>  ___
>  sqlite-users mailing list
>  sqlite-users@sqlite.org
>  http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Puneet Kishor http://punkish.eidesis.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Bigger table and query optimization

2008-09-09 Thread Bruno Moreira Guedes
Hi people!!

I'm getting problems with a bigger table. I'm doing a query like this:

SELECT field FROM tables WHERE '%.' || somefield LIKE 'somestring'

The table has about 500.000 registers. When the table is short, it's
too fast, but when it's populated it gets slow. I've been imagined it
before. What may I do to optimize this query?? Should an INDEX help
me, if it's comparing concatenated strings??

Thank you all,
Bruno
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users