Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Black, Michael (IS) wrote, On 26/04/12 21:39:

My fault...I thought I had extracted it under another name...turns out I was 
using a different db...duh...



Looks like it uses the indexes just fine.  I'm using the Windows EXE from the 
website.  Also got the same result on Unix.

Did you compile your own?  Or did you check to see that the indexes still 
weren't being used on your subset?

I compiled the amalgamation:

cc -o sqlite3 -O2 shell.c sqlite3.c -lpthread -ldl



Sorry, forgot to answer that.

The Arch build source is http://www.sqlite.org/sqlite-autoconf-3071100.tar.gz

The build runs ./configure --prefix=/usr --disable-static

Then make with CFLAGS="$CFLAGS -DSQLITE_ENABLE_FTS3=1 
-DSQLITE_ENABLE_COLUMN_METADATA=1 -DSQLITE_ENABLE_UNLOCK_NOTIFY 
-DSQLITE_SECURE_DELETE"


Pete


--

Peter Hardman
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Black, Michael (IS) wrote, On 26/04/12 21:39:

My fault...I thought I had extracted it under another name...turns out I was 
using a different db...duh...




Well we're none of us perfect. Only the female of the species can do more than 
two things at once...



Yes, that query is fine.

But add an 'order by transfer_date' clause and the planner no longer uses 
indexes but uses scans instead.


Then if you do a select * ... it goes back to using indexes.

With 100K+ records in the full database the difference in execution times is not 
insignificant.


Pete
--

Peter Hardman


[sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Black, Michael (IS)
My fault...I thought I had extracted it under another name...turns out I was 
using a different db...duh...



Looks like it uses the indexes just fine.  I'm using the Windows EXE from the 
website.  Also got the same result on Unix.

Did you compile your own?  Or did you check to see that the indexes still 
weren't being used on your subset?

I compiled the amalgamation:

cc -o sqlite3 -O2 shell.c sqlite3.c -lpthread -ldl

Running on Redhat 5.



sqlite> SELECT transfer_date FROM transfer_history WHERE regn_no = '023674' and 
transfer_date<= '2012-05-01' order by transfer_date asc;
1995-04-04 00:00:00
1999-04-01 12:00:00
2002-03-31 12:00:00

sqlite> explain query plan SELECT transfer_date FROM transfer_history WHERE 
regn_no = '023674' and transfer_date<= '2012-05-01' order by transfer_date asc;
3|0|1|SCAN TABLE flock AS f (~161 rows)
3|1|0|SEARCH TABLE transfer AS tr USING INDEX tr_flock_no_index (flock_no=?) 
(~5 rows)
4|0|1|SCAN TABLE flock AS f (~161 rows)
4|1|0|SEARCH TABLE sheep AS s USING INDEX sheep_org_flock_index 
(originating_flock=?) (~5 rows)
2|0|0|COMPOUND SUBQUERIES 3 AND 4 (UNION ALL)
5|0|1|SCAN TABLE flock AS f (~161 rows)
5|1|0|SEARCH TABLE sheep AS s USING INDEX sheep_reg_flock_index 
(registering_flock=?) (~2 rows)
1|0|0|COMPOUND SUBQUERIES 2 AND 5 (UNION ALL)
0|0|0|SCAN SUBQUERY 1 (~67 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
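For readers following along, EXPLAIN QUERY PLAN output like the above can also be captured programmatically. A minimal sketch — the single-table schema here is illustrative, not the sss-test database; the column layout of EQP rows varies across SQLite versions, but the human-readable detail text is always the last column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transfer (regn_no TEXT, transfer_date TEXT);
CREATE INDEX transfer_idx ON transfer (regn_no, transfer_date);
""")

# EXPLAIN QUERY PLAN rows end with a "detail" column:
# SEARCH ... USING INDEX means an index lookup, SCAN means a full scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT transfer_date FROM transfer "
    "WHERE regn_no = ? AND transfer_date <= ? "
    "ORDER BY transfer_date",
    ("023674", "2012-05-01"),
).fetchall()

for row in plan:
    print(row[-1])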



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Peter [pe...@somborneshetlands.co.uk]
Sent: Thursday, April 26, 2012 3:14 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Re Query planner creating a slow plan

Black, Michael (IS) wrote, On 26/04/12 19:00:
> Sqliteman must be pointing to the wrong database.
>
>
>
> sqlite>  SELECT transfer_date FROM transfer_history WHERE regn_no =
> '039540'. Error: no such table: transfer_history
>

Hmm. I've just done the following - cut & paste from my terminal:

[home@system06 test]$ mkdir sqlite
[home@system06 test]$ cd sqlite
[home@system06 sqlite]$ wget
http://www.somborneshetlands.co.uk/things/sss-test-nomem.zip
--2012-04-26 21:09:04--
http://www.somborneshetlands.co.uk/things/sss-test-nomem.zip
Resolving www.somborneshetlands.co.uk... 91.197.33.236
Connecting to www.somborneshetlands.co.uk|91.197.33.236|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184279 (180K) [application/zip]
Saving to: `sss-test-nomem.zip'

100%[==>]
184,279  792K/s   in 0.2s

2012-04-26 21:09:04 (792 KB/s) - `sss-test-nomem.zip' saved [184279/184279]

[home@system06 sqlite]$ unzip sss-test-nomem.zip
Archive:  sss-test-nomem.zip
   inflating: sss-test-nomem.sqlite
[home@system06 sqlite]$ sqlite3 sss-test-nomem.sqlite
SQLite version 3.7.11 2012-03-20 11:35:50
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from transfer_history where regn_no = '023674'
...> ;
023674|1610|Wycoller|1999-04-01 12:00:00|April 99|Presumed
023674|SSB900|(Dead)|2002-03-31 12:00:00|31/03/2002|Presumed
023674|1004|Glynwood|1995-04-04 00:00:00|4/4/95|Birth
sqlite>


Seems to work for me. You'll have to use 023674 instead of 039540 as the
latter doesn't exist in this test database.

Pete

--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Black, Michael (IS) wrote, On 26/04/12 19:00:

Sqliteman must be pointing to the wrong database.



sqlite>  SELECT transfer_date FROM transfer_history WHERE regn_no =
'039540'. Error: no such table: transfer_history



Hmm. I've just done the following - cut & paste from my terminal:

[home@system06 test]$ mkdir sqlite
[home@system06 test]$ cd sqlite
[home@system06 sqlite]$ wget 
http://www.somborneshetlands.co.uk/things/sss-test-nomem.zip
--2012-04-26 21:09:04-- 
http://www.somborneshetlands.co.uk/things/sss-test-nomem.zip

Resolving www.somborneshetlands.co.uk... 91.197.33.236
Connecting to www.somborneshetlands.co.uk|91.197.33.236|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184279 (180K) [application/zip]
Saving to: `sss-test-nomem.zip'

100%[==>] 
184,279  792K/s   in 0.2s


2012-04-26 21:09:04 (792 KB/s) - `sss-test-nomem.zip' saved [184279/184279]

[home@system06 sqlite]$ unzip sss-test-nomem.zip
Archive:  sss-test-nomem.zip
  inflating: sss-test-nomem.sqlite
[home@system06 sqlite]$ sqlite3 sss-test-nomem.sqlite
SQLite version 3.7.11 2012-03-20 11:35:50
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from transfer_history where regn_no = '023674'
   ...> ;
023674|1610|Wycoller|1999-04-01 12:00:00|April 99|Presumed
023674|SSB900|(Dead)|2002-03-31 12:00:00|31/03/2002|Presumed
023674|1004|Glynwood|1995-04-04 00:00:00|4/4/95|Birth
sqlite>


Seems to work for me. You'll have to use 023674 instead of 039540 as the 
latter doesn't exist in this test database.


Pete

--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Black, Michael (IS)
Sqliteman must be pointing to the wrong database.



sqlite> SELECT transfer_date FROM transfer_history WHERE regn_no = '039540'.
Error: no such table: transfer_history





Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Peter [pe...@somborneshetlands.co.uk]
Sent: Thursday, April 26, 2012 12:44 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Re Query planner creating a slow plan

Black, Michael (IS) wrote, On 26/04/12 18:21:
> There is no transfer_table_new in that database.  Or any view named 
> "transfer" anything.
>
> So what query are you running on this one?
>
>
Sorry, that should be transfer_history. The name seems to have got
corrupted during our email exchanges.

I've just downloaded and extracted the database and it has all 44 tables
and 18 views - well Sqliteman thinks it has anyway..

Pete

--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Black, Michael (IS) wrote, On 26/04/12 18:21:

There is no transfer_table_new in that database.  Or any view named "transfer" 
anything.

So what query are you running on this one?


Sorry, that should be transfer_history. The name seems to have got 
corrupted during our email exchanges.


I've just downloaded and extracted the database and it has all 44 tables 
and 18 views - well Sqliteman thinks it has anyway..


Pete

--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Black, Michael (IS)
There is no transfer_table_new in that database.  Or any view named "transfer" 
anything.

So what query are you running on this one?



Here's all the VIEWs.



CREATE VIEW current_flock_owner AS SELECT latest.flock_no, 
latest.owner_person_id, latest.ow
 FROM flock_owner even_later WHERE  latest.flock_no = even_later.flock_no AND 
latest.owner_c
CREATE VIEW original_flock_owner AS SELECT first.flock_no, 
first.owner_person_id, first.owne
OM flock_owner even_earlier WHERE  first.flock_no = even_earlier.flock_no AND 
first.owner_ch
CREATE VIEW three_gen_ped AS SELECT s.regn_no, s.sire_no, s.dam_no, g1.sire_no 
AS gs1, g1.da
1, gg1.dam_no AS ggd1, gg2.sire_no AS ggs2, gg2.dam_no AS ggd2, gg3.sire_no AS 
ggs3, gg3.dam
IN sheep g1 ON s.sire_no = g1.regn_no JOIN sheep g2 ON s.dam_no = g2.regn_no 
JOIN sheep gg1
o JOIN sheep gg3 ON g2.sire_no = gg3.regn_no JOIN sheep gg4 ON g2.dam_no = 
gg4.regn_no;
CREATE VIEW two_gen_ped AS SELECT s.regn_no, s.sire_no, s.dam_no, g1.sire_no AS 
gs1, g1.dam_
ep g1 ON s.sire_no = g1.regn_no JOIN sheep g2 ON s.dam_no = g2.regn_no;
CREATE VIEW current_ear_tag as
CREATE VIEW original_ear_tag as
CREATE VIEW current_eid as
CREATE VIEW current_inspection as
CREATE VIEW current_prp as
CREATE VIEW sheep_data as
CREATE VIEW sheep_basic_data as
CREATE VIEW sheep_progeny_data as





Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Peter [pe...@somborneshetlands.co.uk]
Sent: Thursday, April 26, 2012 11:54 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Re Query planner creating a slow plan

Black, Michael (IS) wrote, On 26/04/12 17:05:
> What version are you using?  Can you extract an example of all your
> tables/indexes/data to demonstrate?

I'm on 3.7.11 (Arch-Linux).

You can grab a database at
http://www.somborneshetlands.co.uk/things/sss-test-nomem.zip

You will need to use a regn_no of 023674 with that as it's a cut-down
data set for testing (the live one is 35MB). The date can remain the same.


>
> One more idea.  Explicitly ask for the indexes...and a compound index
> might be nice to have around.
>
> I don't think you can ask for more than one index without doing
> subselects, can you?

AFAIK you can't create indexes on a view, or can you?


Pete

--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Black, Michael (IS) wrote, On 26/04/12 17:05:

What version are you using?  Can you extract an example of all your
tables/indexes/data to demonstrate?


I'm on 3.7.11 (Arch-Linux).

You can grab a database at 
http://www.somborneshetlands.co.uk/things/sss-test-nomem.zip


You will need to use a regn_no of 023674 with that as it's a cut-down 
data set for testing (the live one is 35MB). The date can remain the same.





One more idea.  Explicitly ask for the indexes...and a compound index
might be nice to have around.

I don't think you can ask for more than one index without doing
subselects, can you?


AFAIK you can't create indexes on a view, or can you?
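Peter's recollection is correct: SQLite will not create an index on a view, only on the view's underlying tables. A quick check confirms it (the schema names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transfer (regn_no TEXT, transfer_date TEXT);
CREATE VIEW transfer_history AS
  SELECT regn_no, transfer_date FROM transfer;
""")

# CREATE INDEX on a view is rejected outright; indexing must target
# the base tables that the view reads from.
try:
    conn.execute("CREATE INDEX th_idx ON transfer_history (regn_no)")
    view_indexable = True
except sqlite3.OperationalError as exc:
    view_indexable = False
    print(exc)
```

So any index-based speedup has to come from indexes on the tables behind the view.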


Pete

--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Black, Michael (IS)
What version are you using?  Can you extract an example of all your 
tables/indexes/data to demonstrate?



One more idea.  Explicitly ask for the indexes...and a compound index might be 
nice to have around.

I don't think you can ask for more than one index without doing subselects, can 
you?



SELECT transfer_date FROM transfer_history_new INDEXED BY myindex
WHERE regn_no = '039540' and transfer_date<= '2012-05-01'
order by transfer_date asc





Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Peter [pe...@somborneshetlands.co.uk]
Sent: Thursday, April 26, 2012 10:46 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Re Query planner creating a slow plan

Black, Michael (IS) wrote, On 26/04/12 16:38:
> And does this also work?  Sounds like the planner isn't seeing all the 
> columns in the view unless in the select.
>
>
>
> SELECT transfer_date,regn_no FROM transfer_history_new
> WHERE regn_no = '039540' and transfer_date<= '2012-05-01'
> order by transfer_date asc
>

No, that uses scans as well.

I'm tempted to suggest this might be a bug since the pattern seems
illogical.

Pete


--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Simon Slavin wrote, On 26/04/12 16:39:

On 26 Apr 2012, at 4:24pm, Peter  wrote:


>  There are indexes on all the fields used in the tables of the 
transfer_history view.
>
>  While tinkering I have discovered something:
>
>  If instead of
>  SELECT transfer_date FROM transfer_history_new
>  WHERE regn_no = '039540' and transfer_date<= '2012-05-01'
>  order by transfer_date asc
>
>  I write
>
>  SELECT * FROM transfer_history_new
>  WHERE regn_no = '039540' and transfer_date<= '2012-05-01'
>  order by transfer_date asc
>
>  then I get an execution time of a couple of milliseconds instead of 300ms or 
so (times from Sqliteman this time). The planner has reverted to using indexes 
instead of scans...

Comparing the output of EXPLAIN QUERY PLAN for those two, can you see output 
that supports that ?  If so, it does seem to be a bug of some kind.

Simon.


See my original post. Explain does indeed show that the difference is 
the use of indexes vs scans.


Pete
--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Black, Michael (IS) wrote, On 26/04/12 16:38:

And does this also work?  Sounds like the planner isn't seeing all the columns 
in the view unless in the select.



SELECT transfer_date,regn_no FROM transfer_history_new
WHERE regn_no = '039540' and transfer_date<= '2012-05-01'
order by transfer_date asc



No, that uses scans as well.

I'm tempted to suggest this might be a bug since the pattern seems 
illogical.


Pete


--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Marc L. Allen wrote, On 26/04/12 16:38:

Out of curiosity, try...

SELECT transfer_date, regn_no FROM transfer_history_new WHERE regn_no
= '039540' and transfer_date<= '2012-05-01' order by transfer_date
asc

Is the problem that combining the order by with having transfer_date
as the only returned item make it use the transfer_date index instead
of the preferable regn_no index?


No, that makes no difference either.

Pete

--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Simon Slavin

On 26 Apr 2012, at 4:24pm, Peter  wrote:

> There are indexes on all the fields used in the tables of the 
> transfer_history view.
> 
> While tinkering I have discovered something:
> 
> If instead of
> SELECT transfer_date FROM transfer_history_new
> WHERE regn_no = '039540' and transfer_date <= '2012-05-01'
> order by transfer_date asc
> 
> I write
> 
> SELECT * FROM transfer_history_new
> WHERE regn_no = '039540' and transfer_date <= '2012-05-01'
> order by transfer_date asc
> 
> then I get an execution time of a couple of milliseconds instead of 300ms or 
> so (times from Sqliteman this time). The planner has reverted to using 
> indexes instead of scans...

Comparing the output of EXPLAIN QUERY PLAN for those two, can you see output 
that supports that ?  If so, it does seem to be a bug of some kind.

Simon.


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Black, Michael (IS)
And does this also work?  Sounds like the planner isn't seeing all the columns 
in the view unless in the select.



SELECT transfer_date,regn_no FROM transfer_history_new
WHERE regn_no = '039540' and transfer_date <= '2012-05-01'
order by transfer_date asc



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Marc L. Allen
Out of curiosity, try...

SELECT transfer_date, regn_no FROM transfer_history_new WHERE regn_no = 
'039540' and transfer_date <= '2012-05-01'
order by transfer_date asc

Is the problem that combining the order by with having transfer_date as the 
only returned item make it use the transfer_date index instead of the 
preferable regn_no index?

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Peter
> Sent: Thursday, April 26, 2012 11:24 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Re Query planner creating a slow plan
> 
> Marc L. Allen wrote, On 26/04/12 15:57:
> > What indexes are on the underlying tables?
> >
> 
> There are indexes on all the fields used in the tables of the
> transfer_history view.
> 
> While tinkering I have discovered something:
> 
> If instead of
> SELECT transfer_date FROM transfer_history_new WHERE regn_no = '039540'
> and transfer_date <= '2012-05-01'
> order by transfer_date asc
> 
> I write
> 
> SELECT * FROM transfer_history_new
> WHERE regn_no = '039540' and transfer_date <= '2012-05-01'
> order by transfer_date asc
> 
> then I get an execution time of a couple of milliseconds instead of
> 300ms or so (times from Sqliteman this time). The planner has reverted
> to using indexes instead of scans...
> 
> --
> 
> Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Marc L. Allen wrote, On 26/04/12 15:57:

What indexes are on the underlying tables?



There are indexes on all the fields used in the tables of the 
transfer_history view.


While tinkering I have discovered something:

If instead of
SELECT transfer_date FROM transfer_history_new
WHERE regn_no = '039540' and transfer_date <= '2012-05-01'
order by transfer_date asc

I write

SELECT * FROM transfer_history_new
WHERE regn_no = '039540' and transfer_date <= '2012-05-01'
order by transfer_date asc

then I get an execution time of a couple of milliseconds instead of 
300ms or so (times from Sqliteman this time). The planner has reverted 
to using indexes instead of scans...


--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Marc L. Allen
What indexes are on the underlying tables?

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Peter
> Sent: Thursday, April 26, 2012 10:55 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Re Query planner creating a slow plan
> 
> Jim Morris wrote, On 26/04/12 15:36:
> > It is possible using an alias would force better behavior:
> >
> > select theDate
> > from (select transfer_date as theDate from transfer_history where
> > regn_no='039540' and transfer_date<= '2012-05-01') order by theDate
> >
> 
> Once again, it makes no difference - the planner still picks the same
> plan using scans.
> 
> 
> --
> 
> Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Jim Morris wrote, On 26/04/12 15:36:

It is possible using an alias would force better behavior:

select theDate
from (select transfer_date as theDate from transfer_history
where regn_no='039540' and transfer_date<= '2012-05-01') order by
theDate



Once again, it makes no difference - the planner still picks the same 
plan using scans.



--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Marc L. Allen wrote, On 26/04/12 15:30:

I suppose using a temporary table is out of the question?


It's much simpler (and probably quicker though I didn't test it) to read 
the three rows, sort them by date, pick the first row and use that. No 
need to generate random table names, create and drop the table.


Python time.time() gives 0.001sec for all that (to 3 dec places) which 
is fast enough for me ;) (and faster than PostgreSQL at 0.023 using the 
plain query).
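The application-side workaround Peter describes can be sketched in a few lines of Python. A plain table stands in for the transfer_history view here, and the data mirrors the three rows shown earlier in the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transfer_history (regn_no TEXT, transfer_date TEXT);
INSERT INTO transfer_history VALUES
  ('023674', '1999-04-01 12:00:00'),
  ('023674', '2002-03-31 12:00:00'),
  ('023674', '1995-04-04 00:00:00');
""")

# Run the fast un-ordered query, then sort the handful of matching rows
# in Python instead of letting the planner build a slow ORDER BY plan.
rows = conn.execute(
    "SELECT transfer_date FROM transfer_history "
    "WHERE regn_no = ? AND transfer_date <= ?",
    ("023674", "2012-05-01"),
).fetchall()
dates = sorted(r[0] for r in rows)
print(dates[0])  # earliest transfer: 1995-04-04 00:00:00
```

Sorting three rows client-side is trivially cheap; the cost only becomes interesting if the WHERE clause stops being selective.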


 But, then

again, that only solves the specific issue.  I guess the more general
question is how views with unions interact with aggregates and order
by.

What happens if you don't use the view, but perform the query using
the actual tables?


Makes no difference - the planner is still using the scans instead of 
indexes.


--

Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Jim Morris

It is possible using an alias would force better behavior:

select theDate
 from (select transfer_date as theDate from transfer_history
where regn_no='039540' and transfer_date<= '2012-05-01') order by
theDate



Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Marc L. Allen
I suppose using a temporary table is out of the question?  But, then again, 
that only solves the specific issue.  I guess the more general question is how 
views with unions interact with aggregates and order by.

What happens if you don't use the view, but perform the query using the actual 
tables?

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Black, Michael (IS)
> Sent: Thursday, April 26, 2012 10:11 AM
> To: General Discussion of SQLite Database
> Subject: [sqlite] Re Query planner creating a slow plan
> 
> What happens if you use a subselect?
> 
> 
> 
> select transfer_date from (select transfer_date from transfer_history
> where regn_no='039540' and transfer_date <= '2012-05-01') order by
> transfer_date;
> 
> 
> 
> 
> 
> Michael D. Black
> 
> Senior Scientist
> 
> Advanced Analytics Directorate
> 
> Advanced GEOINT Solutions Operating Unit
> 
> Northrop Grumman Information Systems
> 
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org]
> on behalf of Peter [pe...@somborneshetlands.co.uk]
> Sent: Thursday, April 26, 2012 7:00 AM
> To: sqlite-users@sqlite.org
> Subject: EXT :[sqlite] Query planner creating a slow plan
> 
> Hi,
> 
> I have a view 'transfer_history' which aggregates records from 3 tables
> using UNION ALL. The aggregate is about 102k records.
> 
> I have a query:
> 
> SELECT transfer_date from transfer_history
>   WHERE regn_no = '039540' and transfer_date <= '2012-05-01'
> 
> This returns three records and takes a couple of milliseconds - good.
> 
> But if I add an 'order by' clause, or an aggregate (max) on
> transfer_date, the time goes up to > 300ms. The reason seems to be that
> the query planner uses scans for all three sub-queries instead of using
> indexes on the underlying tables.
> 
> With the basic query the QP says:
> 
> SEARCH TABLE transfer AS tr USING INDEX sqlite_autoindex_transfer_1
> (regn_no=? AND transfer_date<?) (~1 rows)
> SEARCH TABLE sheep AS s USING INDEX sqlite_autoindex_sheep_1 (regn_no=?)
> (~1 rows)
> COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
> SEARCH TABLE sheep AS s USING INDEX sqlite_autoindex_sheep_1
> (regn_no=?)
> (~1 rows)
> COMPOUND SUBQUERIES 1 AND 4 (UNION ALL)
> 
> With the 'order by' clause the QP says:
> 
> SCAN TABLE transfer AS tr (~49043 rows)
> SCAN TABLE sheep AS s (~51858 rows)
> COMPOUND SUBQUERIES 3 AND 4 (UNION ALL)
> SCAN TABLE sheep AS s (~25929 rows)
> COMPOUND SUBQUERIES 2 AND 5 (UNION ALL)
> SCAN SUBQUERY 1 AS t2 (~4227 rows)
> USE TEMP B-TREE FOR ORDER BY
> 
> Seems to me it ought to be able to just sort the result of the first
> plan. ATM it's an order of magnitude quicker at least to do the sort in
> Python in the application.
> 
> Pete
> 
> --
> 
> Peter Hardman


Re: [sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Peter

Black, Michael (IS) wrote, On 26/04/12 15:11:

select transfer_date from (select transfer_date from
transfer_history where regn_no='039540' and transfer_date<=
'2012-05-01') order by transfer_date;



Makes no difference.

Pete

--

Peter Hardman


[sqlite] Re Query planner creating a slow plan

2012-04-26 Thread Black, Michael (IS)
What happens if you use a subselect?



select transfer_date from (select transfer_date from transfer_history where 
regn_no='039540' and transfer_date <= '2012-05-01') order by transfer_date;





Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Peter [pe...@somborneshetlands.co.uk]
Sent: Thursday, April 26, 2012 7:00 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Query planner creating a slow plan

Hi,

I have a view 'transfer_history' which aggregates records from 3 tables
using UNION ALL. The aggregate is about 102k records.

I have a query:

SELECT transfer_date from transfer_history
  WHERE regn_no = '039540' and transfer_date <= '2012-05-01'

This returns three records and takes a couple of milliseconds - good.

But if I add an 'order by' clause, or an aggregate (max) on
transfer_date, the time goes up to > 300ms. The reason seems to be that
the query planner uses scans for all three sub-queries instead of using
indexes on the underlying tables.

With the basic query the QP says:

SEARCH TABLE transfer AS tr USING INDEX sqlite_autoindex_transfer_1
(regn_no=? AND transfer_date<?) (~1 rows)
SEARCH TABLE sheep AS s USING INDEX sqlite_autoindex_sheep_1 (regn_no=?)
(~1 rows)
COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
SEARCH TABLE sheep AS s USING INDEX sqlite_autoindex_sheep_1 (regn_no=?)
(~1 rows)
COMPOUND SUBQUERIES 1 AND 4 (UNION ALL)

With the 'order by' clause the QP says:

SCAN TABLE transfer AS tr (~49043 rows)
SCAN TABLE sheep AS s (~51858 rows)
COMPOUND SUBQUERIES 3 AND 4 (UNION ALL)
SCAN TABLE sheep AS s (~25929 rows)
COMPOUND SUBQUERIES 2 AND 5 (UNION ALL)
SCAN SUBQUERY 1 AS t2 (~4227 rows)
USE TEMP B-TREE FOR ORDER BY

Seems to me it ought to be able to just sort the result of the first
plan. ATM it's an order of magnitude quicker at least to do the sort in
Python in the application.

Pete

--

Peter Hardman
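The shape of the problem in Peter's original message can be reproduced with a simplified stand-in schema: three indexed tables unioned by a view. This is not the real sss-test database, and newer SQLite releases plan this pattern better, so the scan-based plan may not appear; the sketch just shows the structure under discussion:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transfer   (regn_no TEXT, transfer_date TEXT);
CREATE TABLE sheep_orig (regn_no TEXT, transfer_date TEXT);
CREATE TABLE sheep_reg  (regn_no TEXT, transfer_date TEXT);
CREATE INDEX t_idx  ON transfer   (regn_no, transfer_date);
CREATE INDEX so_idx ON sheep_orig (regn_no, transfer_date);
CREATE INDEX sr_idx ON sheep_reg  (regn_no, transfer_date);
CREATE VIEW transfer_history AS
  SELECT regn_no, transfer_date FROM transfer
  UNION ALL
  SELECT regn_no, transfer_date FROM sheep_orig
  UNION ALL
  SELECT regn_no, transfer_date FROM sheep_reg;
INSERT INTO transfer   VALUES ('023674', '1999-04-01 12:00:00');
INSERT INTO sheep_orig VALUES ('023674', '1995-04-04 00:00:00');
""")

base = ("SELECT transfer_date FROM transfer_history "
        "WHERE regn_no = '023674' AND transfer_date <= '2012-05-01'")

# Compare the plan with and without ORDER BY, as in the thread.
for sql in (base, base + " ORDER BY transfer_date"):
    print(sql)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", row[-1])

rows = [r[0] for r in conn.execute(base + " ORDER BY transfer_date")]
print(rows)  # ['1995-04-04 00:00:00', '1999-04-01 12:00:00']
```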


[sqlite] Re: Query optimization

2007-10-31 Thread Igor Tandetnik

Matthew Gertner  wrote:

I'm running the following query:

SELECT DISTINCT _Resource.ResourceType, _Resource.Id,
_Resource.OwnerId, _Resource.Created, _Resource.Modified,
_Resource.Name, _Resource.ParentId, _Resource.Version,
_Resource.Description, _Resource.Creator, _File.Size, _File.MimeType,
_File.OriginURI, _File.OriginMimeType, _File.Hash,
_File.ContentRating, _File.OriginalExtension, _File.Thumbnail,
_File.Data, _File.TorrentInfoHash, _Resource_Metadata.New,
_File_Metadata.IsPublished, _File_Metadata.ReceiveStatus,
_File_Metadata.ShareMessageId, _Audio.Length, _Audio.BitRate,
_Image.Height, _Image.Width, _Image.ColorsUsed, _Text.Summary,
_Video.Height, _Video.Width, _Video.Length, _Video.FrameRate,
_Video.DataRate, _Video.SampleSize FROM _File LEFT OUTER JOIN
_Resource ON _File.Id=_Resource.Id LEFT OUTER JOIN _Resource_Metadata
ON _Resource_Metadata.ParentId=_Resource.Id LEFT OUTER JOIN
_File_Metadata ON _File_Metadata.Id=_Resource_Metadata.Id LEFT OUTER
JOIN _Audio ON _File.Id=_Audio.Id LEFT OUTER JOIN _Image ON
_File.Id=_Image.Id LEFT OUTER JOIN _Text ON _File.Id=_Text.Id LEFT
OUTER JOIN _Video ON _File.Id=_Video.Id LEFT OUTER JOIN _Source ON
_Source.ParentId=_Resource_Metadata.Id LEFT OUTER JOIN
_Source_PeerCommonName ON
_Source_PeerCommonName.ResourceId=_Source.Id WHERE
_Source_PeerCommonName.Value=? AND _File_Metadata.Id IN (SELECT Id
FROM _File_Metadata WHERE ReceiveStatus=?)

EXPLAIN QUERY PLAN tells me that a full scan is being performed on
the _File table, which is the first in the FROM clause. When I
reorder the joins so that _File_Metadata is first in the FROM clause,
then its primary key index (on Id) is used, as expected (since Id is
used in the WHERE clause). I would have expected SQLite's query
optimizer to reorder the joins automatically to use an index rather
than a full table scan.


Reordering LEFT JOIN changes the meaning of the statement. You don't 
want your DBMS to do that to you behind your back. Make sure you know 
what you are doing, and that the reordered statement still does what 
it's supposed to do. Again, (A LEFT JOIN B) produces different results, 
in general, than (B LEFT JOIN A).
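Igor's point can be seen with two toy tables (names invented for the demonstration): swapping the operands of a LEFT JOIN changes which side has its rows preserved, so the result sets genuinely differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (2), (3);
""")

# a LEFT JOIN b keeps every row of a; b LEFT JOIN a keeps every row of b.
ab = conn.execute("SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id").fetchall()
ba = conn.execute("SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id").fetchall()
print(ab)  # contains (1, None) and (2, 2)
print(ba)  # contains (2, 2) and (None, 3)
```

An optimizer is only free to reorder joins when it can prove the meaning is unchanged, which is why SQLite does not silently flip LEFT JOIN operands for you.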


Igor Tandetnik 



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Re: Query Indexes

2007-08-02 Thread Igor Tandetnik

Mitchell Vincent <[EMAIL PROTECTED]>
wrote:

Is there any way to determine if a query is using an index or not? In
PostgreSQL the "explain" works to tell whether a table is being
sequentially scanned or not..


Prepend the query with EXPLAIN QUERY PLAN


I have a query :

SELECT *,(total - balance_due) as total_paid FROM invoice_master WHERE
lower(invoice_number)  LIKE lower('%%')  AND status != 'Void'  AND
status != 'Recur'  AND status != 'Paid' AND status != 'Forwarded'
ORDER BY created ASC  LIMIT 25

The lower('%%') gets used with whatever field the user is searching
on.

I have indexes on created, status and invoice_number - but apparently
I can't make an index on lower(invoice_number) -- can I?


You can't. You can, however, create an index on invoice_number with 
COLLATE NOCASE clause.


Note that condition "lower(invoice_number)  LIKE lower('%%')" is always 
true - any string matches this pattern. What precisely is this supposed 
to achieve, and how an index on lower(invoice_number) is expected to 
help here?


A condition on status can be more compactly written as

   status NOT IN ('Void', 'Recur', 'Paid', 'Forwarded')

The query as written should use an index on "created" to implement ORDER 
BY clause.
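
Both of Igor's suggestions can be sketched with Python's built-in sqlite3 module. The index name `idx_inv`, the column subset, and the sample rows are illustrative assumptions, not the poster's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE invoice_master(invoice_number TEXT, status TEXT, created INT);
    -- An index built with NOCASE collation can serve
    -- case-insensitive comparisons on invoice_number.
    CREATE INDEX idx_inv ON invoice_master(invoice_number COLLATE NOCASE);
    INSERT INTO invoice_master VALUES
        ('INV-001', 'Open', 1), ('inv-002', 'Paid', 2), ('INV-003', 'Void', 3);
""")

# Case-insensitive equality instead of lower(...) on both sides:
rows = con.execute(
    "SELECT invoice_number FROM invoice_master "
    "WHERE invoice_number = ? COLLATE NOCASE", ("inv-001",)
).fetchall()

# The compact form of the four status inequalities:
open_rows = con.execute(
    "SELECT invoice_number FROM invoice_master "
    "WHERE status NOT IN ('Void', 'Recur', 'Paid', 'Forwarded')"
).fetchall()

print(rows)       # [('INV-001',)]
print(open_rows)  # [('INV-001',)]
```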


Igor Tandetnik 






[sqlite] Re: Query

2006-11-29 Thread Igor Tandetnik

[EMAIL PROTECTED] wrote:

"SELECT  * from database where STRING1 = 'x' && STRING2 = 'y' &&
STATE = 1 && purpose = 'z'"

To achieve this, do I need to attach the given INDEX to the table so
that rows are inserted in sorted order during INSERT?
"Create INDEX  INDX on database( STRING1, STRING2, state, purpose)"

i)
   If the above index is present, does the given SELECT statement
iterate over all rows in the table, or does it jump directly to the
block of entries satisfying STRING1 = 'x'?


It "directly jumps" to records satisfying all four conditions. That's 
why you built an index on all four columns after all.
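
The "direct jump" can be observed with EXPLAIN QUERY PLAN. A sketch with Python's built-in sqlite3 module, using an illustrative table `t` standing in for the poster's table (output wording varies by SQLite version, but the plan names the index rather than a full scan):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t(string1 TEXT, string2 TEXT, state INT, purpose TEXT);
    CREATE INDEX indx ON t(string1, string2, state, purpose);
""")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t "
    "WHERE string1='x' AND string2='y' AND state=1 AND purpose='z'"
).fetchall()

# The detail column of the plan mentions the index "indx",
# i.e. a direct B-tree lookup on all four equality terms.
print(plan)
```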



ii)

   Does indexing based on a string (12 chars) cause a performance
problem?


Any index is a tradeoff: you speed up selects but slow down inserts and 
updates. There is no problem with indexing strings specifically, if 
that's what you are asking.



iii)

   Will this insertion behave like a linked list (fast) or like an
array (slow)?


SQLite implements indexes as B-trees. Inserting a new record is O(log N) 
operation.




iv)
   Is there any other way to write the SELECT statement with less
performance impact?


The index you built gives best possible performance for this particular 
query. Whether the tradeoff was worth it only you can decide.



Query 2)
What will the performance impact be if I create an INDEX on PATH (long
text) in ASC or DESC mode for the above table?


None whatsoever - see http://sqlite.org/lang_createindex.html. SQLite 
ignores ASC and DESC modifiers and always builds an index in ascending 
order. However, a B-tree can be equally easily walked backwards as 
forwards, so something like ORDER BY Path DESC can still use the index.
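
The backward index walk is easy to verify: neither direction needs a separate sort pass. A sketch using Python's built-in sqlite3 module, with an illustrative `files(path)` table (note that Igor's statement reflects SQLite of that era; current versions do honor DESC in index definitions, but the backward-scan behavior still holds):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE files(path TEXT);
    CREATE INDEX idx_path ON files(path);
    INSERT INTO files VALUES ('a'), ('b'), ('c');
""")

asc = [r[0] for r in con.execute("SELECT path FROM files ORDER BY path")]
desc = [r[0] for r in con.execute("SELECT path FROM files ORDER BY path DESC")]

print(asc)   # ['a', 'b', 'c']
print(desc)  # ['c', 'b', 'a']

# The DESC query walks idx_path backwards; the plan contains no
# "USE TEMP B-TREE FOR ORDER BY" step.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT path FROM files ORDER BY path DESC"
).fetchall()
print(plan)
```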


Igor Tandetnik 






Re: [sqlite] Re: Query returns all rows

2006-09-25 Thread Mark Richards

Hmm..

I'm so used to doing this via php's sqlite interface:

SELECT * FROM blah WHERE id = "12345";

"form" in double quotes is an alternative way to refer to FORM column - 
it is _not_ a string literal. So your query condition is WHERE 
FORM=FORM, which is of course always true (except maybe when FORM is NULL).


A string literal should be enclosed in single quotes, as in 'form'.

Igor Tandetnik










[sqlite] Re: Query returns all rows

2006-09-25 Thread Igor Tandetnik

David Champagne
 wrote:

If I have a table defined in the following manner

   CREATE TABLE License (Form varchar(256), Type int, NbOcc int)

and then I execute a query

   SELECT * FROM License WHERE FORM = "form";

I get all rows returned


"form" in double quotes is an alternative way to refer to FORM column - 
it is _not_ a string literal. So your query condition is WHERE 
FORM=FORM, which is of course always true (except maybe when FORM is 
NULL).


A string literal should be enclosed in single quotes, as in 'form'.
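
The difference is visible directly with the poster's own schema. A sketch using Python's built-in sqlite3 module (the sample rows are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE License(Form TEXT, Type INT, NbOcc INT);
    INSERT INTO License VALUES ('form', 1, 1), ('other', 2, 2);
""")

# "form" in double quotes resolves to the Form column (identifier lookup
# is case-insensitive), so the condition is Form = Form -- always true:
all_rows = con.execute('SELECT * FROM License WHERE Form = "form"').fetchall()

# 'form' in single quotes is a string literal, so only one row matches:
one_row = con.execute("SELECT * FROM License WHERE Form = 'form'").fetchall()

print(len(all_rows))  # 2
print(len(one_row))   # 1
```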

Igor Tandetnik 






Re: [sqlite] Re: Query performance issues - index selection

2006-06-05 Thread Steve Green

Joe,

Thanks for the info...

Unfortunately, we don't have the option of moving back to v3.2.1.  However,
I'm curious as well to see the difference in performance, so when (if?) I
get my current performance issues under control, I'll run the test and
post the results.

Steve

Joe Wilson wrote:

If you test against 3.2.1, just use your original where clause 
and original index pk_data:


 where utime >= 1146441600 and utime < 114912

as I don't know what effect the unary +'s would have on it.

--- Joe Wilson <[EMAIL PROTECTED]> wrote:


For my databases, GROUP BY is slower in recent SQLite releases
because my queries are not able to use indexed columns for 
GROUP BY items by design:


http://www.mail-archive.com/sqlite-users%40sqlite.org/msg15563.html

I'm curious what sort of timings you'd get with SQLite 3.2.1.
http://sqlite.org/sqlite-3.2.1.tar.gz

To convert to the older database format:

sqlite334 334.db .dump | sqlite321 321.db

And then perform your timings with both SQLite versions against 
the 321.db file to level the playing field.

Just having an index on utime should be optimal for SQLite 3.2.1.

--- Steve Green <[EMAIL PROTECTED]> wrote:


Using Richard's suggestion of changing the where clause of my query
to

where +utime >= 1146441600 and +utime < 114912000

did force sqlite to use the index that gave better performance.

However, I'm seeing some strange behavior that I'm hoping someone can
shed some light on.

With the time period mentioned below, the data set is about 2.5 million
rows and 86K distinct u_id values.  Using the index on (u_id, utime),
the query time was reduced from 13.5 minutes to 26 seconds (not great,
but at least the browser won't timeout waiting for a response).

However, with a different time period, I have a much smaller data set of
about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
utime), the query still takes about 20 seconds.  However, if the primary
index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
Unfortunately at query time I have no idea of knowing how much data is
going to have to be traversed, so the idea of modifying the query to force
the use of different indexes is not possible.  Can anyone explain why the
performance is so poor with the smaller data set and the "optimal" index.
Any suggestions on a workaround so that optimal performance can be achieved
with all data set sizes?

Thanks for your help,

Steve

Steve Green wrote:



Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
v7.3

Steve

Steve Green wrote:



Hi,

I have the following schema

CREATE TABLE data(
   utime int4,
   r_id int2,
   u_id int4,
   data1 int8,
   data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  
To test

the index, I was forced to drop the primary index (which I can't do in my
production environment).   After dropping the primary index, an 
explain query

plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance 
improvement
over the unique index, although sqlite's query planner chose these 
indexes over

the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index 
without having
to remove my unique index?  Perhaps this would involve rewriting the 
query, but

I'm at a loss as to how that could be done.

Thanks for your time,

Steve




__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


--
Steve Green
SAVVIS
Transforming Information Technology SM


Re: [sqlite] Re: Query performance issues - index selection

2006-06-03 Thread Joe Wilson
If you test against 3.2.1, just use your original where clause 
and original index pk_data:

 where utime >= 1146441600 and utime < 114912

as I don't know what effect the unary +'s would have on it.

--- Joe Wilson <[EMAIL PROTECTED]> wrote:
> For my databases, GROUP BY is slower in recent SQLite releases
> because my queries are not able to use indexed columns for 
> GROUP BY items by design:
> 
>  http://www.mail-archive.com/sqlite-users%40sqlite.org/msg15563.html
> 
> I'm curious what sort of timings you'd get with SQLite 3.2.1.
> http://sqlite.org/sqlite-3.2.1.tar.gz
> 
> To convert to the older database format:
> 
>  sqlite334 334.db .dump | sqlite321 321.db
> 
> And then perform your timings with both SQLite versions against 
> the 321.db file to level the playing field.
> Just having an index on utime should be optimal for SQLite 3.2.1.
> 
> --- Steve Green <[EMAIL PROTECTED]> wrote:
> > Using Richard's suggestion of changing the where clause of my query
> > to
> > 
> > where +utime >= 1146441600 and +utime < 114912000
> > 
> > did force sqlite to use the index that gave better performance.
> > 
> > However, I'm seeing some strange behavior that I'm hoping someone can
> > shed some light on.
> > 
> > With the time period mentioned below, the data set is about 2.5 million
> > rows and 86K distinct u_id values.  Using the index on (u_id, utime),
> > the query time was reduced from 13.5 minutes to 26 seconds (not great,
> > but at least the browser won't timeout waiting for a response).
> > 
> > However, with a different time period, I have a much smaller data set of
> > about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
> > utime), the query still takes about 20 seconds.  However, if the primary
> > index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
> > Unfortunately at query time I have no idea of knowing how much data is
> > going to have to be traversed, so the idea of modifying the query to force
> > the use of different indexes is not possible.  Can anyone explain why the
> > performance is so poor with the smaller data set and the "optimal" index.
> > Any suggestions on a workaround so that optimal performance can be achieved
> > with all data set sizes?
> > 
> > Thanks for your help,
> > 
> > Steve
> > 
> > Steve Green wrote:
> > 
> > > Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
> > > v7.3
> > > 
> > > Steve
> > > 
> > > Steve Green wrote:
> > > 
> > >> Hi,
> > >>
> > >> I have the following schema
> > >>
> > >> CREATE TABLE data(
> > >> utime int4,
> > >> r_id int2,
> > >> u_id int4,
> > >> data1 int8,
> > >> data2 int8
> > >> );
> > >>
> > >> Each row is uniquely defined by utime, r_id, and u_id, so I have the
> > >> following index
> > >>
> > >> CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
> > >>
> > >> This index is also needed because at update time, I use "insert or
> > >> replace" to update the data table.
> > >>
> > >> The type of query that I'm attempting to perform is similar to
> > >>
> > >> select u_id, sum( data1 ), sum( data2 )
> > >> from data where utime >= 1146441600 and utime < 114912
> > >> group by u_id
> > >> order by sum( data1 ) desc
> > >> limit 10
> > >>
> > >> My current table has about 2.5 million rows and about 86,000 distinct
> > >> u_id values in the time period selected, and the query takes about 13.5
> > >> minutes.
> > >>
> > >> Performing an explain query plan reveals
> > >>
> > >> 0|0|TABLE data WITH INDEX pk_data
> > >>
> > >> so the primary index is being used...
> > >>
> > >> Based on some past experiences, I added the following index to the table
> > >>
> > >> CREATE INDEX ix_data_ut on data( u_id, utime );
> > >>
> > >> Note that the utime is the last parameter in the index.  With the primary
> > >> index in place, I was not able to convince sqlite to use this index.  
> > >> To test
> > >> the index, I was forced to drop the primary index (which I can't do in my
> > >> production environment).   After dropping the primary index, an 
> > >> explain query
> > >> plan revealed
> > >>
> > >> 0|0|TABLE data WITH INDEX ix_data_ut
> > >>
> > >> and the query ran in 26 seconds...
> > >>
> > >> Subsequent tests using the following indexes provided no performance 
> > >> improvement
> > >> over the unique index, although sqlite's query planner chose these 
> > >> indexes over
> > >> the unique index
> > >>
> > >> CREATE INDEX ix_data_tu on data( utime, u_id );
> > >> CREATE INDEX ix_data_t on data( utime );
> > >>
> > >> So, is there a way that I can get sqlite to use the optimal index 
> > >> without having
> > >> to remove my unique index?  Perhaps this would involve rewriting the 
> > >> query, but
> > >> I'm at a loss as to how that could be done.
> > >>
> > >> Thanks for your time,
> > >>
> > >> Steve



Re: [sqlite] Re: Query performance issues - index selection

2006-06-03 Thread Joe Wilson
For my databases, GROUP BY is slower in recent SQLite releases
because my queries are not able to use indexed columns for 
GROUP BY items by design:

 http://www.mail-archive.com/sqlite-users%40sqlite.org/msg15563.html

I'm curious what sort of timings you'd get with SQLite 3.2.1.
http://sqlite.org/sqlite-3.2.1.tar.gz

To convert to the older database format:

 sqlite334 334.db .dump | sqlite321 321.db

And then perform your timings with both SQLite versions against 
the 321.db file to level the playing field.
Just having an index on utime should be optimal for SQLite 3.2.1.

--- Steve Green <[EMAIL PROTECTED]> wrote:
> Using Richard's suggestion of changing the where clause of my query
> to
> 
> where +utime >= 1146441600 and +utime < 114912000
> 
> did force sqlite to use the index that gave better performance.
> 
> However, I'm seeing some strange behavior that I'm hoping someone can
> shed some light on.
> 
> With the time period mentioned below, the data set is about 2.5 million
> rows and 86K distinct u_id values.  Using the index on (u_id, utime),
> the query time was reduced from 13.5 minutes to 26 seconds (not great,
> but at least the browser won't timeout waiting for a response).
> 
> However, with a different time period, I have a much smaller data set of
> about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
> utime), the query still takes about 20 seconds.  However, if the primary
> index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
> Unfortunately at query time I have no idea of knowing how much data is
> going to have to be traversed, so the idea of modifying the query to force
> the use of different indexes is not possible.  Can anyone explain why the
> performance is so poor with the smaller data set and the "optimal" index.
> Any suggestions on a workaround so that optimal performance can be achieved
> with all data set sizes?
> 
> Thanks for your help,
> 
> Steve
> 
> Steve Green wrote:
> 
> > Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
> > v7.3
> > 
> > Steve
> > 
> > Steve Green wrote:
> > 
> >> Hi,
> >>
> >> I have the following schema
> >>
> >> CREATE TABLE data(
> >> utime int4,
> >> r_id int2,
> >> u_id int4,
> >> data1 int8,
> >> data2 int8
> >> );
> >>
> >> Each row is uniquely defined by utime, r_id, and u_id, so I have the
> >> following index
> >>
> >> CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
> >>
> >> This index is also needed because at update time, I use "insert or
> >> replace" to update the data table.
> >>
> >> The type of query that I'm attempting to perform is similar to
> >>
> >> select u_id, sum( data1 ), sum( data2 )
> >> from data where utime >= 1146441600 and utime < 114912
> >> group by u_id
> >> order by sum( data1 ) desc
> >> limit 10
> >>
> >> My current table has about 2.5 million rows and about 86,000 distinct
> >> u_id values in the time period selected, and the query takes about 13.5
> >> minutes.
> >>
> >> Performing an explain query plan reveals
> >>
> >> 0|0|TABLE data WITH INDEX pk_data
> >>
> >> so the primary index is being used...
> >>
> >> Based on some past experiences, I added the following index to the table
> >>
> >> CREATE INDEX ix_data_ut on data( u_id, utime );
> >>
> >> Note that the utime is the last parameter in the index.  With the primary
> >> index in place, I was not able to convince sqlite to use this index.  
> >> To test
> >> the index, I was forced to drop the primary index (which I can't do in my
> >> production environment).   After dropping the primary index, an 
> >> explain query
> >> plan revealed
> >>
> >> 0|0|TABLE data WITH INDEX ix_data_ut
> >>
> >> and the query ran in 26 seconds...
> >>
> >> Subsequent tests using the following indexes provided no performance 
> >> improvement
> >> over the unique index, although sqlite's query planner chose these 
> >> indexes over
> >> the unique index
> >>
> >> CREATE INDEX ix_data_tu on data( utime, u_id );
> >> CREATE INDEX ix_data_t on data( utime );
> >>
> >> So, is there a way that I can get sqlite to use the optimal index 
> >> without having
> >> to remove my unique index?  Perhaps this would involve rewriting the 
> >> query, but
> >> I'm at a loss as to how that could be done.
> >>
> >> Thanks for your time,
> >>
> >> Steve




[sqlite] Re: Query performance issues - index selection

2006-06-02 Thread Steve Green

Using Richard's suggestion of changing the where clause of my query
to

where +utime >= 1146441600 and +utime < 114912000

did force sqlite to use the index that gave better performance.
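
The trick works because a unary + turns the column reference into an expression, which disqualifies it from using an index on that column. A sketch of the effect on the thread's schema, using Python's built-in sqlite3 module (the exact plan text varies by SQLite version, and the second range bound is taken as written in this message):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data(utime INT, r_id INT, u_id INT, data1 INT, data2 INT);
    CREATE UNIQUE INDEX pk_data ON data(utime, r_id, u_id);
    CREATE INDEX ix_data_ut ON data(u_id, utime);
""")

# Bare column: the range on utime can be satisfied by pk_data.
plain = con.execute(
    "EXPLAIN QUERY PLAN SELECT u_id, sum(data1) FROM data "
    "WHERE utime >= 1146441600 AND utime < 114912000 GROUP BY u_id"
).fetchall()

# +utime is an expression, not a column, so no utime index qualifies;
# the planner must fall back to ix_data_ut or a table scan.
forced = con.execute(
    "EXPLAIN QUERY PLAN SELECT u_id, sum(data1) FROM data "
    "WHERE +utime >= 1146441600 AND +utime < 114912000 GROUP BY u_id"
).fetchall()

print(plain)
print(forced)
```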

However, I'm seeing some strange behavior that I'm hoping someone can
shed some light on.

With the time period mentioned below, the data set is about 2.5 million
rows and 86K distinct u_id values.  Using the index on (u_id, utime),
the query time was reduced from 13.5 minutes to 26 seconds (not great,
but at least the browser won't timeout waiting for a response).

However, with a different time period, I have a much smaller data set of
about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
utime), the query still takes about 20 seconds.  However, if the primary
index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
Unfortunately at query time I have no idea of knowing how much data is
going to have to be traversed, so the idea of modifying the query to force
the use of different indexes is not possible.  Can anyone explain why the
performance is so poor with the smaller data set and the "optimal" index.
Any suggestions on a workaround so that optimal performance can be achieved
with all data set sizes?

Thanks for your help,

Steve

Steve Green wrote:


Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
v7.3

Steve

Steve Green wrote:


Hi,

I have the following schema

CREATE TABLE data(
utime int4,
r_id int2,
u_id int4,
data1 int8,
data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  
To test

the index, I was forced to drop the primary index (which I can't do in my
production environment).   After dropping the primary index, an 
explain query

plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance 
improvement
over the unique index, although sqlite's query planner chose these 
indexes over

the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index 
without having
to remove my unique index?  Perhaps this would involve rewriting the 
query, but

I'm at a loss as to how that could be done.

Thanks for your time,

Steve





--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.


[sqlite] Re: Query performance issues - index selection

2006-06-01 Thread Steve Green

Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
v7.3

Steve

Steve Green wrote:


Hi,

I have the following schema

CREATE TABLE data(
utime int4,
r_id int2,
u_id int4,
data1 int8,
data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  To 
test

the index, I was forced to drop the primary index (which I can't do in my
production environment).   After dropping the primary index, an explain 
query

plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance 
improvement
over the unique index, although sqlite's query planner chose these 
indexes over

the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index 
without having
to remove my unique index?  Perhaps this would involve rewriting the 
query, but

I'm at a loss as to how that could be done.

Thanks for your time,

Steve


--
Steve Green
SAVVIS
Transforming Information Technology SM



Re: [sqlite] Re: query problem

2005-09-20 Thread D. Richard Hipp
On Tue, 2005-09-20 at 06:57 +0200, Alain Bertrand wrote:
> Do you have an approximate timeframe to see 
> this fix incorporated in a release?
> 

No.  The problem is obscure so I am not in a big rush
to get it out.  You can fix the problem in your copy by
downloading the latest from CVS.  Or you can get the patches
directly from http://www.sqlite.org/cvstrac/chngview?cn=2725.
-- 
D. Richard Hipp <[EMAIL PROTECTED]>



[sqlite] Re: query problem

2005-09-19 Thread Alain Bertrand

D. Richard Hipp wrote:




I've changed my mind.  I think instead that there is a bug in
SQLite that caused LEFT JOINs to be computed incorrectly if one
of the terms in the ON clause restricts only the left table in
the join.  


Check-in [2725] at http://www.sqlite.org/cvstrac/chngview?cn=2725
contains
a fix for this problem for version 3.x.  The problem has existed in
SQLite
forever (because it originates from a conceptual misunderstanding by the
code author :-)) so version 2.8.16 is still broken.  Because the problem
is obscure, I am not inclined to fix it in the 2.8.x series...

Thanks for your answer. Do you have an approximate timeframe to see 
this fix incorporated in a release?



Alain



Re: [sqlite] Re: query problem

2005-09-19 Thread Puneet Kishor

Alain Bertrand wrote:

Puneet Kishor wrote:


without seeing the data, a good guess would be, "Yes."


What do you mean? That my query is syntactically wrong?


Your query is syntactically correct in that it returns a result without 
the SQL engine generating a syntax error... it just doesn't seem to be 
producing what you want, hence it is perhaps logically incorrect, non?






LEFT JOIN selects all the rows with values from the left table and 
either matching values or NULL from the right table.


Yes, I know, not very well because I have only a basic knowledge of sql 
(this query has been written by a friend), but this is what I want.
This may look complicated for just counting the number of rows, but in 
fact this a kind of minimum example to reproduce the problem I have with 
a more complex query.



If you just had JOIN then you would get only those rows where both left 
and right sides match (the result set would have fewer records than...) 
if you have LEFT (or RIGHT) JOIN whereby you will get all the rows where 
the left side matches, and NULL for where right side doesn't match (the 
result set would have more records).



Try SQLzoo.net for very nice tutorials with basic to intermediate SQL.


[sqlite] Re: query problem

2005-09-19 Thread Alain Bertrand

Puneet Kishor wrote:


without seeing the data, a good guess would be, "Yes."

What do you mean? That my query is syntactically wrong?


LEFT JOIN selects all the rows with values from the left table and 
either matching values or NULL from the right table.
Yes, I know, not very well because I have only a basic knowledge of sql 
(this query has been written by a friend), but this is what I want.
This may look complicated for just counting the number of rows, but in 
fact this a kind of minimum example to reproduce the problem I have with 
a more complex query.


Regards,
Alain



[sqlite] Re: query 31 days

2005-08-12 Thread Tom Deblauwe

Ramon wrote:

Have you tried putting the year and month in 2 different columns? Not
sure if it will be any faster, but I think it will, since many of the
values are repeated.
Check your bandwidth and see if it's enough for your web users to
access this query remotely.
Peace



I will try this, thanx for the tip.

Kind regards,
Tom,




[sqlite] Re: query 31 days

2005-08-12 Thread Tom Deblauwe

Tom Deblauwe wrote:

Hello,


[snip]speed problem[/snip]


For the purpose of testing the existence of a record, it seems that
using this statement:

select date from mytable where yearmonth = 200508 and
date = 1 limit 1;

is slower than this:

select 1 from mytable where yearmonth = 200508 and
date = 1 limit 1;

This is because no lookup of the 'date' column in the main table is
done in the second case, which reduced the query from 2.6 seconds to
0.95 seconds.
On a sidenote: maybe this can be put in the documentation, optimizing
queries?  Also maybe something to put there: queries go a LOT faster
when not sorting using 'order by'.  Maybe this is obvious, but it
wouldn't hurt to put it on that page, because I hadn't thought of it at
first.

Thanks for your time,
kind regards,
Tom,