Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-20 Thread Rob Willett

Just to close this thread down completely.

We've implemented the changes on our production server and the actual 
performance increase is even better than we predicted.


Our production server is a containerised Ubuntu Server. Its hardware 
configuration is significantly different to our Macs on which we 
develop. Our Macs have more RAM and SSD disks. The production server has 
other advantages though :)


The query took 90 mins on our production server, this is mainly due to 
slower disks than we have locally, so this was not unexpected.


After doing the performance changes (removing unneeded fields and adding 
in a covering index), we went down to 38 secs.


This has meant we are no longer looking at a second database just for 
analytics, we just needed to learn to use the database we already had :)


Rob

On 18 Mar 2017, at 5:48, Rob Willett wrote:

We've just implemented a covering index for the last step (is it 
really?) in our quest to get the query execution time down.


To summarise we have gone from 32 mins to 16 mins by updating an index 
so it doesn't use collate, we took another six minutes off by removing 
extra fields in the select we didn't need.


We have just created a new index which 'covers' all the fields we use 
in the select, this means (and I paraphrase) that we use the index to 
get all the data and there is no need to read from the database.


Well that was a bit of a surprise, the index creation took 45 mins, we 
ran the program again and thought, rats, we've cocked it up, it only 
took 54 secs, we got something wrong. So we checked it and checked 
again and we hasn't got anything wrong. Our query has moved from 32 
mins to 54 secs.


We're quite happy with that performance increase. In fact we're 
delighted, so thanks for all the help in getting us to this stage.


We have kept copies of the query planner bytecode output if anybody is 
interested. Gunter has had copies, but if anybody else would like 
them, please ask.


Many thanks again for all the help,

Rob

On 17 Mar 2017, at 22:12, Rob Willett wrote:


Dear all,

We progress steadily forward.

1. We immediately halved our execution time by moving to an updated 
index that doesn't use COLLATE. Thats 32mins to 16mins.


2. We then shaved a further six minutes off the execution time by 
removing extraneous fields in the select statement, so instead of 
"select * ...", we identified which fields we used and directly 
selected those. So we are now down to 10 mins or 1/3 of when we 
started for, to be honest, virtually no extra work, merely being 
smarter, or rather you being smarter.


3. We have looked through all our indexes and can see that every 
index has a COLLATE against it, even if the column is an integer. We 
have raised a support call with Navicat.


4. The next step is to create a "covering index" to try and get the 
whole of the query into the index. However its 22:11 in London and I 
need to get home.


Thanks very much for the help so far. Tomorrow is more tricky but 
I'll read up on covering indexes to see how to use them,.


Rob

On 17 Mar 2017, at 18:39, Simon Slavin wrote:

On 17 Mar 2017, at 6:22pm, Rob Willett 
 wrote:


4. Work through returning just the columns we actually need from 
our queries. We have a recollection that if we can build an index 
with all the information necessary in it, we can do all the work in 
joins rather than paging out to disk. Is this what you are 
referring to?


It works only where all the columns you need to read are in the same 
table.  The ideal form of a covering index is to have the columns 
listed in this order:


1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is 
available from the index, so it doesn’t bother to read the table 
at all.  This can lead to something like a doubling of speed.  Of 
course, you sacrifice filespace, and making changes to the table 
takes a little longer.


5. Sleep (not exactly sure when) and watch three international 
rugby games tomorrow.


Sleep while waiting for indexes to be created and ANALYZE to work.  
May you see skilled players, creative moves and dramatic play.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-20 Thread Rob Willett

David,

We're English and masters of the understatement ;)

Rob

On 20 Mar 2017, at 15:04, David Raymond wrote:


"... Our query has moved from 32 mins to 54 secs.

We're quite happy with that performance increase."

I have to admit that the hearty chuckle which that statement produced 
from me hurt my sore throat. Totally worth it though.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-20 Thread David Raymond
"... Our query has moved from 32 mins to 54 secs.

We're quite happy with that performance increase."

I have to admit that the hearty chuckle which that statement produced from me 
hurt my sore throat. Totally worth it though.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-19 Thread Paul Sanderson
Oops - somehow misread the last message - 54 seconds down from 32 minutes -
that's a result!

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 19 March 2017 at 12:07, Paul Sanderson 
wrote:

> What is the average size of the text in the direction field? and what page
> size have you set for the database? If the size of a record is such that
> only a small handful fit into a page, or worse each record overflows (and
> your select includes the direction field) then this could impact
> performance.
>
> Paul
> www.sandersonforensics.com
> skype: r3scue193
> twitter: @sandersonforens
> Tel +44 (0)1326 572786 <+44%201326%20572786>
> http://sandersonforensics.com/forum/content.php?195-SQLite-
> Forensic-Toolkit -Forensic Toolkit for SQLite
> email from a work address for a fully functional demo licence
>
> On 18 March 2017 at 05:48, Rob Willett 
> wrote:
>
>> We've just implemented a covering index for the last step (is it really?)
>> in our quest to get the query execution time down.
>>
>> To summarise we have gone from 32 mins to 16 mins by updating an index so
>> it doesn't use collate, we took another six minutes off by removing extra
>> fields in the select we didn't need.
>>
>> We have just created a new index which 'covers' all the fields we use in
>> the select, this means (and I paraphrase) that we use the index to get all
>> the data and there is no need to read from the database.
>>
>> Well that was a bit of a surprise, the index creation took 45 mins, we
>> ran the program again and thought, rats, we've cocked it up, it only took
>> 54 secs, we got something wrong. So we checked it and checked again and we
>> hasn't got anything wrong. Our query has moved from 32 mins to 54 secs.
>>
>> We're quite happy with that performance increase. In fact we're
>> delighted, so thanks for all the help in getting us to this stage.
>>
>> We have kept copies of the query planner bytecode output if anybody is
>> interested. Gunter has had copies, but if anybody else would like them,
>> please ask.
>>
>> Many thanks again for all the help,
>>
>> Rob
>>
>>
>> On 17 Mar 2017, at 22:12, Rob Willett wrote:
>>
>> Dear all,
>>>
>>> We progress steadily forward.
>>>
>>> 1. We immediately halved our execution time by moving to an updated
>>> index that doesn't use COLLATE. Thats 32mins to 16mins.
>>>
>>> 2. We then shaved a further six minutes off the execution time by
>>> removing extraneous fields in the select statement, so instead of "select *
>>> ...", we identified which fields we used and directly selected those. So we
>>> are now down to 10 mins or 1/3 of when we started for, to be honest,
>>> virtually no extra work, merely being smarter, or rather you being smarter.
>>>
>>> 3. We have looked through all our indexes and can see that every index
>>> has a COLLATE against it, even if the column is an integer. We have raised
>>> a support call with Navicat.
>>>
>>> 4. The next step is to create a "covering index" to try and get the
>>> whole of the query into the index. However its 22:11 in London and I need
>>> to get home.
>>>
>>> Thanks very much for the help so far. Tomorrow is more tricky but I'll
>>> read up on covering indexes to see how to use them,.
>>>
>>> Rob
>>>
>>> On 17 Mar 2017, at 18:39, Simon Slavin wrote:
>>>
>>> On 17 Mar 2017, at 6:22pm, Rob Willett 
 wrote:

 4. Work through returning just the columns we actually need from our
> queries. We have a recollection that if we can build an index with all the
> information necessary in it, we can do all the work in joins rather than
> paging out to disk. Is this what you are referring to?
>

 It works only where all the columns you need to read are in the same
 table.  The ideal form of a covering index is to have the columns listed in
 this order:

 1) columns needed for the WHERE clause
 2) columns needed for the ORDER BY clause which aren’t in (1)
 3) columns needed to be read which aren’t in (2) or (1)

 SQLite detects that all the information it needs for the SELECT is
 available from the index, so it doesn’t bother to read the table at all.
 This can lead to something like a doubling of speed.  Of course, you
 sacrifice filespace, and making changes to the table takes a little longer.

 5. Sleep (not exactly sure when) and watch three international rugby
> games tomorrow.
>

 Sleep while waiting for indexes to be created and ANALYZE to work.  May
 you see skilled players, creative moves and dramatic play.

 Simon.
 ___
 sqlite-users mailing list
 

Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-19 Thread Paul Sanderson
What is the average size of the text in the direction field? and what page
size have you set for the database? If the size of a record is such that
only a small handful fit into a page, or worse each record overflows (and
your select includes the direction field) then this could impact
performance.

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 18 March 2017 at 05:48, Rob Willett  wrote:

> We've just implemented a covering index for the last step (is it really?)
> in our quest to get the query execution time down.
>
> To summarise we have gone from 32 mins to 16 mins by updating an index so
> it doesn't use collate, we took another six minutes off by removing extra
> fields in the select we didn't need.
>
> We have just created a new index which 'covers' all the fields we use in
> the select, this means (and I paraphrase) that we use the index to get all
> the data and there is no need to read from the database.
>
> Well that was a bit of a surprise, the index creation took 45 mins, we ran
> the program again and thought, rats, we've cocked it up, it only took 54
> secs, we got something wrong. So we checked it and checked again and we
> hasn't got anything wrong. Our query has moved from 32 mins to 54 secs.
>
> We're quite happy with that performance increase. In fact we're delighted,
> so thanks for all the help in getting us to this stage.
>
> We have kept copies of the query planner bytecode output if anybody is
> interested. Gunter has had copies, but if anybody else would like them,
> please ask.
>
> Many thanks again for all the help,
>
> Rob
>
>
> On 17 Mar 2017, at 22:12, Rob Willett wrote:
>
> Dear all,
>>
>> We progress steadily forward.
>>
>> 1. We immediately halved our execution time by moving to an updated index
>> that doesn't use COLLATE. Thats 32mins to 16mins.
>>
>> 2. We then shaved a further six minutes off the execution time by
>> removing extraneous fields in the select statement, so instead of "select *
>> ...", we identified which fields we used and directly selected those. So we
>> are now down to 10 mins or 1/3 of when we started for, to be honest,
>> virtually no extra work, merely being smarter, or rather you being smarter.
>>
>> 3. We have looked through all our indexes and can see that every index
>> has a COLLATE against it, even if the column is an integer. We have raised
>> a support call with Navicat.
>>
>> 4. The next step is to create a "covering index" to try and get the whole
>> of the query into the index. However its 22:11 in London and I need to get
>> home.
>>
>> Thanks very much for the help so far. Tomorrow is more tricky but I'll
>> read up on covering indexes to see how to use them,.
>>
>> Rob
>>
>> On 17 Mar 2017, at 18:39, Simon Slavin wrote:
>>
>> On 17 Mar 2017, at 6:22pm, Rob Willett 
>>> wrote:
>>>
>>> 4. Work through returning just the columns we actually need from our
 queries. We have a recollection that if we can build an index with all the
 information necessary in it, we can do all the work in joins rather than
 paging out to disk. Is this what you are referring to?

>>>
>>> It works only where all the columns you need to read are in the same
>>> table.  The ideal form of a covering index is to have the columns listed in
>>> this order:
>>>
>>> 1) columns needed for the WHERE clause
>>> 2) columns needed for the ORDER BY clause which aren’t in (1)
>>> 3) columns needed to be read which aren’t in (2) or (1)
>>>
>>> SQLite detects that all the information it needs for the SELECT is
>>> available from the index, so it doesn’t bother to read the table at all.
>>> This can lead to something like a doubling of speed.  Of course, you
>>> sacrifice filespace, and making changes to the table takes a little longer.
>>>
>>> 5. Sleep (not exactly sure when) and watch three international rugby
 games tomorrow.

>>>
>>> Sleep while waiting for indexes to be created and ANALYZE to work.  May
>>> you see skilled players, creative moves and dramatic play.
>>>
>>> Simon.
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett
We've just implemented a covering index for the last step (is it 
really?) in our quest to get the query execution time down.


To summarise we have gone from 32 mins to 16 mins by updating an index 
so it doesn't use collate, we took another six minutes off by removing 
extra fields in the select we didn't need.


We have just created a new index which 'covers' all the fields we use in 
the select, this means (and I paraphrase) that we use the index to get 
all the data and there is no need to read from the database.


Well that was a bit of a surprise, the index creation took 45 mins, we 
ran the program again and thought, rats, we've cocked it up, it only 
took 54 secs, we got something wrong. So we checked it and checked again 
and we hasn't got anything wrong. Our query has moved from 32 mins to 54 
secs.


We're quite happy with that performance increase. In fact we're 
delighted, so thanks for all the help in getting us to this stage.


We have kept copies of the query planner bytecode output if anybody is 
interested. Gunter has had copies, but if anybody else would like them, 
please ask.


Many thanks again for all the help,

Rob

On 17 Mar 2017, at 22:12, Rob Willett wrote:


Dear all,

We progress steadily forward.

1. We immediately halved our execution time by moving to an updated 
index that doesn't use COLLATE. Thats 32mins to 16mins.


2. We then shaved a further six minutes off the execution time by 
removing extraneous fields in the select statement, so instead of 
"select * ...", we identified which fields we used and directly 
selected those. So we are now down to 10 mins or 1/3 of when we 
started for, to be honest, virtually no extra work, merely being 
smarter, or rather you being smarter.


3. We have looked through all our indexes and can see that every index 
has a COLLATE against it, even if the column is an integer. We have 
raised a support call with Navicat.


4. The next step is to create a "covering index" to try and get the 
whole of the query into the index. However its 22:11 in London and I 
need to get home.


Thanks very much for the help so far. Tomorrow is more tricky but I'll 
read up on covering indexes to see how to use them,.


Rob

On 17 Mar 2017, at 18:39, Simon Slavin wrote:

On 17 Mar 2017, at 6:22pm, Rob Willett  
wrote:


4. Work through returning just the columns we actually need from our 
queries. We have a recollection that if we can build an index with 
all the information necessary in it, we can do all the work in joins 
rather than paging out to disk. Is this what you are referring to?


It works only where all the columns you need to read are in the same 
table.  The ideal form of a covering index is to have the columns 
listed in this order:


1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is 
available from the index, so it doesn’t bother to read the table at 
all.  This can lead to something like a doubling of speed.  Of 
course, you sacrifice filespace, and making changes to the table 
takes a little longer.


5. Sleep (not exactly sure when) and watch three international rugby 
games tomorrow.


Sleep while waiting for indexes to be created and ANALYZE to work.  
May you see skilled players, creative moves and dramatic play.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Simon Slavin

On 17 Mar 2017, at 10:12pm, Rob Willett  wrote:

> 3. We have looked through all our indexes and can see that every index has a 
> COLLATE against it, even if the column is an integer. We have raised a 
> support call with Navicat.

This might also be something that Navicat should look into.  The most efficient 
way to implement COLLATE in SQLite is to do it in the column definition.  At 
the moment, for example, the table is defined as

> CREATE TABLE "Disruptions" (
>"id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
>[…]
>"Direction" TEXT
> );

.  If instead it said

CREATE TABLE "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 […]
 "Direction" TEXT COLLATE NOCASE
);

then all comparisons and indexes which mentioned "Direction" would 
automatically use NOCASE without having to have COLLATE NOCASE mentioned in 
them.  This is almost always The Right Thing, and what the programmer would 
want.  It simplifies all the other SQL commands used by the program.  And since 
the collation on the index then matches the collation for the column 
definition, SQLite has to do less work every time it deals with the index.

Unfortunately, this change cannot be made with backward compatibility.  
Changing the table definition would involve remaking the table, probably by 
defining a new table with the new definition, copying the data across, deleting 
the original table, and renaming the new one.  If backward compatibility is 
important in support of the program then this may be a deal-breaker and one 
could understand why the developer team won’t make the change.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Keith Medcalf
On Friday, 17 March, 2017 16:13, Rob Willett  
said:

> 3. We have looked through all our indexes and can see that every index
> has a COLLATE against it, even if the column is an integer. We have
> raised a support call with Navicat.

If you want a table field to be non-case-sensitive, the proper place to declare 
that field as such is when you define the table.

create table Words (Word text unique collate nocase);

means that everywhere that you use Wowrds.Word is not case sensitive (but it is 
case preserving).  End of Story.

create table Words (Word text);
create unique index sqlite_autoindex_Words_1 on Words (Word collate nocase);

is however an entirely different thing.  The data in Words.Word is case 
sensitive, and the index is not case sensitive.  Thus query will operate 
differently depending on how you declared your table and index:

select * from Words where word = 'Apple';

if the former (nocase collation on the table) will (a) use the index and (b) be 
equivalent to where lower(word) = lower('Apple') (that is, case insensitive)

in the case where you have declared a collate nocase index, the index will NOT 
BE USED unless you specify COLLATE NOCASE in the where clause.  By default the 
search will be a table scan that is case sensitive.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Keith Medcalf

Change your query to

explain query plan select * from Disruptions where status = 2 collate nocase OR 
status = 6 collate nocase;

to match your index.

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Rob Willett
> Sent: Friday, 17 March, 2017 04:20
> To: SQLite mailing list
> Subject: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
> 
> Hi,
> 
> We've been struggling with a complex query that we have written. One of
> the elements of this complex query is a select statement that doesn't
> appear to use an index when we *think* it should do. We're not going to
> ask you to debug a large complex SQL query (unless you have nothing else
> to do today) but we're working our way through the query trying to
> understand where we've gone wrong. So we've broken down the query and
> are trying each section to see what it performs (or rather doesn't
> perform like).
> 
> The sub query is a simple select on a large table, Disruptions. The
> Disruptions table has 180M rows of data.
> 
> The schema for it is here. We've pulled it straight from Navicat for
> SQLite.
> 
> ```
> CREATE TABLE "Disruptions" (
>"id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
>"version" integer NOT NULL,
>"Disruption_id" INTEGER NOT NULL,
>"status" integer NOT NULL,
>"severity" integer NOT NULL,
>"levelOfInterest" integer NOT NULL,
>"category" integer NOT NULL,
>"subCategory" integer NOT NULL,
>"startTime" TEXT NOT NULL,
>"endTime" text NOT NULL,
>"location" integer NOT NULL,
>"corridor" integer NOT NULL,
>"comments" integer NOT NULL,
>"currentUpdate" integer NOT NULL,
>"remarkTime" TEXT NOT NULL,
>"lastModTime" TEXT NOT NULL,
>"CauseAreaPointX" real NOT NULL,
>"CauseAreaPointY" real NOT NULL,
>"Direction" TEXT
> );
> INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("Disruptions",
> '184626834');
> 
> -- 
> --  Indexes structure for table Disruptions
> -- 
> CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
> NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC,
> "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
> "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id"
> COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
> NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE
> NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE
> NOCASE ASC);
> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE
> ASC);
> 
> PRAGMA foreign_keys = true;
> ```
> 
> As part of the larger more complex query, we are executing the query
> 
> ```
> select * from Disruptions where status = 2 OR status = 6;
> ```
> 
> Status is one of six values, 1 to 6 and is probably not evenly
> distributed across the 180M rows.
> 
> If we do
> 
> ```
> sqlite> explain query plan select * from Disruptions where status = 2 OR
> status = 6;
> selectid|order|from|detail
> 0|0|0|SCAN TABLE Disruptions
> ```
> 
> We can see that table scanning a 180M records is going to be slow, no
> matter what the rest of the complex query is like.
> 
> We have an index Disruptions_idx4 which we *think* should speed it up,
> but the query plan doesn't seem to take this into account.
> 
> We think that only having six values of Status means that the speed up
> from the index is not going to be fantastic but every little helps.
> 
> We have run analyze on the database and that hasn't helped.
> 
> Our gut feeling at this moment is that we have the design structure
> wrong in our DB and we're going to have to take a long hard look at what
> we're doing, do a redesign and a rebuild as we simply got it wrong from
> the beginning. Hindsight is wonderful ;) In the interim (as this is a
> 2-3 month job), we need to speed our query up from 90 mins down to
> something in the tens of mins.
> 
> Any suggestions very much welcomed,
> 
> Thanks
> 
> Rob
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Dear all,

We progress steadily forward.

1. We immediately halved our execution time by moving to an updated 
index that doesn't use COLLATE. Thats 32mins to 16mins.


2. We then shaved a further six minutes off the execution time by 
removing extraneous fields in the select statement, so instead of 
"select * ...", we identified which fields we used and directly selected 
those. So we are now down to 10 mins or 1/3 of when we started for, to 
be honest, virtually no extra work, merely being smarter, or rather you 
being smarter.


3. We have looked through all our indexes and can see that every index 
has a COLLATE against it, even if the column is an integer. We have 
raised a support call with Navicat.


4. The next step is to create a "covering index" to try and get the 
whole of the query into the index. However its 22:11 in London and I 
need to get home.


Thanks very much for the help so far. Tomorrow is more tricky but I'll 
read up on covering indexes to see how to use them,.


Rob

On 17 Mar 2017, at 18:39, Simon Slavin wrote:

On 17 Mar 2017, at 6:22pm, Rob Willett  
wrote:


4. Work through returning just the columns we actually need from our 
queries. We have a recollection that if we can build an index with 
all the information necessary in it, we can do all the work in joins 
rather than paging out to disk. Is this what you are referring to?


It works only where all the columns you need to read are in the same 
table.  The ideal form of a covering index is to have the columns 
listed in this order:


1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is 
available from the index, so it doesn’t bother to read the table at 
all.  This can lead to something like a doubling of speed.  Of course, 
you sacrifice filespace, and making changes to the table takes a 
little longer.


5. Sleep (not exactly sure when) and watch three international rugby 
games tomorrow.


Sleep while waiting for indexes to be created and ANALYZE to work.  
May you see skilled players, creative moves and dramatic play.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread David Raymond
The data for each record is stored in the same order as the fields are defined. 
When reading a record's information, SQLite starts at the first field, and 
stops at the last field needed to complete the query. The situation where this 
makes the most difference is where there's a reasonably sized blob field, those 
should always be the final fields in the table's schema. If you have an 
important field after the large blob, SQLite has to load through the whole blob 
to get to that field, including going through any needed overflow pages (which 
are a linked list). If your query doesn't need that extra information, then 
SQLite can stop and not bother reading all that extra data. Since SQLite only 
reads whole pages at a time from disk though, that's mostly only relevant when 
you've got a lot of fields or large ones which result in overflow pages being 
needed.

The not loading more fields than needed is in part why you see a lot of sub 
queries that have "select 1 from", usually along the lines of
"...where exists(select 1 from otherTable where...)..."
By using the constant of 1 you avoid artificially making SQLite grab any more 
fields than what it needs for the where clause.


Of course, I have been known to be wrong on these things.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Simon Slavin
Sent: Friday, March 17, 2017 2:40 PM
To: SQLite mailing list
Subject: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...


On 17 Mar 2017, at 6:22pm, Rob Willett <rob.sql...@robertwillett.com> wrote:

> 4. Work through returning just the columns we actually need from our queries. 
> We have a recollection that if we can build an index with all the information 
> necessary in it, we can do all the work in joins rather than paging out to 
> disk. Is this what you are referring to?

It works only where all the columns you need to read are in the same table.  
The ideal form of a covering index is to have the columns listed in this order:

1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is available 
from the index, so it doesn’t bother to read the table at all.  This can lead 
to something like a doubling of speed.  Of course, you sacrifice filespace, and 
making changes to the table takes a little longer.

> 5. Sleep (not exactly sure when) and watch three international rugby games 
> tomorrow.

Sleep while waiting for indexes to be created and ANALYZE to work.  May you see 
skilled players, creative moves and dramatic play.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Hick Gunter
Ad 4) not quite,but close. If the index used for a join also contains all the 
other fields of that table that are referenced in the query, SQLite can use 
those values to avoid reading in the corresponding table row. This saves memory 
(no storage for table row consumed), CPU cycles (no going through another BTree 
structure) and disk IO (no access to the tables' pages).

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Rob Willett
Gesendet: Freitag, 17. März 2017 19:22
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

Simon,

We're still benchmarking the various changes we've made during the day.
I'm sitting here watching a tail of a log file waiting for it to finish.
Ah the joys of the internet startup :)

Our plan of action is

1. Have baseline from which we can work from. We now have that.

2. Create the correct sub-query index and work out the (hopefully) increase in 
speed.

3. Work through all the remaining indexes and check that we have not made the 
same mistake. I know we actually have :( Keep rerunning our benchmark so we 
know if we are actually making a difference.

4. Work through returning just the columns we actually need from our queries. 
We have a recollection that if we can build an index with all the information 
necessary in it, we can do all the work in joins rather than paging out to 
disk. Is this what you are referring to?

5. Sleep (not exactly sure when) and watch three international rugby games 
tomorrow.

Rob

On 17 Mar 2017, at 18:15, Simon Slavin wrote:

> On 17 Mar 2017, at 5:30pm, Rob Willett <rob.sql...@robertwillett.com>
> wrote:
>
>> echo "select * from Disruptions where status = 2 OR status = 6;" |
>> sqlite3 tfl.sqlite > /dev/null
>>
>> twice and each run as 12 mins. So we were twice as quick, which is
>> nice.
>
> Do you actually need all columns ?  If not, then specifying the
> columns you need can lead to a further speedup.  It might be enough
> just to specify the columns you need, but you can achieve further
> increases in speed by making a covering index.  If speed for this
> SELECT is sufficiently important to you, and you don’t actually need
> all columns, post again and we’ll explain further.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Simon,

We're still benchmarking the various changes we've made during the day. 
I'm sitting here watching a tail of a log file waiting for it to finish. 
Ah the joys of the internet startup :)


Our plan of action is

1. Have baseline from which we can work from. We now have that.

2. Create the correct sub-query index and work out the (hopefully) 
increase in speed.


3. Work through all the remaining indexes and check that we have not 
made the same mistake. I know we actually have :( Keep rerunning our 
benchmark so we know if we are actually making a difference.


4. Work through returning just the columns we actually need from our 
queries. We have a recollection that if we can build an index with all 
the information necessary in it, we can do all the work in joins rather 
than paging out to disk. Is this what you are referring to?


5. Sleep (not exactly sure when) and watch three international rugby 
games tomorrow.


Rob

On 17 Mar 2017, at 18:15, Simon Slavin wrote:

On 17 Mar 2017, at 5:30pm, Rob Willett  
wrote:


echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null


twice and each run as 12 mins. So we were twice as quick, which is 
nice.


Do you actually need all columns ?  If not, then specifying the 
columns you need can lead to a further speedup.  It might be enough 
just to specify the columns you need, but you can achieve further 
increases in speed by making a covering index.  If speed for this 
SELECT is sufficiently important to you, and you don’t actually need 
all columns, post again and we’ll explain further.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Simon Slavin

On 17 Mar 2017, at 5:30pm, Rob Willett  wrote:

> echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 
> tfl.sqlite > /dev/null
> 
> twice and each run as 12 mins. So we were twice as quick, which is nice.

Do you actually need all columns ?  If not, then specifying the columns you 
need can lead to a further speedup.  It might be enough just to specify the 
columns you need, but you can achieve further increases in speed by making a 
covering index.  If speed for this SELECT is sufficiently important to you, and 
you don’t actually need all columns, post again and we’ll explain further.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Thanks to everybody for their help earlier today.

As promised here's the results of our various tests. Hopefully they may 
be of use to somebody...


We decided to start from a known position and so recreated the original 
index with the collation in it. We know this was sub optimal but its our 
reference point. We have the bytecode output if anybody wants to see it.


CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE 
ASC);


We ran the the following SQL twice

echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null


and the two runs totalled 46 mins. Each was actually 23 mins.

We then dropped the old index, built the new one

echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | 
sqlite3 tfl.sqlite


We ran

echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null


twice and each run as 12 mins. So we were twice as quick, which is nice.

We then ran

echo "explain select * from Disruptions where status = 2 UNION ALL 
select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite


twice. Each run was around 11.5 mins. We're not going to get into 
differences of less than a minute on a run of this size, so we'll say 
they are about the same speed.


Interesting results, clearly the collation does make a big difference. 
We are now going to go through the schema and check if we have made the 
same mistake elsewhere.


Thanks for your help, we can post the bytecode it people are interested.

Rob

On 17 Mar 2017, at 11:41, Rob Willett wrote:


Gunter,

I would never presume to describe anybody as a Nerd!

We're just going back to very first position with the 'bad' collation 
index so we can do proper timings as we change things so we understand 
the speed up (we hope there is a speed up)


We've written a quick script to check each version. Once we've put the 
original index back in, we've added a step to generate the SQLite 
bytecode for you. It's the least we can do...


We'll post this when its completed but we suspect it may take most of 
the day now :)


echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions 
("status" COLLATE NOCASE ASC);'"
echo "explain select * from Disruptions where status = 2 OR status = 
6;" | sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | 
sqlite3 tfl.sqlite
echo "explain select * from Disruptions where status = 2 OR status = 
6;" | sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status = 
6;" | sqlite3 tfl.sqlite

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date


On 17 Mar 2017, at 11:30, Hick Gunter wrote:

Nerds with chronic byte code affinity like myself would like to see 
the output of "explain" (without "query plan"), i.e. the SQLite 
bytecode produced. I guess the query with OR will have a subprogram 
called once for each status value, whereas I expect the query with 
UNION ALL to have 2 copies of the search (which would not affect the 
run time) and maybe even a temporary table of results (which would 
take longer and use more memory).


-Ursprüngliche Nachricht-
Von: sqlite-users 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von 
Rob Willett

Gesendet: Freitag, 17. März 2017 12:19
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index 
question ...


Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the 
collation sequence as a possible issue. We now have a new index and 
we have just run the query again


sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * 

Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Gunter,

I would never presume to describe anybody as a Nerd!

We're just going back to very first position with the 'bad' collation 
index so we can do proper timings as we change things so we understand 
the speed up (we hope there is a speed up)


We've written a quick script to check each version. Once we've put the 
original index back in, we've added a step to generate the SQLite 
bytecode for you. It's the least we can do...


We'll post this when its completed but we suspect it may take most of 
the day now :)


echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions 
("status" COLLATE NOCASE ASC);'"
echo "explain select * from Disruptions where status = 2 OR status = 6;" 
| sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | 
sqlite3 tfl.sqlite
echo "explain select * from Disruptions where status = 2 OR status = 6;" 
| sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status = 6;" 
| sqlite3 tfl.sqlite

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date


On 17 Mar 2017, at 11:30, Hick Gunter wrote:

Nerds with chronic byte code affinity like myself would like to see 
the output of "explain" (without "query plan"), i.e. the SQLite 
bytecode produced. I guess the query with OR will have a subprogram 
called once for each status value, whereas I expect the query with 
UNION ALL to have 2 copies of the search (which would not affect the 
run time) and maybe even a temporary table of results (which would 
take longer and use more memory).


-Ursprüngliche Nachricht-----
Von: sqlite-users 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von 
Rob Willett

Gesendet: Freitag, 17. März 2017 12:19
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index 
question ...


Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the 
collation sequence as a possible issue. We now have a new index and we 
have just run the query again


sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 
OR

status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think 
is good.


If we use your other example

sqlite> explain query plan select * from Disruptions  where status = 2
UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and 
see.


Your last suggestion of "select * from Disruptions  where status =2 
COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically 
equivalent to "explain query plan select * from Disruptions where 
status = 2 OR status = 6;" now we have removed the collation from the 
index.


sqlite> explain query plan select * from Disruptions  where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields, we require many (which I 
agree is not all) of the fields. Following this logic through, does 
this mean that it will do more file access bringing the records in 
from the file system?


The collation issue seems to be an artifact of the way Navcat for 
SQLite works. I suspect we need to be more careful about how we use 
the tool.



Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Hick Gunter
Nerds with chronic byte code affinity like myself would like to see the output 
of "explain" (without "query plan"), i.e. the SQLite bytecode produced. I guess 
the query with OR will have a subprogram called once for each status value, 
whereas I expect the query with UNION ALL to have 2 copies of the search (which 
would not affect the run time) and maybe even a temporary table of results 
(which would take longer and use more memory).

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Rob Willett
Gesendet: Freitag, 17. März 2017 12:19
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the collation 
sequence as a possible issue. We now have a new index and we have just run the 
query again

sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 OR
status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think is good.

If we use your other example

sqlite> explain query plan select * from Disruptions  where status = 2
UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and see.

Your last suggestion of "select * from Disruptions  where status =2 COLLATE 
NOCASE or status = 6 COLLATE NOCASE" appears to be logically equivalent to 
"explain query plan select * from Disruptions where status = 2 OR status = 6;" 
now we have removed the collation from the index.

sqlite> explain query plan select * from Disruptions  where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields, we require many (which I agree is not 
all) of the fields. Following this logic through, does this mean that it will 
do more file access bringing the records in from the file system?

The collation issue seems to be an artifact of the way Navcat for SQLite works. 
I suspect we need to be more careful about how we use the tool.

We'll now time the results of each query and run them twice to see the affect. 
No idea how long this will take but suspect a few hours :) I will post back the 
results as other people may (or may not) find this helpful.

Thanks

Rob

On 17 Mar 2017, at 10:57, Hick Gunter wrote:

>> On 17 Mar 2017, at 10:20am, Rob Willett
>> <rob.sql...@robertwillett.com> wrote:
>>
>>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
>>> NOCASE ASC);
>>>
>>> […]
>>>
>>> As part of the larger more complex query, we are executing the query
>>>
>>> ```
>>> select * from Disruptions where status = 2 OR status = 6;
>>
>> The schema for the table says that "status" is INTEGER.
>> You are supplying numbers as arguments.
>> Those two match and should create no problem.
>>
>> But your index has a collation order which is usually used for text.
>> I don’t see that it is obviously wrong, but it does look a little
>> weird.
>>
>> Try creating another index which is just on "status", without the
>> COLLATE clause.
>> Then do another ANALYZE, then try the SELECT again.
>>
>> Simon.
>
> If the index is deemed unsuitable by SQLite due to its collation
> sequence, then I expect it qwould also be ignored in "select ...
> status=1" (without the second ORed value)
>
> If not, then (select ... where status =2 UNION ALL select where status
> = 6) should do the trick
>
> Do you really require all the fields from Disruptions?
>
> And yes, collating integers with NOCASE seems quite strange (there are
> no capital or lowercase numbers unless you are using roman numerals ;)
> ); for text affinity, it should render the comparison operators
> caseblind, just like "like".
>
>
> ___
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1

Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the collation 
sequence as a possible issue. We now have a new index and we have just 
run the query again


sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 OR 
status = 6;

selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think is 
good.


If we use your other example

sqlite> explain query plan select * from Disruptions  where status = 2 
UNION ALL select * from Disruptions where status = 6;

selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and 
see.


Your last suggestion of "select * from Disruptions  where status =2 
COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically 
equivalent to "explain query plan select * from Disruptions where status 
= 2 OR status = 6;" now we have removed the collation from the index.


sqlite> explain query plan select * from Disruptions  where status =2 
COLLATE NOCASE or status = 6 COLLATE NOCASE;

selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields, we require many (which I agree 
is not all) of the fields. Following this logic through, does this mean 
that it will do more file access bringing the records in from the file 
system?


The collation issue seems to be an artifact of the way Navcat for SQLite 
works. I suspect we need to be more careful about how we use the tool.


We'll now time the results of each query and run them twice to see the 
affect. No idea how long this will take but suspect a few hours :) I 
will post back the results as other people may (or may not) find this 
helpful.


Thanks

Rob

On 17 Mar 2017, at 10:57, Hick Gunter wrote:

On 17 Mar 2017, at 10:20am, Rob Willett 
 wrote:


CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE 
NOCASE ASC);


[…]

As part of the larger more complex query, we are executing the query

```
select * from Disruptions where status = 2 OR status = 6;


The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.

But your index has a collation order which is usually used for text.  
I don’t see that it is obviously wrong, but it does look a little 
weird.


Try creating another index which is just on "status", without the 
COLLATE clause.

Then do another ANALYZE, then try the SELECT again.

Simon.


If the index is deemed unsuitable by SQLite due to its collation 
sequence, then I expect it qwould also be ignored in "select ... 
status=1" (without the second ORed value)


If not, then (select ... where status =2 UNION ALL select where status 
= 6) should do the trick


Do you really require all the fields from Disruptions?

And yes, collating integers with NOCASE seems quite strange (there are 
no capital or lowercase numbers unless you are using roman numerals ;) 
); for text affinity, it should render the comparison operators 
caseblind, just like "like".



___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use 
of the intended recipient(s) only and may contain information that is 
confidential, privileged or legally protected. Any unauthorized use or 
dissemination of this communication is strictly prohibited. If you 
have received this communication in error, please immediately notify 
the sender by return e-mail message and delete all copies of the 
original communication. Thank you for your cooperation.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Hick Gunter
>On 17 Mar 2017, at 10:20am, Rob Willett  wrote:
>
>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);
>>
>> […]
>>
>> As part of the larger more complex query, we are executing the query
>>
>> ```
>> select * from Disruptions where status = 2 OR status = 6;
>
>The schema for the table says that "status" is INTEGER.
>You are supplying numbers as arguments.
>Those two match and should create no problem.
>
>But your index has a collation order which is usually used for text.  I don’t 
>see that it is obviously wrong, but it does look a little weird.
>
>Try creating another index which is just on "status", without the COLLATE 
>clause.
>Then do another ANALYZE, then try the SELECT again.
>
>Simon.

If the index is deemed unsuitable by SQLite due to its collation sequence, then 
I expect it qwould also be ignored in "select ... status=1" (without the second 
ORed value)

If not, then (select ... where status =2 UNION ALL select where status = 6) 
should do the trick

Do you really require all the fields from Disruptions?

And yes, collating integers with NOCASE seems quite strange (there are no 
capital or lowercase numbers unless you are using roman numerals ;) ); for text 
affinity, it should render the comparison operators caseblind, just like "like".


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Simon,

Thank you for the quick response. We'll do this. We're just waiting for 
another analyze to finish. Our current DB is 45GB, the query works OK on 
smaller databases, but at this size its very slow, thats why we have a 
bad feeling that we're tinkering on the edges and we need a new DB 
design :(


The reason for the collation is the way NavCat SQLite works, its 
difficult to get it without the collate but we'll do this direct from 
the command line.


We'll post back in an hour or so as it'll probably take that long to run 
:)


Rob

On 17 Mar 2017, at 10:27, Simon Slavin wrote:

On 17 Mar 2017, at 10:20am, Rob Willett  
wrote:


CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE 
NOCASE ASC);


[…]

As part of the larger more complex query, we are executing the query

```
select * from Disruptions where status = 2 OR status = 6;


The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.

But your index has a collation order which is usually used for text.  
I don’t see that it is obviously wrong, but it does look a little 
weird.


Try creating another index which is just on "status", without the 
COLLATE clause.

Then do another ANALYZE, then try the SELECT again.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Simon Slavin

On 17 Mar 2017, at 10:20am, Rob Willett  wrote:

> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);
> 
> […]
> 
> As part of the larger more complex query, we are executing the query
> 
> ```
> select * from Disruptions where status = 2 OR status = 6;

The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.

But your index has a collation order which is usually used for text.  I don’t 
see that it is obviously wrong, but it does look a little weird.

Try creating another index which is just on "status", without the COLLATE 
clause.
Then do another ANALYZE, then try the SELECT again.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users