Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Just to close this thread down completely. We've implemented the changes on our production server and the actual performance increase is even better than we predicted. Our production server is a containerised Ubuntu Server. Its hardware configuration is significantly different to our Macs on which we develop. Our Macs have more RAM and SSD disks. The production server has other advantages though :) The query took 90 mins on our production server, this is mainly due to slower disks than we have locally, so this was not unexpected. After doing the performance changes (removing unneeded fields and adding in a covering index), we went down to 38 secs. This has meant we are no longer looking at a second database just for analytics, we just needed to learn to use the database we already had :) Rob On 18 Mar 2017, at 5:48, Rob Willett wrote: We've just implemented a covering index for the last step (is it really?) in our quest to get the query execution time down. To summarise we have gone from 32 mins to 16 mins by updating an index so it doesn't use collate, we took another six minutes off by removing extra fields in the select we didn't need. We have just created a new index which 'covers' all the fields we use in the select, this means (and I paraphrase) that we use the index to get all the data and there is no need to read from the database. Well that was a bit of a surprise, the index creation took 45 mins, we ran the program again and thought, rats, we've cocked it up, it only took 54 secs, we got something wrong. So we checked it and checked again and we hasn't got anything wrong. Our query has moved from 32 mins to 54 secs. We're quite happy with that performance increase. In fact we're delighted, so thanks for all the help in getting us to this stage. We have kept copies of the query planner bytecode output if anybody is interested. Gunter has had copies, but if anybody else would like them, please ask. Many thanks again for all the help, Rob On 17 Mar 2017, at 22:12, Rob Willett wrote: Dear all, We progress steadily forward. 1. We immediately halved our execution time by moving to an updated index that doesn't use COLLATE. Thats 32mins to 16mins. 2. We then shaved a further six minutes off the execution time by removing extraneous fields in the select statement, so instead of "select * ...", we identified which fields we used and directly selected those. So we are now down to 10 mins or 1/3 of when we started for, to be honest, virtually no extra work, merely being smarter, or rather you being smarter. 3. We have looked through all our indexes and can see that every index has a COLLATE against it, even if the column is an integer. We have raised a support call with Navicat. 4. The next step is to create a "covering index" to try and get the whole of the query into the index. However its 22:11 in London and I need to get home. Thanks very much for the help so far. Tomorrow is more tricky but I'll read up on covering indexes to see how to use them,. Rob On 17 Mar 2017, at 18:39, Simon Slavin wrote: On 17 Mar 2017, at 6:22pm, Rob Willett wrote: 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to? It works only where all the columns you need to read are in the same table. The ideal form of a covering index is to have the columns listed in this order: 1) columns needed for the WHERE clause 2) columns needed for the ORDER BY clause which aren’t in (1) 3) columns needed to be read which aren’t in (2) or (1) SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all. This can lead to something like a doubling of speed. Of course, you sacrifice filespace, and making changes to the table takes a little longer. 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow. Sleep while waiting for indexes to be created and ANALYZE to work. May you see skilled players, creative moves and dramatic play. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
David, We're English and masters of the understatement ;) Rob On 20 Mar 2017, at 15:04, David Raymond wrote: "... Our query has moved from 32 mins to 54 secs. We're quite happy with that performance increase." I have to admit that the hearty chuckle which that statement produced from me hurt my sore throat. Totally worth it though. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
"... Our query has moved from 32 mins to 54 secs. We're quite happy with that performance increase." I have to admit that the hearty chuckle which that statement produced from me hurt my sore throat. Totally worth it though. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Oops - somehow misread the last message - 54 seconds down from 32 minutes - that's a result! Paul www.sandersonforensics.com skype: r3scue193 twitter: @sandersonforens Tel +44 (0)1326 572786 http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit -Forensic Toolkit for SQLite email from a work address for a fully functional demo licence On 19 March 2017 at 12:07, Paul Sanderson wrote: > What is the average size of the text in the direction field? and what page > size have you set for the database? If the size of a record is such that > only a small handful fit into a page, or worse each record overflows (and > your select includes the direction field) then this could impact > performance. > > Paul > www.sandersonforensics.com > skype: r3scue193 > twitter: @sandersonforens > Tel +44 (0)1326 572786 <+44%201326%20572786> > http://sandersonforensics.com/forum/content.php?195-SQLite- > Forensic-Toolkit -Forensic Toolkit for SQLite > email from a work address for a fully functional demo licence > > On 18 March 2017 at 05:48, Rob Willett > wrote: > >> We've just implemented a covering index for the last step (is it really?) >> in our quest to get the query execution time down. >> >> To summarise we have gone from 32 mins to 16 mins by updating an index so >> it doesn't use collate, we took another six minutes off by removing extra >> fields in the select we didn't need. >> >> We have just created a new index which 'covers' all the fields we use in >> the select, this means (and I paraphrase) that we use the index to get all >> the data and there is no need to read from the database. >> >> Well that was a bit of a surprise, the index creation took 45 mins, we >> ran the program again and thought, rats, we've cocked it up, it only took >> 54 secs, we got something wrong. So we checked it and checked again and we >> hasn't got anything wrong. Our query has moved from 32 mins to 54 secs. >> >> We're quite happy with that performance increase. In fact we're >> delighted, so thanks for all the help in getting us to this stage. >> >> We have kept copies of the query planner bytecode output if anybody is >> interested. Gunter has had copies, but if anybody else would like them, >> please ask. >> >> Many thanks again for all the help, >> >> Rob >> >> >> On 17 Mar 2017, at 22:12, Rob Willett wrote: >> >> Dear all, >>> >>> We progress steadily forward. >>> >>> 1. We immediately halved our execution time by moving to an updated >>> index that doesn't use COLLATE. Thats 32mins to 16mins. >>> >>> 2. We then shaved a further six minutes off the execution time by >>> removing extraneous fields in the select statement, so instead of "select * >>> ...", we identified which fields we used and directly selected those. So we >>> are now down to 10 mins or 1/3 of when we started for, to be honest, >>> virtually no extra work, merely being smarter, or rather you being smarter. >>> >>> 3. We have looked through all our indexes and can see that every index >>> has a COLLATE against it, even if the column is an integer. We have raised >>> a support call with Navicat. >>> >>> 4. The next step is to create a "covering index" to try and get the >>> whole of the query into the index. However its 22:11 in London and I need >>> to get home. >>> >>> Thanks very much for the help so far. Tomorrow is more tricky but I'll >>> read up on covering indexes to see how to use them,. >>> >>> Rob >>> >>> On 17 Mar 2017, at 18:39, Simon Slavin wrote: >>> >>> On 17 Mar 2017, at 6:22pm, Rob Willett wrote: 4. Work through returning just the columns we actually need from our > queries. We have a recollection that if we can build an index with all the > information necessary in it, we can do all the work in joins rather than > paging out to disk. Is this what you are referring to? > It works only where all the columns you need to read are in the same table. The ideal form of a covering index is to have the columns listed in this order: 1) columns needed for the WHERE clause 2) columns needed for the ORDER BY clause which aren’t in (1) 3) columns needed to be read which aren’t in (2) or (1) SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all. This can lead to something like a doubling of speed. Of course, you sacrifice filespace, and making changes to the table takes a little longer. 5. Sleep (not exactly sure when) and watch three international rugby > games tomorrow. > Sleep while waiting for indexes to be created and ANALYZE to work. May you see skilled players, creative moves and dramatic play. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >>> __
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
What is the average size of the text in the direction field? and what page size have you set for the database? If the size of a record is such that only a small handful fit into a page, or worse each record overflows (and your select includes the direction field) then this could impact performance. Paul www.sandersonforensics.com skype: r3scue193 twitter: @sandersonforens Tel +44 (0)1326 572786 http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit -Forensic Toolkit for SQLite email from a work address for a fully functional demo licence On 18 March 2017 at 05:48, Rob Willett wrote: > We've just implemented a covering index for the last step (is it really?) > in our quest to get the query execution time down. > > To summarise we have gone from 32 mins to 16 mins by updating an index so > it doesn't use collate, we took another six minutes off by removing extra > fields in the select we didn't need. > > We have just created a new index which 'covers' all the fields we use in > the select, this means (and I paraphrase) that we use the index to get all > the data and there is no need to read from the database. > > Well that was a bit of a surprise, the index creation took 45 mins, we ran > the program again and thought, rats, we've cocked it up, it only took 54 > secs, we got something wrong. So we checked it and checked again and we > hasn't got anything wrong. Our query has moved from 32 mins to 54 secs. > > We're quite happy with that performance increase. In fact we're delighted, > so thanks for all the help in getting us to this stage. > > We have kept copies of the query planner bytecode output if anybody is > interested. Gunter has had copies, but if anybody else would like them, > please ask. > > Many thanks again for all the help, > > Rob > > > On 17 Mar 2017, at 22:12, Rob Willett wrote: > > Dear all, >> >> We progress steadily forward. >> >> 1. We immediately halved our execution time by moving to an updated index >> that doesn't use COLLATE. Thats 32mins to 16mins. >> >> 2. We then shaved a further six minutes off the execution time by >> removing extraneous fields in the select statement, so instead of "select * >> ...", we identified which fields we used and directly selected those. So we >> are now down to 10 mins or 1/3 of when we started for, to be honest, >> virtually no extra work, merely being smarter, or rather you being smarter. >> >> 3. We have looked through all our indexes and can see that every index >> has a COLLATE against it, even if the column is an integer. We have raised >> a support call with Navicat. >> >> 4. The next step is to create a "covering index" to try and get the whole >> of the query into the index. However its 22:11 in London and I need to get >> home. >> >> Thanks very much for the help so far. Tomorrow is more tricky but I'll >> read up on covering indexes to see how to use them,. >> >> Rob >> >> On 17 Mar 2017, at 18:39, Simon Slavin wrote: >> >> On 17 Mar 2017, at 6:22pm, Rob Willett >>> wrote: >>> >>> 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to? >>> >>> It works only where all the columns you need to read are in the same >>> table. The ideal form of a covering index is to have the columns listed in >>> this order: >>> >>> 1) columns needed for the WHERE clause >>> 2) columns needed for the ORDER BY clause which aren’t in (1) >>> 3) columns needed to be read which aren’t in (2) or (1) >>> >>> SQLite detects that all the information it needs for the SELECT is >>> available from the index, so it doesn’t bother to read the table at all. >>> This can lead to something like a doubling of speed. Of course, you >>> sacrifice filespace, and making changes to the table takes a little longer. >>> >>> 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow. >>> >>> Sleep while waiting for indexes to be created and ANALYZE to work. May >>> you see skilled players, creative moves and dramatic play. >>> >>> Simon. >>> ___ >>> sqlite-users mailing list >>> sqlite-users@mailinglists.sqlite.org >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >>> >> ___ >> sqlite-users mailing list >> sqlite-users@mailinglists.sqlite.org >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
We've just implemented a covering index for the last step (is it really?) in our quest to get the query execution time down. To summarise we have gone from 32 mins to 16 mins by updating an index so it doesn't use collate, we took another six minutes off by removing extra fields in the select we didn't need. We have just created a new index which 'covers' all the fields we use in the select, this means (and I paraphrase) that we use the index to get all the data and there is no need to read from the database. Well that was a bit of a surprise, the index creation took 45 mins, we ran the program again and thought, rats, we've cocked it up, it only took 54 secs, we got something wrong. So we checked it and checked again and we hasn't got anything wrong. Our query has moved from 32 mins to 54 secs. We're quite happy with that performance increase. In fact we're delighted, so thanks for all the help in getting us to this stage. We have kept copies of the query planner bytecode output if anybody is interested. Gunter has had copies, but if anybody else would like them, please ask. Many thanks again for all the help, Rob On 17 Mar 2017, at 22:12, Rob Willett wrote: Dear all, We progress steadily forward. 1. We immediately halved our execution time by moving to an updated index that doesn't use COLLATE. Thats 32mins to 16mins. 2. We then shaved a further six minutes off the execution time by removing extraneous fields in the select statement, so instead of "select * ...", we identified which fields we used and directly selected those. So we are now down to 10 mins or 1/3 of when we started for, to be honest, virtually no extra work, merely being smarter, or rather you being smarter. 3. We have looked through all our indexes and can see that every index has a COLLATE against it, even if the column is an integer. We have raised a support call with Navicat. 4. The next step is to create a "covering index" to try and get the whole of the query into the index. However its 22:11 in London and I need to get home. Thanks very much for the help so far. Tomorrow is more tricky but I'll read up on covering indexes to see how to use them,. Rob On 17 Mar 2017, at 18:39, Simon Slavin wrote: On 17 Mar 2017, at 6:22pm, Rob Willett wrote: 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to? It works only where all the columns you need to read are in the same table. The ideal form of a covering index is to have the columns listed in this order: 1) columns needed for the WHERE clause 2) columns needed for the ORDER BY clause which aren’t in (1) 3) columns needed to be read which aren’t in (2) or (1) SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all. This can lead to something like a doubling of speed. Of course, you sacrifice filespace, and making changes to the table takes a little longer. 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow. Sleep while waiting for indexes to be created and ANALYZE to work. May you see skilled players, creative moves and dramatic play. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
On 17 Mar 2017, at 10:12pm, Rob Willett wrote: > 3. We have looked through all our indexes and can see that every index has a > COLLATE against it, even if the column is an integer. We have raised a > support call with Navicat. This might also be something that Navicat should look into. The most efficient way to implement COLLATE in SQLite is to do it in the column definition. At the moment, for example, the table is defined as > CREATE TABLE "Disruptions" ( >"id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, >[…] >"Direction" TEXT > ); . If instead it said CREATE TABLE "Disruptions" ( "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, […] "Direction" TEXT COLLATE NOCASE ); then all comparisons and indexes which mentioned "Direction" would automatically use NOCASE without having to have COLLATE NOCASE mentioned in them. This is almost always The Right Thing, and what the programmer would want. It simplifies all the other SQL commands used by the program. And since the collation on the index then matches the collation for the column definition, SQLite has to do less work every time it deals with the index. Unfortunately, this change cannot be made with backward compatibility. Changing the table definition would involve remaking the table, probably by defining a new table with the new definition, copying the data across, deleting the original table, and renaming the new one. If backward compatibility is important in support of the program then this may be a deal-breaker and one could understand why the developer team won’t make the change. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
On Friday, 17 March, 2017 16:13, Rob Willett said: > 3. We have looked through all our indexes and can see that every index > has a COLLATE against it, even if the column is an integer. We have > raised a support call with Navicat. If you want a table field to be non-case-sensitive, the proper place to declare that field as such is when you define the table. create table Words (Word text unique collate nocase); means that everywhere that you use Wowrds.Word is not case sensitive (but it is case preserving). End of Story. create table Words (Word text); create unique index sqlite_autoindex_Words_1 on Words (Word collate nocase); is however an entirely different thing. The data in Words.Word is case sensitive, and the index is not case sensitive. Thus query will operate differently depending on how you declared your table and index: select * from Words where word = 'Apple'; if the former (nocase collation on the table) will (a) use the index and (b) be equivalent to where lower(word) = lower('Apple') (that is, case insensitive) in the case where you have declared a collate nocase index, the index will NOT BE USED unless you specify COLLATE NOCASE in the where clause. By default the search will be a table scan that is case sensitive. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Change your query to explain query plan select * from Disruptions where status = 2 collate nocase OR status = 6 collate nocase; to match your index. > -Original Message- > From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] > On Behalf Of Rob Willett > Sent: Friday, 17 March, 2017 04:20 > To: SQLite mailing list > Subject: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ... > > Hi, > > We've been struggling with a complex query that we have written. One of > the elements of this complex query is a select statement that doesn't > appear to use an index when we *think* it should do. We're not going to > ask you to debug a large complex SQL query (unless you have nothing else > to do today) but we're working our way through the query trying to > understand where we've gone wrong. So we've broken down the query and > are trying each section to see what it performs (or rather doesn't > perform like). > > The sub query is a simple select on a large table, Disruptions. The > Disruptions table has 180M rows of data. > > The schema for it is here. We've pulled it straight from Navicat for > SQLite. > > ``` > CREATE TABLE "Disruptions" ( >"id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, >"version" integer NOT NULL, >"Disruption_id" INTEGER NOT NULL, >"status" integer NOT NULL, >"severity" integer NOT NULL, >"levelOfInterest" integer NOT NULL, >"category" integer NOT NULL, >"subCategory" integer NOT NULL, >"startTime" TEXT NOT NULL, >"endTime" text NOT NULL, >"location" integer NOT NULL, >"corridor" integer NOT NULL, >"comments" integer NOT NULL, >"currentUpdate" integer NOT NULL, >"remarkTime" TEXT NOT NULL, >"lastModTime" TEXT NOT NULL, >"CauseAreaPointX" real NOT NULL, >"CauseAreaPointY" real NOT NULL, >"Direction" TEXT > ); > INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("Disruptions", > '184626834'); > > -- > -- Indexes structure for table Disruptions > -- > CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE > NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC, > "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, > "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, > "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC); > CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" > COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE > NOCASE ASC, "subCategory" COLLATE NOCASE ASC); > CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE > NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE > NOCASE ASC); > CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE > ASC); > > PRAGMA foreign_keys = true; > ``` > > As part of the larger more complex query, we are executing the query > > ``` > select * from Disruptions where status = 2 OR status = 6; > ``` > > Status is one of six values, 1 to 6 and is probably not evenly > distributed across the 180M rows. > > If we do > > ``` > sqlite> explain query plan select * from Disruptions where status = 2 OR > status = 6; > selectid|order|from|detail > 0|0|0|SCAN TABLE Disruptions > ``` > > We can see that table scanning a 180M records is going to be slow, no > matter what the rest of the complex query is like. > > We have an index Disruptions_idx4 which we *think* should speed it up, > but the query plan doesn't seem to take this into account. > > We think that only having six values of Status means that the speed up > from the index is not going to be fantastic but every little helps. > > We have run analyze on the database and that hasn't helped. > > Our gut feeling at this moment is that we have the design structure > wrong in our DB and we're going to have to take a long hard look at what > we're doing, do a redesign and a rebuild as we simply got it wrong from > the beginning. Hindsight is wonderful ;) In the interim (as this is a > 2-3 month job), we need to speed our query up from 90 mins down to > something in the tens of mins. > > Any suggestions very much welcomed, > > Thanks > > Rob > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Dear all, We progress steadily forward. 1. We immediately halved our execution time by moving to an updated index that doesn't use COLLATE. Thats 32mins to 16mins. 2. We then shaved a further six minutes off the execution time by removing extraneous fields in the select statement, so instead of "select * ...", we identified which fields we used and directly selected those. So we are now down to 10 mins or 1/3 of when we started for, to be honest, virtually no extra work, merely being smarter, or rather you being smarter. 3. We have looked through all our indexes and can see that every index has a COLLATE against it, even if the column is an integer. We have raised a support call with Navicat. 4. The next step is to create a "covering index" to try and get the whole of the query into the index. However its 22:11 in London and I need to get home. Thanks very much for the help so far. Tomorrow is more tricky but I'll read up on covering indexes to see how to use them,. Rob On 17 Mar 2017, at 18:39, Simon Slavin wrote: On 17 Mar 2017, at 6:22pm, Rob Willett wrote: 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to? It works only where all the columns you need to read are in the same table. The ideal form of a covering index is to have the columns listed in this order: 1) columns needed for the WHERE clause 2) columns needed for the ORDER BY clause which aren’t in (1) 3) columns needed to be read which aren’t in (2) or (1) SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all. This can lead to something like a doubling of speed. Of course, you sacrifice filespace, and making changes to the table takes a little longer. 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow. Sleep while waiting for indexes to be created and ANALYZE to work. May you see skilled players, creative moves and dramatic play. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
The data for each record is stored in the same order as the fields are defined. When reading a record's information, SQLite starts at the first field, and stops at the last field needed to complete the query. The situation where this makes the most difference is where there's a reasonably sized blob field, those should always be the final fields in the table's schema. If you have an important field after the large blob, SQLite has to load through the whole blob to get to that field, including going through any needed overflow pages (which are a linked list). If your query doesn't need that extra information, then SQLite can stop and not bother reading all that extra data. Since SQLite only reads whole pages at a time from disk though, that's mostly only relevant when you've got a lot of fields or large ones which result in overflow pages being needed. The not loading more fields than needed is in part why you see a lot of sub queries that have "select 1 from", usually along the lines of "...where exists(select 1 from otherTable where...)..." By using the constant of 1 you avoid artificially making SQLite grab any more fields than what it needs for the where clause. Of course, I have been known to be wrong on these things. -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin Sent: Friday, March 17, 2017 2:40 PM To: SQLite mailing list Subject: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ... On 17 Mar 2017, at 6:22pm, Rob Willett wrote: > 4. Work through returning just the columns we actually need from our queries. > We have a recollection that if we can build an index with all the information > necessary in it, we can do all the work in joins rather than paging out to > disk. Is this what you are referring to? It works only where all the columns you need to read are in the same table. The ideal form of a covering index is to have the columns listed in this order: 1) columns needed for the WHERE clause 2) columns needed for the ORDER BY clause which aren’t in (1) 3) columns needed to be read which aren’t in (2) or (1) SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all. This can lead to something like a doubling of speed. Of course, you sacrifice filespace, and making changes to the table takes a little longer. > 5. Sleep (not exactly sure when) and watch three international rugby games > tomorrow. Sleep while waiting for indexes to be created and ANALYZE to work. May you see skilled players, creative moves and dramatic play. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
On 17 Mar 2017, at 6:22pm, Rob Willett wrote: > 4. Work through returning just the columns we actually need from our queries. > We have a recollection that if we can build an index with all the information > necessary in it, we can do all the work in joins rather than paging out to > disk. Is this what you are referring to? It works only where all the columns you need to read are in the same table. The ideal form of a covering index is to have the columns listed in this order: 1) columns needed for the WHERE clause 2) columns needed for the ORDER BY clause which aren’t in (1) 3) columns needed to be read which aren’t in (2) or (1) SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all. This can lead to something like a doubling of speed. Of course, you sacrifice filespace, and making changes to the table takes a little longer. > 5. Sleep (not exactly sure when) and watch three international rugby games > tomorrow. Sleep while waiting for indexes to be created and ANALYZE to work. May you see skilled players, creative moves and dramatic play. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Ad 4) not quite,but close. If the index used for a join also contains all the other fields of that table that are referenced in the query, SQLite can use those values to avoid reading in the corresponding table row. This saves memory (no storage for table row consumed), CPU cycles (no going through another BTree structure) and disk IO (no access to the tables' pages). -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Rob Willett Gesendet: Freitag, 17. März 2017 19:22 An: SQLite mailing list Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ... Simon, We're still benchmarking the various changes we've made during the day. I'm sitting here watching a tail of a log file waiting for it to finish. Ah the joys of the internet startup :) Our plan of action is 1. Have baseline from which we can work from. We now have that. 2. Create the correct sub-query index and work out the (hopefully) increase in speed. 3. Work through all the remaining indexes and check that we have not made the same mistake. I know we actually have :( Keep rerunning our benchmark so we know if we are actually making a difference. 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to? 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow. Rob On 17 Mar 2017, at 18:15, Simon Slavin wrote: > On 17 Mar 2017, at 5:30pm, Rob Willett > wrote: > >> echo "select * from Disruptions where status = 2 OR status = 6;" | >> sqlite3 tfl.sqlite > /dev/null >> >> twice and each run as 12 mins. So we were twice as quick, which is >> nice. > > Do you actually need all columns ? If not, then specifying the > columns you need can lead to a further speedup. It might be enough > just to specify the columns you need, but you can achieve further > increases in speed by making a covering index. If speed for this > SELECT is sufficiently important to you, and you don’t actually need > all columns, post again and we’ll explain further. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Simon, We're still benchmarking the various changes we've made during the day. I'm sitting here watching a tail of a log file waiting for it to finish. Ah the joys of the internet startup :) Our plan of action is 1. Have baseline from which we can work from. We now have that. 2. Create the correct sub-query index and work out the (hopefully) increase in speed. 3. Work through all the remaining indexes and check that we have not made the same mistake. I know we actually have :( Keep rerunning our benchmark so we know if we are actually making a difference. 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to? 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow. Rob On 17 Mar 2017, at 18:15, Simon Slavin wrote: On 17 Mar 2017, at 5:30pm, Rob Willett wrote: echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null twice and each run as 12 mins. So we were twice as quick, which is nice. Do you actually need all columns ? If not, then specifying the columns you need can lead to a further speedup. It might be enough just to specify the columns you need, but you can achieve further increases in speed by making a covering index. If speed for this SELECT is sufficiently important to you, and you don’t actually need all columns, post again and we’ll explain further. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
On 17 Mar 2017, at 5:30pm, Rob Willett wrote: > echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 > tfl.sqlite > /dev/null > > twice and each run as 12 mins. So we were twice as quick, which is nice. Do you actually need all columns ? If not, then specifying the columns you need can lead to a further speedup. It might be enough just to specify the columns you need, but you can achieve further increases in speed by making a covering index. If speed for this SELECT is sufficiently important to you, and you don’t actually need all columns, post again and we’ll explain further. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Thanks to everybody for their help earlier today. As promised here's the results of our various tests. Hopefully they may be of use to somebody... We decided to start from a known position and so recreated the original index with the collation in it. We know this was sub optimal but its our reference point. We have the bytecode output if anybody wants to see it. CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC); We ran the the following SQL twice echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null and the two runs totalled 46 mins. Each was actually 23 mins. We then dropped the old index, built the new one echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | sqlite3 tfl.sqlite We ran echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null twice and each run as 12 mins. So we were twice as quick, which is nice. We then ran echo "explain select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite twice. Each run was around 11.5 mins. We're not going to get into differences of less than a minute on a run of this size, so we'll say they are about the same speed. Interesting results, clearly the collation does make a big difference. We are now going to go through the schema and check if we have made the same mistake elsewhere. Thanks for your help, we can post the bytecode it people are interested. Rob On 17 Mar 2017, at 11:41, Rob Willett wrote: Gunter, I would never presume to describe anybody as a Nerd! We're just going back to very first position with the 'bad' collation index so we can do proper timings as we change things so we understand the speed up (we hope there is a speed up) We've written a quick script to check each version. Once we've put the original index back in, we've added a step to generate the SQLite bytecode for you. It's the least we can do... We'll post this when its completed but we suspect it may take most of the day now :) echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);'" echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "---" echo "Creating new index without collation" echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | sqlite3 tfl.sqlite echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "---" echo "Trying SELECT statement with UNION ALL" echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite date echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null date On 17 Mar 2017, at 11:30, Hick Gunter wrote: Nerds with chronic byte code affinity like myself would like to see the output of "explain" (without "query plan"), i.e. the SQLite bytecode produced. I guess the query with OR will have a subprogram called once for each status value, whereas I expect the query with UNION ALL to have 2 copies of the search (which would not affect the run time) and maybe even a temporary table of results (which would take longer and use more memory). -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Rob Willett Gesendet: Freitag, 17. März 2017 12:19 An: SQLite mailing list Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ... Gunter, Simon, Thanks for the replies, You both seem to be drilling into the collation sequence as a possible issue. We now have a new index and we have just run the query again sqlite> analyze; sqlite> drop index Disruptions_idx4; sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status&q
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Gunter, I would never presume to describe anybody as a Nerd! We're just going back to very first position with the 'bad' collation index so we can do proper timings as we change things so we understand the speed up (we hope there is a speed up) We've written a quick script to check each version. Once we've put the original index back in, we've added a step to generate the SQLite bytecode for you. It's the least we can do... We'll post this when its completed but we suspect it may take most of the day now :) echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);'" echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "---" echo "Creating new index without collation" echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | sqlite3 tfl.sqlite echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "---" echo "Trying SELECT statement with UNION ALL" echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite date echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null date echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null date On 17 Mar 2017, at 11:30, Hick Gunter wrote: Nerds with chronic byte code affinity like myself would like to see the output of "explain" (without "query plan"), i.e. the SQLite bytecode produced. I guess the query with OR will have a subprogram called once for each status value, whereas I expect the query with UNION ALL to have 2 copies of the search (which would not affect the run time) and maybe even a temporary table of results (which would take longer and use more memory). -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Rob Willett Gesendet: Freitag, 17. März 2017 12:19 An: SQLite mailing list Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ... Gunter, Simon, Thanks for the replies, You both seem to be drilling into the collation sequence as a possible issue. We now have a new index and we have just run the query again sqlite> analyze; sqlite> drop index Disruptions_idx4; sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status"); sqlite> explain query plan select * from Disruptions where status = 2 OR status = 6; selectid|order|from|detail 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|EXECUTE LIST SUBQUERY 1 sqlite> So we have a different response from the query planner, which I think is good. If we use your other example sqlite> explain query plan select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6; selectid|order|from|detail 1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL) I'm not sure which query is going to be faster. We'll have to try and see. Your last suggestion of "select * from Disruptions where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically equivalent to "explain query plan select * from Disruptions where status = 2 OR status = 6;" now we have removed the collation from the index. sqlite> explain query plan select * from Disruptions where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE; selectid|order|from|detail 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|EXECUTE LIST SUBQUERY 1 sqlite> I'll check if we require all the fields, we require many (which I agree is not all) of the fields. Following this logic through, does this mean that it will do more file access bringing the records in from the file system? The collation issue seems to be an artifact of the way Navcat for SQLite works. I suspect we need to be more carefu
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Nerds with chronic byte code affinity like myself would like to see the output of "explain" (without "query plan"), i.e. the SQLite bytecode produced. I guess the query with OR will have a subprogram called once for each status value, whereas I expect the query with UNION ALL to have 2 copies of the search (which would not affect the run time) and maybe even a temporary table of results (which would take longer and use more memory). -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Rob Willett Gesendet: Freitag, 17. März 2017 12:19 An: SQLite mailing list Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ... Gunter, Simon, Thanks for the replies, You both seem to be drilling into the collation sequence as a possible issue. We now have a new index and we have just run the query again sqlite> analyze; sqlite> drop index Disruptions_idx4; sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status"); sqlite> explain query plan select * from Disruptions where status = 2 OR status = 6; selectid|order|from|detail 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|EXECUTE LIST SUBQUERY 1 sqlite> So we have a different response from the query planner, which I think is good. If we use your other example sqlite> explain query plan select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6; selectid|order|from|detail 1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL) I'm not sure which query is going to be faster. We'll have to try and see. Your last suggestion of "select * from Disruptions where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically equivalent to "explain query plan select * from Disruptions where status = 2 OR status = 6;" now we have removed the collation from the index. sqlite> explain query plan select * from Disruptions where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE; selectid|order|from|detail 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|EXECUTE LIST SUBQUERY 1 sqlite> I'll check if we require all the fields, we require many (which I agree is not all) of the fields. Following this logic through, does this mean that it will do more file access bringing the records in from the file system? The collation issue seems to be an artifact of the way Navcat for SQLite works. I suspect we need to be more careful about how we use the tool. We'll now time the results of each query and run them twice to see the affect. No idea how long this will take but suspect a few hours :) I will post back the results as other people may (or may not) find this helpful. Thanks Rob On 17 Mar 2017, at 10:57, Hick Gunter wrote: >> On 17 Mar 2017, at 10:20am, Rob Willett >> wrote: >> >>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE >>> NOCASE ASC); >>> >>> […] >>> >>> As part of the larger more complex query, we are executing the query >>> >>> ``` >>> select * from Disruptions where status = 2 OR status = 6; >> >> The schema for the table says that "status" is INTEGER. >> You are supplying numbers as arguments. >> Those two match and should create no problem. >> >> But your index has a collation order which is usually used for text. >> I don’t see that it is obviously wrong, but it does look a little >> weird. >> >> Try creating another index which is just on "status", without the >> COLLATE clause. >> Then do another ANALYZE, then try the SELECT again. >> >> Simon. > > If the index is deemed unsuitable by SQLite due to its collation > sequence, then I expect it qwould also be ignored in "select ... > status=1" (without the second ORed value) > > If not, then (select ... where status =2 UNION ALL select where status > = 6) should do the trick > > Do you really require all the fields from Disruptions? > > And yes, collating integers with NOCASE seems quite strange (there are > no capital or lowercase numbers unless you are using roman numerals ;) > ); for text affinity, it should render the comparison operators > caseblind, just like "like". > > > ___ > Gunter Hick > Software Engineer > Scientific Games International GmbH > FN 157284 a, HG Wien > Klitschgasse 2-4, A-1130 Vienna, Austria > Tel: +43 1 80100 0 > E-Mail: h...@scigames.at > > Thi
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Gunter, Simon, Thanks for the replies, You both seem to be drilling into the collation sequence as a possible issue. We now have a new index and we have just run the query again sqlite> analyze; sqlite> drop index Disruptions_idx4; sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status"); sqlite> explain query plan select * from Disruptions where status = 2 OR status = 6; selectid|order|from|detail 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|EXECUTE LIST SUBQUERY 1 sqlite> So we have a different response from the query planner, which I think is good. If we use your other example sqlite> explain query plan select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6; selectid|order|from|detail 1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL) I'm not sure which query is going to be faster. We'll have to try and see. Your last suggestion of "select * from Disruptions where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically equivalent to "explain query plan select * from Disruptions where status = 2 OR status = 6;" now we have removed the collation from the index. sqlite> explain query plan select * from Disruptions where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE; selectid|order|from|detail 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 0|0|0|EXECUTE LIST SUBQUERY 1 sqlite> I'll check if we require all the fields, we require many (which I agree is not all) of the fields. Following this logic through, does this mean that it will do more file access bringing the records in from the file system? The collation issue seems to be an artifact of the way Navcat for SQLite works. I suspect we need to be more careful about how we use the tool. We'll now time the results of each query and run them twice to see the affect. No idea how long this will take but suspect a few hours :) I will post back the results as other people may (or may not) find this helpful. Thanks Rob On 17 Mar 2017, at 10:57, Hick Gunter wrote: On 17 Mar 2017, at 10:20am, Rob Willett wrote: CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC); […] As part of the larger more complex query, we are executing the query ``` select * from Disruptions where status = 2 OR status = 6; The schema for the table says that "status" is INTEGER. You are supplying numbers as arguments. Those two match and should create no problem. But your index has a collation order which is usually used for text. I don’t see that it is obviously wrong, but it does look a little weird. Try creating another index which is just on "status", without the COLLATE clause. Then do another ANALYZE, then try the SELECT again. Simon. If the index is deemed unsuitable by SQLite due to its collation sequence, then I expect it qwould also be ignored in "select ... status=1" (without the second ORed value) If not, then (select ... where status =2 UNION ALL select where status = 6) should do the trick Do you really require all the fields from Disruptions? And yes, collating integers with NOCASE seems quite strange (there are no capital or lowercase numbers unless you are using roman numerals ;) ); for text affinity, it should render the comparison operators caseblind, just like "like". ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
>On 17 Mar 2017, at 10:20am, Rob Willett wrote: > >> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC); >> >> […] >> >> As part of the larger more complex query, we are executing the query >> >> ``` >> select * from Disruptions where status = 2 OR status = 6; > >The schema for the table says that "status" is INTEGER. >You are supplying numbers as arguments. >Those two match and should create no problem. > >But your index has a collation order which is usually used for text. I don’t >see that it is obviously wrong, but it does look a little weird. > >Try creating another index which is just on "status", without the COLLATE >clause. >Then do another ANALYZE, then try the SELECT again. > >Simon. If the index is deemed unsuitable by SQLite due to its collation sequence, then I expect it qwould also be ignored in "select ... status=1" (without the second ORed value) If not, then (select ... where status =2 UNION ALL select where status = 6) should do the trick Do you really require all the fields from Disruptions? And yes, collating integers with NOCASE seems quite strange (there are no capital or lowercase numbers unless you are using roman numerals ;) ); for text affinity, it should render the comparison operators caseblind, just like "like". PS. Also try "select .. status = 2 COLLATE NOCASE or status = 6 COLLATE NOCASE" just for good measure ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
>On 17 Mar 2017, at 10:20am, Rob Willett wrote: > >> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC); >> >> […] >> >> As part of the larger more complex query, we are executing the query >> >> ``` >> select * from Disruptions where status = 2 OR status = 6; > >The schema for the table says that "status" is INTEGER. >You are supplying numbers as arguments. >Those two match and should create no problem. > >But your index has a collation order which is usually used for text. I don’t >see that it is obviously wrong, but it does look a little weird. > >Try creating another index which is just on "status", without the COLLATE >clause. >Then do another ANALYZE, then try the SELECT again. > >Simon. If the index is deemed unsuitable by SQLite due to its collation sequence, then I expect it qwould also be ignored in "select ... status=1" (without the second ORed value) If not, then (select ... where status =2 UNION ALL select where status = 6) should do the trick Do you really require all the fields from Disruptions? And yes, collating integers with NOCASE seems quite strange (there are no capital or lowercase numbers unless you are using roman numerals ;) ); for text affinity, it should render the comparison operators caseblind, just like "like". ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Simon, Thank you for the quick response. We'll do this. We're just waiting for another analyze to finish. Our current DB is 45GB, the query works OK on smaller databases, but at this size its very slow, thats why we have a bad feeling that we're tinkering on the edges and we need a new DB design :( The reason for the collation is the way NavCat SQLite works, its difficult to get it without the collate but we'll do this direct from the command line. We'll post back in an hour or so as it'll probably take that long to run :) Rob On 17 Mar 2017, at 10:27, Simon Slavin wrote: On 17 Mar 2017, at 10:20am, Rob Willett wrote: CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC); […] As part of the larger more complex query, we are executing the query ``` select * from Disruptions where status = 2 OR status = 6; The schema for the table says that "status" is INTEGER. You are supplying numbers as arguments. Those two match and should create no problem. But your index has a collation order which is usually used for text. I don’t see that it is obviously wrong, but it does look a little weird. Try creating another index which is just on "status", without the COLLATE clause. Then do another ANALYZE, then try the SELECT again. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
On 17 Mar 2017, at 10:20am, Rob Willett wrote: > CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC); > > […] > > As part of the larger more complex query, we are executing the query > > ``` > select * from Disruptions where status = 2 OR status = 6; The schema for the table says that "status" is INTEGER. You are supplying numbers as arguments. Those two match and should create no problem. But your index has a collation order which is usually used for text. I don’t see that it is obviously wrong, but it does look a little weird. Try creating another index which is just on "status", without the COLLATE clause. Then do another ANALYZE, then try the SELECT again. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Hi, We've been struggling with a complex query that we have written. One of the elements of this complex query is a select statement that doesn't appear to use an index when we *think* it should do. We're not going to ask you to debug a large complex SQL query (unless you have nothing else to do today) but we're working our way through the query trying to understand where we've gone wrong. So we've broken down the query and are trying each section to see what it performs (or rather doesn't perform like). The sub query is a simple select on a large table, Disruptions. The Disruptions table has 180M rows of data. The schema for it is here. We've pulled it straight from Navicat for SQLite. ``` CREATE TABLE "Disruptions" ( "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "version" integer NOT NULL, "Disruption_id" INTEGER NOT NULL, "status" integer NOT NULL, "severity" integer NOT NULL, "levelOfInterest" integer NOT NULL, "category" integer NOT NULL, "subCategory" integer NOT NULL, "startTime" TEXT NOT NULL, "endTime" text NOT NULL, "location" integer NOT NULL, "corridor" integer NOT NULL, "comments" integer NOT NULL, "currentUpdate" integer NOT NULL, "remarkTime" TEXT NOT NULL, "lastModTime" TEXT NOT NULL, "CauseAreaPointX" real NOT NULL, "CauseAreaPointY" real NOT NULL, "Direction" TEXT ); INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("Disruptions", '184626834'); -- -- Indexes structure for table Disruptions -- CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC); CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC); CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE NOCASE ASC); CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC); PRAGMA foreign_keys = true; ``` As part of the larger more complex query, we are executing the query ``` select * from Disruptions where status = 2 OR status = 6; ``` Status is one of six values, 1 to 6 and is probably not evenly distributed across the 180M rows. If we do ``` sqlite> explain query plan select * from Disruptions where status = 2 OR status = 6; selectid|order|from|detail 0|0|0|SCAN TABLE Disruptions ``` We can see that table scanning a 180M records is going to be slow, no matter what the rest of the complex query is like. We have an index Disruptions_idx4 which we *think* should speed it up, but the query plan doesn't seem to take this into account. We think that only having six values of Status means that the speed up from the index is not going to be fantastic but every little helps. We have run analyze on the database and that hasn't helped. Our gut feeling at this moment is that we have the design structure wrong in our DB and we're going to have to take a long hard look at what we're doing, do a redesign and a rebuild as we simply got it wrong from the beginning. Hindsight is wonderful ;) In the interim (as this is a 2-3 month job), we need to speed our query up from 90 mins down to something in the tens of mins. Any suggestions very much welcomed, Thanks Rob ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users