Re: [sqlite] Rewriting a query
[EMAIL PROTECTED] wrote: Robin Breathe <[EMAIL PROTECTED]> wrote: Hugh Gibson wrote: I'm intrigued. How do you get SQLite to use a multi-column index as it's primary key (i.e. B-tree hash)? Please elaborate. Simply CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY (sCommunityID, sTransactionID)) Ah, but it's not used for the B-tree hash (at least not according to the documentation). The ROWID is always used as the btree key (not hash!) on the main database btree. But when you have a PRIMARY KEY a separate index btree is also created which uses the PRIMARY KEY as its key. Question: Why is this important to you? (Side note: I am experimenting with a new Mail User Agent that uses SQLite to store all its email messages. I appologize in advance if this message is misformatted or otherwise garbled.) -- D. Richard Hipp <[EMAIL PROTECTED]> I thought you might want to know that your new agent doesn't add References: or In-Reply-To: . This is not a big deal, but it does break discussion threading a little. John
Re: Re: Re: [sqlite] Rewriting a query
[EMAIL PROTECTED] wrote: "Miha Vrhovnik"<[EMAIL PROTECTED]> wrote: Subject: =?ISO-8859-1?Q?Re=3A=20Re=3A=20=5Bsqlite=5D=20Rewriting=20a=20query?= What RFC do I need to read to figure out how to decode the Subject line (presumably inserted by si.Mail)? RFC 2047 "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text." Igor Tandetnik
Re: [sqlite] Rewriting a query
--- [EMAIL PROTECTED] wrote: > Robin Breathe <[EMAIL PROTECTED]> wrote: > > Hugh Gibson wrote: > > >> I'm intrigued. How do you get SQLite to use a multi-column index as it's > > >> primary key (i.e. B-tree hash)? Please elaborate. > > > > > > Simply > > > > > > CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT > > > '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY > > > (sCommunityID, sTransactionID)) > > > > Ah, but it's not used for the B-tree hash (at least not according to the > > documentation). > > > > The ROWID is always used as the btree key (not hash!) on the main > database btree. But when you have a PRIMARY KEY a separate index > btree is also created which uses the PRIMARY KEY as its key. > > Question: Why is this important to you? I am not he, but if I were looking for ways to improve the file format that's something I'd try to work in too. Both to save space and speed things up. CREATE TABLE(a PRIMARY KEY, b); creates two btree structures: (a || oid) -> NULL(the index) (oid) -> (a || b) (the table) So for every row, there are two copies of both "a" and "oid". Depending on your schema, the space consumed by the table is from 0-100% more than if we were able to create a single btree: (a || oid) -> (b) or even drop the oid altogether, it's not part of SQL anyway (is it?): (a) -> (b) As well as saving space, in the most common case an UPDATE or DELETE would have to modify one less tree, and some SELECTs would open one less tree structure. Tricky to retain backward compatibility though. __ Yahoo! Mail - PC Magazine Editors' Choice 2005 http://mail.yahoo.com
Re: Re: [sqlite] Rewriting a query
[EMAIL PROTECTED] je ob 30.9.2005 12:22:47 napisal(a): >(Side note: I am experimenting with a new Mail User Agent that >uses SQLite to store all its email messages. I appologize in advance >if this message is misformatted or otherwise garbled.) Am. Who stole that idea from me? -- It's time to get rid of your current e-mail client ... ... and start using si.Mail. http://simail.sourceforge.net/
Re: [sqlite] Rewriting a query
Robin Breathe <[EMAIL PROTECTED]> wrote: > Hugh Gibson wrote: > >> I'm intrigued. How do you get SQLite to use a multi-column index as it's > >> primary key (i.e. B-tree hash)? Please elaborate. > > > > Simply > > > > CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT > > '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY > > (sCommunityID, sTransactionID)) > > Ah, but it's not used for the B-tree hash (at least not according to the > documentation). > The ROWID is always used as the btree key (not hash!) on the main database btree. But when you have a PRIMARY KEY a separate index btree is also created which uses the PRIMARY KEY as its key. Question: Why is this important to you? (Side note: I am experimenting with a new Mail User Agent that uses SQLite to store all its email messages. I appologize in advance if this message is misformatted or otherwise garbled.) -- D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Rewriting a query
Hugh Gibson wrote: >> I'm intrigued. How do you get SQLite to use a multi-column index as it's >> primary key (i.e. B-tree hash)? Please elaborate. > > Simply > > CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT > '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY > (sCommunityID, sTransactionID)) Ah, but it's not used for the B-tree hash (at least not according to the documentation). >> Have you investigated the following to see how the optimizer deals with >> it? > ...snip... > I'm not fluent in the VDBE code, but ISTM (and execution time confirms) > that it's doing a table scan first to do the GROUP BY. Then it > creates a temp table with the values from the list, and does a join (I > suppose). I thought it probably would, but worth trying :) >> I think I'd normally err on keeping a filter table handy (temporary if >> you wish): >> >> -- Initialise filter: >> CREATE TABLE filter (sCommunityID TEXT); >> CREATE INDEX filter_idx ON filter(sCommunityID); >> >> -- Configure filter: >> INSERT INTO filter VALUES ('a03061bFi'); >> INSERT INTO filter VALUES ('a03064KDy'); >> INSERT INTO filter VALUES ('a03068QhK'); >> >> -- Get your results: >> SELECT sCommunityID, max(sTransactionID) >> FROM filter NATURAL INNER JOIN TransactionList >> GROUP BY sCommunityID; >> >> -- Clear filter ready for next time: >> DELETE FROM filter; > > Thread safety is an issue here. Multiple threads may be doing this action. > Also, by the (admittedly contorted) query I am using I can get the VDBE to > create a temporary table for me and insert the data, taking a lot less > time to do it than it would take to run all those queries. You didn't mention threads :) Robin -- Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK [EMAIL PROTECTED] Tel: +44 1865 483685 Fax: +44 1865 483073 signature.asc Description: OpenPGP digital signature
Re: [sqlite] Rewriting a query
> I'm intrigued. How do you get SQLite to use a multi-column index as it's > primary key (i.e. B-tree hash)? Please elaborate. Simply CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY (sCommunityID, sTransactionID)) > Have you investigated the following to see how the optimizer deals with > it? > > SELECT sCommunityID, max(sTransactionID) > FROM TransactionList > GROUP BY sCommunityID > HAVING sCommunityID IN ('a03061bFi','a03064KDy', > 'a03068QhK') I'm not fluent in the VDBE code, but ISTM (and execution time confirms) that it's doing a table scan first to do the GROUP BY. Then it creates a temp table with the values from the list, and does a join (I suppose). > I think I'd normally err on keeping a filter table handy (temporary if > you wish): > > -- Initialise filter: > CREATE TABLE filter (sCommunityID TEXT); > CREATE INDEX filter_idx ON filter(sCommunityID); > > -- Configure filter: > INSERT INTO filter VALUES ('a03061bFi'); > INSERT INTO filter VALUES ('a03064KDy'); > INSERT INTO filter VALUES ('a03068QhK'); > > -- Get your results: > SELECT sCommunityID, max(sTransactionID) > FROM filter NATURAL INNER JOIN TransactionList > GROUP BY sCommunityID; > > -- Clear filter ready for next time: > DELETE FROM filter; Thread safety is an issue here. Multiple threads may be doing this action. Also, by the (admittedly contorted) query I am using I can get the VDBE to create a temporary table for me and insert the data, taking a lot less time to do it than it would take to run all those queries. > I feel fairly confident that that method would make good use of your > existing index. Yes, I'm sure too. See my comment about joining to the Community table (that still requires the list of values, but it's fast because of the way the query is constructed). > You can use a nice sqlite3_bind_text() for those inserts :) I'm using Python and PySQLite but I can do that through executemany. > If you really, really wanted you could munge the filter table into a > sequence of UNIONs, but it would be nowhere near as elegant. This method > easily extends to any number of filtered sCommunityIDs, while anything > in a single query is going to start getting "icky". Yes, hence my question about the max size of queries. It wouldn't be a problem to create multiple big queries and run them, as the running time is still O(N). It's just how long the query engine takes to compile the SQL if it's very long. > Probably too late, but I've also found that indexing, grouping and > joining are far faster on INT columns. If you could store sCommunityID > as an INT... Interesting. It is, indeed, way too late :-) > Please correct me if any of the above is junk, I'm still learning :) If we stop learning then someone should call the undertaker ;-) Hugh
Re: [sqlite] Rewriting a query
Hugh Gibson wrote: >> What happens if you create the index on sCommunityID only? Does >> it still do the full table scan? > > A bit difficult to drop the (sCommunityID, sTransactionID) index, as it's > the primary key. I'm intrigued. How do you get SQLite to use a multi-column index as it's primary key (i.e. B-tree hash)? Please elaborate. Have you investigated the following to see how the optimizer deals with it? SELECT sCommunityID, max(sTransactionID) FROM TransactionList GROUP BY sCommunityID HAVING sCommunityID IN ('a03061bFi','a03064KDy', 'a03068QhK'); I think I'd normally err on keeping a filter table handy (temporary if you wish): -- Initialise filter: CREATE TABLE filter (sCommunityID TEXT); CREATE INDEX filter_idx ON filter(sCommunityID); -- Configure filter: INSERT INTO filter VALUES ('a03061bFi'); INSERT INTO filter VALUES ('a03064KDy'); INSERT INTO filter VALUES ('a03068QhK'); -- Get your results: SELECT sCommunityID, max(sTransactionID) FROM filter NATURAL INNER JOIN TransactionList GROUP BY sCommunityID; -- Clear filter ready for next time: DELETE FROM filter; I feel fairly confident that that method would make good use of your existing index. You can use a nice sqlite3_bind_text() for those inserts :) If you really, really wanted you could munge the filter table into a sequence of UNIONs, but it would be nowhere near as elegant. This method easily extends to any number of filtered sCommunityIDs, while anything in a single query is going to start getting "icky". Probably too late, but I've also found that indexing, grouping and joining are far faster on INT columns. If you could store sCommunityID as an INT... Please correct me if any of the above is junk, I'm still learning :) Regards, Robin -- Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK [EMAIL PROTECTED] Tel: +44 1865 483685 Fax: +44 1865 483073 signature.asc Description: OpenPGP digital signature
Re: [sqlite] Rewriting a query
> What happens if you create the index on sCommunityID only? Does > it still do the full table scan? A bit difficult to drop the (sCommunityID, sTransactionID) index, as it's the primary key. > Also, don't overlook using UNION or UNION ALL, ugly as they > can be. Maybe something like this could be used to avoid creating > a very small temporary table: > > SELECT ... FROM (SELECT ... UNION SELECT ... UNION SELECT ...), > TransactionList WHERE ... Hmmm. Something like this works: SELECT sCommID, (SELECT sTransactionID FROM TransactionList WHERE sCommunityID = sCommID ORDER BY sTransactionID Desc Limit 1) as MaxID >From (SELECT 'a03061bFi' As sCommID UNION ALL SELECT 'a03064KDy' As sCommID UNION ALL SELECT 'a03068QhK' As sCommID) The "explain" output shows that the Community table isn't being touched now, and in fact it creates a temp table with the hard-coded values in it, similar to "IN". So the final query may run faster (it's certainly very fast with the data I have). I might be working with a few thousand communities. Is there a limit to the size of queries? In that case the "IN" clause would be better as it uses less text. Hugh
Re: [sqlite] Rewriting a query
What happens if you create the index on sCommunityID only? Does it still do the full table scan? Also, don't overlook using UNION or UNION ALL, ugly as they can be. Maybe something like this could be used to avoid creating a very small temporary table: SELECT ... FROM (SELECT ... UNION SELECT ... UNION SELECT ...), TransactionList WHERE ... --- Hugh Gibson <[EMAIL PROTECTED]> wrote: > I have this query: > > SELECT sCommunityID, max(sTransactionID) > FROM TransactionList > WHERE sCommunityID in ('a03061bFi','a03064KDy', 'a03068QhK') > GROUP BY sCommunityID > > There is an index on (sCommunityID, sTransactionID) > > This forces a table scan (perhaps improved in 3.2.6). > > I can use a LIMIT 1 clause like this, but only with one community ID: > > SELECT sCommunityID, sTransactionID > FROM TransactionList > WHERE sCommunityID = 'a03061bFi' > ORDER BY sTransactionID Desc LIMIT 1 > > Is there a way of getting the latter to work with multiple sCommunityIDs? > > Hugh __ Yahoo! Mail - PC Magazine Editors' Choice 2005 http://mail.yahoo.com
Re: [sqlite] Rewriting a query
The following works for me. The Community table has only one entry per community ID so it's fast to look up. SELECT sCommunityID, (SELECT sTransactionID FROM TransactionList WHERE sCommunityID = Community.sCommunityID ORDER BY sTransactionID Desc Limit 1) as MaxID >From Community WHERE sCommunityID In ('a03061bFi','a03064KDy', 'a03068QhK') Is there any way of just supplying a list of records without having to use another table? I know I can use parameterised queries but I would like to be able to use a single query to get the data, as it feels like it should be possible! Hugh
[sqlite] Rewriting a query
I have this query: SELECT sCommunityID, max(sTransactionID) FROM TransactionList WHERE sCommunityID in ('a03061bFi','a03064KDy', 'a03068QhK') GROUP BY sCommunityID There is an index on (sCommunityID, sTransactionID) This forces a table scan (perhaps improved in 3.2.6). I can use a LIMIT 1 clause like this, but only with one community ID: SELECT sCommunityID, sTransactionID FROM TransactionList WHERE sCommunityID = 'a03061bFi' ORDER BY sTransactionID Desc LIMIT 1 Is there a way of getting the latter to work with multiple sCommunityIDs? Hugh