Re: [sqlite] Rewriting a query

2005-10-03 Thread John LeSueur

[EMAIL PROTECTED] wrote:


Robin Breathe <[EMAIL PROTECTED]> wrote:
 


Hugh Gibson wrote:
   


I'm intrigued. How do you get SQLite to use a multi-column index as it's
primary key (i.e. B-tree hash)? Please elaborate.
   

Simply 

CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT 
'',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY 
(sCommunityID, sTransactionID))
 


Ah, but it's not used for the B-tree hash (at least not according to the
documentation).

   



The ROWID is always used as the btree key (not hash!) on the main
database btree.  But when you have a PRIMARY KEY a separate index
btree is also created which uses the PRIMARY KEY as its key.

Question:  Why is this important to you?

(Side note: I am experimenting with a new Mail User Agent that
uses SQLite to store all its email messages.  I appologize in advance
if this message is misformatted or otherwise garbled.)
--
D. Richard Hipp <[EMAIL PROTECTED]>

 

I thought you might want to know that your new agent doesn't add 
References:  or In-Reply-To: .

This is not a big deal, but it does break discussion threading a little.

John


Re: Re: Re: [sqlite] Rewriting a query

2005-09-30 Thread Igor Tandetnik

[EMAIL PROTECTED] wrote:

"Miha Vrhovnik"<[EMAIL PROTECTED]> wrote:

Subject:
=?ISO-8859-1?Q?Re=3A=20Re=3A=20=5Bsqlite=5D=20Rewriting=20a=20query?=


What RFC do I need to read to figure out how to decode the Subject
line (presumably inserted by si.Mail)?


RFC 2047 "MIME (Multipurpose Internet Mail Extensions) Part Three: 
Message Header Extensions for Non-ASCII Text."


Igor Tandetnik 



Re: [sqlite] Rewriting a query

2005-09-30 Thread Dan Kennedy


--- [EMAIL PROTECTED] wrote:

> Robin Breathe <[EMAIL PROTECTED]> wrote:
> > Hugh Gibson wrote:
> > >> I'm intrigued. How do you get SQLite to use a multi-column index as it's
> > >> primary key (i.e. B-tree hash)? Please elaborate.
> > > 
> > > Simply 
> > > 
> > > CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT 
> > > '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY 
> > > (sCommunityID, sTransactionID))
> > 
> > Ah, but it's not used for the B-tree hash (at least not according to the
> > documentation).
> > 
> 
> The ROWID is always used as the btree key (not hash!) on the main
> database btree.  But when you have a PRIMARY KEY a separate index
> btree is also created which uses the PRIMARY KEY as its key.
> 
> Question:  Why is this important to you?

I am not he, but if I were looking for ways to improve the file format
that's something I'd try to work in too. Both to save space and speed 
things up.

CREATE TABLE(a PRIMARY KEY, b);

creates two btree structures:

(a || oid) -> NULL(the index)
(oid) -> (a || b) (the table)

So for every row, there are two copies of both "a" and "oid". Depending
on your schema, the space consumed by the table is from 0-100% more than
if we were able to create a single btree:

(a || oid) -> (b) 

or even drop the oid altogether, it's not part of SQL anyway (is it?):

(a) -> (b)

As well as saving space, in the most common case an UPDATE or DELETE 
would have to modify one less tree, and some SELECTs would open one
less tree structure.

Tricky to retain backward compatibility though.



__ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com


Re: Re: [sqlite] Rewriting a query

2005-09-30 Thread Miha Vrhovnik
[EMAIL PROTECTED] je ob 30.9.2005 12:22:47 napisal(a):

>(Side note: I am experimenting with a new Mail User Agent that
>uses SQLite to store all its email messages.  I appologize in advance
>if this message is misformatted or otherwise garbled.)

Am. Who stole that idea from me?

--
It's time to get rid of your current e-mail client ...
... and start using si.Mail.

http://simail.sourceforge.net/


Re: [sqlite] Rewriting a query

2005-09-30 Thread drh
Robin Breathe <[EMAIL PROTECTED]> wrote:
> Hugh Gibson wrote:
> >> I'm intrigued. How do you get SQLite to use a multi-column index as it's
> >> primary key (i.e. B-tree hash)? Please elaborate.
> > 
> > Simply 
> > 
> > CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT 
> > '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY 
> > (sCommunityID, sTransactionID))
> 
> Ah, but it's not used for the B-tree hash (at least not according to the
> documentation).
> 

The ROWID is always used as the btree key (not hash!) on the main
database btree.  But when you have a PRIMARY KEY a separate index
btree is also created which uses the PRIMARY KEY as its key.

Question:  Why is this important to you?

(Side note: I am experimenting with a new Mail User Agent that
uses SQLite to store all its email messages.  I appologize in advance
if this message is misformatted or otherwise garbled.)
--
D. Richard Hipp <[EMAIL PROTECTED]>



Re: [sqlite] Rewriting a query

2005-09-30 Thread Robin Breathe
Hugh Gibson wrote:
>> I'm intrigued. How do you get SQLite to use a multi-column index as it's
>> primary key (i.e. B-tree hash)? Please elaborate.
> 
> Simply 
> 
> CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT 
> '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY 
> (sCommunityID, sTransactionID))

Ah, but it's not used for the B-tree hash (at least not according to the
documentation).

>> Have you investigated the following to see how the optimizer deals with 
>> it?
> ...snip...
> I'm not fluent in the VDBE code, but ISTM (and execution time confirms) 
> that it's doing a table scan first to do the GROUP BY. Then it 
> creates a temp table with the values from the list, and does a join (I 
> suppose).

I thought it probably would, but worth trying :)

>> I think I'd normally err on keeping a filter table handy (temporary if
>> you wish):
>>
>> -- Initialise filter:
>> CREATE TABLE filter (sCommunityID TEXT);
>> CREATE INDEX filter_idx ON filter(sCommunityID);
>>
>> -- Configure filter:
>> INSERT INTO filter VALUES ('a03061bFi');
>> INSERT INTO filter VALUES ('a03064KDy');
>> INSERT INTO filter VALUES ('a03068QhK');
>>
>> -- Get your results:
>> SELECT sCommunityID, max(sTransactionID)
>>   FROM filter NATURAL INNER JOIN TransactionList
>>  GROUP BY sCommunityID;
>>
>> -- Clear filter ready for next time:
>> DELETE FROM filter;
> 
> Thread safety is an issue here. Multiple threads may be doing this action. 
> Also, by the (admittedly contorted) query I am using I can get the VDBE to 
> create a temporary table for me and insert the data, taking a lot less 
> time to do it than it would take to run all those queries.

You didn't mention threads :)

Robin
-- 
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
[EMAIL PROTECTED]   Tel: +44 1865 483685  Fax: +44 1865 483073



signature.asc
Description: OpenPGP digital signature


Re: [sqlite] Rewriting a query

2005-09-29 Thread Hugh Gibson
> I'm intrigued. How do you get SQLite to use a multi-column index as it's
> primary key (i.e. B-tree hash)? Please elaborate.

Simply 

CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT 
'',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY 
(sCommunityID, sTransactionID))

> Have you investigated the following to see how the optimizer deals with 
> it?
> 
> SELECT sCommunityID, max(sTransactionID)
>   FROM TransactionList
>  GROUP BY sCommunityID
> HAVING sCommunityID IN ('a03061bFi','a03064KDy', 
> 'a03068QhK')

I'm not fluent in the VDBE code, but ISTM (and execution time confirms) 
that it's doing a table scan first to do the GROUP BY. Then it 
creates a temp table with the values from the list, and does a join (I 
suppose).

> I think I'd normally err on keeping a filter table handy (temporary if
> you wish):
> 
> -- Initialise filter:
> CREATE TABLE filter (sCommunityID TEXT);
> CREATE INDEX filter_idx ON filter(sCommunityID);
> 
> -- Configure filter:
> INSERT INTO filter VALUES ('a03061bFi');
> INSERT INTO filter VALUES ('a03064KDy');
> INSERT INTO filter VALUES ('a03068QhK');
> 
> -- Get your results:
> SELECT sCommunityID, max(sTransactionID)
>   FROM filter NATURAL INNER JOIN TransactionList
>  GROUP BY sCommunityID;
> 
> -- Clear filter ready for next time:
> DELETE FROM filter;

Thread safety is an issue here. Multiple threads may be doing this action. 
Also, by the (admittedly contorted) query I am using I can get the VDBE to 
create a temporary table for me and insert the data, taking a lot less 
time to do it than it would take to run all those queries.
 
> I feel fairly confident that that method would make good use of your
> existing index.

Yes, I'm sure too. See my comment about joining to the Community table 
(that still requires the list of values, but it's fast because of the way 
the query is constructed).

> You can use a nice sqlite3_bind_text() for those inserts :)

I'm using Python and PySQLite but I can do that through executemany.

> If you really, really wanted you could munge the filter table into a
> sequence of UNIONs, but it would be nowhere near as elegant. This method
> easily extends to any number of filtered sCommunityIDs, while anything
> in a single query is going to start getting "icky".

Yes, hence my question about the max size of queries. It wouldn't be a 
problem to create multiple big queries and run them, as the running time 
is still O(N). It's just how long the query engine takes to compile the 
SQL if it's very long. 
 
> Probably too late, but I've also found that indexing, grouping and
> joining are far faster on INT columns. If you could store sCommunityID
> as an INT...

Interesting. It is, indeed, way too late :-)
 
> Please correct me if any of the above is junk, I'm still learning :)

If we stop learning then someone should call the undertaker ;-)

Hugh


Re: [sqlite] Rewriting a query

2005-09-29 Thread Robin Breathe
Hugh Gibson wrote:
>> What happens if you create the index on sCommunityID only? Does 
>> it still do the full table scan?
> 
> A bit difficult to drop the (sCommunityID, sTransactionID) index, as it's 
> the primary key. 

I'm intrigued. How do you get SQLite to use a multi-column index as it's
primary key (i.e. B-tree hash)? Please elaborate.

Have you investigated the following to see how the optimizer deals with it?

SELECT sCommunityID, max(sTransactionID)
  FROM TransactionList
 GROUP BY sCommunityID
HAVING sCommunityID IN ('a03061bFi','a03064KDy', 'a03068QhK');

I think I'd normally err on keeping a filter table handy (temporary if
you wish):

-- Initialise filter:
CREATE TABLE filter (sCommunityID TEXT);
CREATE INDEX filter_idx ON filter(sCommunityID);

-- Configure filter:
INSERT INTO filter VALUES ('a03061bFi');
INSERT INTO filter VALUES ('a03064KDy');
INSERT INTO filter VALUES ('a03068QhK');

-- Get your results:
SELECT sCommunityID, max(sTransactionID)
  FROM filter NATURAL INNER JOIN TransactionList
 GROUP BY sCommunityID;

-- Clear filter ready for next time:
DELETE FROM filter;

I feel fairly confident that that method would make good use of your
existing index.
You can use a nice sqlite3_bind_text() for those inserts :)
If you really, really wanted you could munge the filter table into a
sequence of UNIONs, but it would be nowhere near as elegant. This method
easily extends to any number of filtered sCommunityIDs, while anything
in a single query is going to start getting "icky".

Probably too late, but I've also found that indexing, grouping and
joining are far faster on INT columns. If you could store sCommunityID
as an INT...

Please correct me if any of the above is junk, I'm still learning :)

Regards,
Robin
-- 
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
[EMAIL PROTECTED]   Tel: +44 1865 483685  Fax: +44 1865 483073



signature.asc
Description: OpenPGP digital signature


Re: [sqlite] Rewriting a query

2005-09-28 Thread Hugh Gibson
> What happens if you create the index on sCommunityID only? Does 
> it still do the full table scan?

A bit difficult to drop the (sCommunityID, sTransactionID) index, as it's 
the primary key. 

> Also, don't overlook using UNION or UNION ALL, ugly as they 
> can be. Maybe something like this could be used to avoid creating
> a very small temporary table:
> 
> SELECT ... FROM (SELECT ... UNION SELECT ... UNION SELECT ...), 
> TransactionList WHERE ...

Hmmm. Something like this works:

SELECT sCommID, (SELECT sTransactionID
FROM TransactionList
WHERE sCommunityID = sCommID
ORDER BY sTransactionID Desc Limit 1) as MaxID
>From (SELECT 'a03061bFi' As sCommID UNION ALL SELECT 'a03064KDy' 
As sCommID UNION ALL SELECT 'a03068QhK' As sCommID)

The "explain" output shows that the Community table isn't being touched 
now, and in fact it creates a temp table with the hard-coded values in it, 
similar to "IN". So the final query may run faster (it's certainly very 
fast with the data I have).

I might be working with a few thousand communities. Is there a limit to 
the size of queries? In that case the "IN" clause would be better as it 
uses less text.

Hugh


Re: [sqlite] Rewriting a query

2005-09-28 Thread Dan Kennedy

What happens if you create the index on sCommunityID only? Does 
it still do the full table scan?

Also, don't overlook using UNION or UNION ALL, ugly as they 
can be. Maybe something like this could be used to avoid creating
a very small temporary table:

SELECT ... FROM (SELECT ... UNION SELECT ... UNION SELECT ...), TransactionList 
WHERE ...

--- Hugh Gibson <[EMAIL PROTECTED]> wrote:

> I have this query:
> 
>   SELECT sCommunityID, max(sTransactionID)
>   FROM TransactionList 
>   WHERE sCommunityID in ('a03061bFi','a03064KDy', 'a03068QhK')
>   GROUP BY sCommunityID
> 
> There is an index on (sCommunityID, sTransactionID)
> 
> This forces a table scan (perhaps improved in 3.2.6). 
> 
> I can use a LIMIT 1 clause like this, but only with one community ID:
> 
>   SELECT sCommunityID, sTransactionID
>   FROM TransactionList 
>   WHERE sCommunityID  = 'a03061bFi'
>   ORDER BY sTransactionID Desc LIMIT 1
> 
> Is there a way of getting the latter to work with multiple sCommunityIDs?
> 
> Hugh



__ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com


Re: [sqlite] Rewriting a query

2005-09-28 Thread Hugh Gibson
The following works for me. The Community table has only one entry per 
community ID so it's fast to look up.

SELECT sCommunityID, (SELECT sTransactionID
FROM TransactionList
WHERE sCommunityID = Community.sCommunityID
ORDER BY sTransactionID Desc Limit 1) as MaxID
>From Community WHERE sCommunityID In ('a03061bFi','a03064KDy', 
'a03068QhK')

Is there any way of just supplying a list of records without having to use 
another table? I know I can use parameterised queries but I would like to 
be able to use a single query to get the data, as it feels like it should 
be possible!

Hugh


[sqlite] Rewriting a query

2005-09-28 Thread Hugh Gibson
I have this query:

  SELECT sCommunityID, max(sTransactionID)
  FROM TransactionList 
  WHERE sCommunityID in ('a03061bFi','a03064KDy', 'a03068QhK')
  GROUP BY sCommunityID

There is an index on (sCommunityID, sTransactionID)

This forces a table scan (perhaps improved in 3.2.6). 

I can use a LIMIT 1 clause like this, but only with one community ID:

  SELECT sCommunityID, sTransactionID
  FROM TransactionList 
  WHERE sCommunityID  = 'a03061bFi'
  ORDER BY sTransactionID Desc LIMIT 1

Is there a way of getting the latter to work with multiple sCommunityIDs?

Hugh