Re: [sqlite] Query performance issues - index selection

2006-06-02 Thread Steve Green

Kurt,

Thanks, the single primary index you suggested does give us the
desired results.

Steve

Kurt Welgehausen wrote:

Steve Green <[EMAIL PROTECTED]> wrote:



CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
...
CREATE INDEX ix_data_ut on data( u_id, utime );



Wouldn't a unique index on (u_id, utime, r_id) get you
the pk constraint and better performance on your query
with no other index to confuse things?

Regards


--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.


Re: [sqlite] Query performance issues - index selection

2006-06-01 Thread drh
Steve Green <[EMAIL PROTECTED]> wrote:
> the unary + operator in front of the two utime terms did cause the optimal
> index to be used... Can you explain why this works?

A unary + in front of a column name disqualifies the term from
consideration by the optimizer, and hence from use by indices.
That left the other index as the only choice.
> 
> What is the best way to send you sample data (and what is the maximum
> file size that you'd like)?
> 

Actually, if you could just send me your schema and the
query that is causing problems by private email, I can
probably use that to figure out why the optimizer is failing
so badly in this case.
--
D. Richard Hipp   <[EMAIL PROTECTED]>



Re: [sqlite] Query performance issues - index selection

2006-06-01 Thread Kurt Welgehausen
Steve Green <[EMAIL PROTECTED]> wrote:

> CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
> ...
> CREATE INDEX ix_data_ut on data( u_id, utime );

Wouldn't a unique index on (u_id, utime, r_id) get you
the pk constraint and better performance on your query
with no other index to confuse things?

Regards


Re: [sqlite] Query performance issues - index selection

2006-06-01 Thread Steve Green

Richard,

Thanks for the reply.  I did run ANALYZE and that didn't help.  However,
the unary + operator in front of the two utime terms did cause the optimal
index to be used... Can you explain why this works?

I also realized that if I change the "where" clause to something like
where u_id >= 0 and utime >= ..., it will cuase the optimal index to be
used (however, if we every started using negative u_id values, it would
break).

What is the best way to send you sample data (and what is the maximum
file size that you'd like)?

Steve

[EMAIL PROTECTED] wrote:


Steve Green <[EMAIL PROTECTED]> wrote:
[...]


select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10



[...]


So, is there a way that I can get sqlite to use the optimal index
without having to remove my unique index? 



Have you run ANALYZE.  That might fix it.

If not, try adding a unary + in front of the two utime
terms in the WHERE clause:

WHERE +utime >= 1145441600 AND +utime < 114912

If you can send me some sample data and queries, that would
be great.  It will help me to improve the optimizer so that
these kinds of things come up less often in the future.
--
D. Richard Hipp   <[EMAIL PROTECTED]>


--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.


Re: [sqlite] Query performance issues - index selection

2006-06-01 Thread drh
Steve Green <[EMAIL PROTECTED]> wrote:
[...]
> 
> select u_id, sum( data1 ), sum( data2 )
> from data where utime >= 1146441600 and utime < 114912
> group by u_id
> order by sum( data1 ) desc
> limit 10
> 
[...]
> 
> So, is there a way that I can get sqlite to use the optimal index
> without having to remove my unique index? 

Have you run ANALYZE.  That might fix it.

If not, try adding a unary + in front of the two utime
terms in the WHERE clause:

WHERE +utime >= 1145441600 AND +utime < 114912

If you can send me some sample data and queries, that would
be great.  It will help me to improve the optimizer so that
these kinds of things come up less often in the future.
--
D. Richard Hipp   <[EMAIL PROTECTED]>



[sqlite] Query performance issues - index selection

2006-06-01 Thread Steve Green

Hi,

I have the following schema

CREATE TABLE data(
utime int4,
r_id int2,
u_id int4,
data1 int8,
data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  To test
the index, I was forced to drop the primary index (which I can't do in my
production environment).   After dropping the primary index, an explain query
plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance improvement
over the unique index, although sqlite's query planner chose these indexes over
the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index without having
to remove my unique index?  Perhaps this would involve rewritting the query, but
I'm at a loss as to how that could be done.

Thanks for your time,

Steve
--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.