Re: [sqlite] PATCH: WHERE clause OR to UNION optimization

2007-12-22 Thread Joe Wilson
This new attached patch corrects a slight inefficiency in my 
previous patch. It now uses UNION ALL (TK_ALL) instead of UNION
(TK_UNION) for the subselects, as was my initial intention. 
OR queries on dis-similar columns with large intermediate result
sets are now even faster.

Given the same sample database below and running this query:

 select count(*) from stuff 
 where b=1 or b=5 or c=2 or d=4 or b=9;

will yield these timings:

 sqlite 3.5.4 unpatched: CPU Time: user 3.560223 sys 0.156010
 3.5.4 + 20071221 patch: CPU Time: user 1.252079 sys 0.072004
 3.5.4 + 20071222 patch: CPU Time: user 0.776048 sys 0.116007

I also changed the OR cost estimator to reduce the cost for each 
additional OR expression pair, as it better reflects the actual
cost.

Here's the meta-diff between yesterday's patch and the new one:

-+return (costL+costR);
++/* Reduce the cost for each OR expression pair by a few percent due
++** to the fact that if the transform was performed then there would
++** be fewer expressions to evaluate in each compound subselect's
++** WHERE clause.
++*/
++return (costL+costR)*31/32;
@@ -323 +328 @@
-+  p->op = TK_UNION;
++  p->op = TK_ALL;
@@ -329 +334 @@
-+p->op = TK_UNION;
++p->op = TK_ALL;
@@ -339 +344 @@
-+p->op = TK_UNION;
++p->op = TK_ALL;


--- Joe Wilson <[EMAIL PROTECTED]> wrote:
> The attached patch implements the WHERE clause "OR to UNION" 
> optimization as described in this post:
> 
>  http://www.mail-archive.com/sqlite-users@sqlite.org/msg09004.html
> 
> If the computed cost of the rewritten WHERE clause is lower than 
> the original query when indexes are taken into account, then it 
> will perform the optimization. If the cost is estimated to be 
> higher then the query will not be rewritten.
> 
> Given the database formed by running these statements:
> 
>   create table stuff(a,b,c,d);
>   insert into stuff values(1,2,3,4);
>   create temp view v1 as select random()%100,
> random()%100, random()%1000, random()%1
>  from stuff x, stuff y;
>   insert into stuff select * from v1;
>   insert into stuff select * from v1;
>   insert into stuff select * from v1;
>   insert into stuff select * from v1;
>   insert into stuff select * from v1;
>   create index stuff_b on stuff(b);
>   create index stuff_c on stuff(c);
>   create index stuff_d on stuff(d);
>   analyze;
> 
> The patched version of sqlite 3.5.4 will run the following query 
> many times faster than an unpatched sqlite 3.5.4:
> 
>   select b, count(*) from stuff 
>   where c=2 or b=23 or c=17 or c=493 or d=7 or c=111 and a=14 
>   group by 1 order by 2 DESC, 1 limit 10;
> 
> On my machine, the patched version produces these query timings:
> 
>   CPU Time: user 0.724045 sys 0.092005
> 
> with the EXPLAIN QUERY PLAN:
> 
>   0|0|TABLE stuff USING PRIMARY KEY
>   0|0|TABLE stuff WITH INDEX stuff_c
>   0|0|TABLE stuff WITH INDEX stuff_d
>   0|0|TABLE stuff WITH INDEX stuff_b
>   0|0|TABLE stuff WITH INDEX stuff_c
> 
> For the same query the unpatched sqlite 3.5.4 produces:
> 
>   CPU Time: user 20.869304 sys 8.912557
> 
>   0|0|TABLE stuff WITH INDEX stuff_b ORDER BY
> 
> Only single table queries are supported by this OR optimization. 
> For this optimization to be considered, the WHERE clause may only 
> consist of column equality comparisons to constants, ORs and ANDs.
> 
> The optimization only looks at the top-level WHERE clause ORs. It 
> will not work with "IN" expressions. Nor will it will not expand 
> expressions like "a=1 AND (b=2 or c=3)" into "a=1 AND b=2 OR a=1 
> AND c=3" - although if manually expanded, the latter form could 
> potentially be optimized.
> 
> It passes "make test" without regressions, but more testing is needed.




  

Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hsIndex: src/select.c
===
RCS file: /sqlite/sqlite/src/select.c,v
retrieving revision 1.372
diff -u -3 -p -r1.372 select.c
--- src/select.c14 Dec 2007 17:24:40 -  1.372
+++ src/select.c22 Dec 2007 17:37:13 -
@@ -12,7 +12,7 @@
 ** This file contains C code routines that are called by the parser
 ** to handle SELECT statements in SQLite.
 **
-** $Id: select.c,v 1.372 2007/12/14 17:24:40 drh Exp $
+** $Id: select.c,v 1.370 2007/12/13 21:54:11 drh Exp $
 */
 #include "sqliteInt.h"
 
@@ -2961,6 +2961,445 @@ static void updateAccumulator(Parse *pPa
   pAggInfo->directMode = 0;
 }
 
+#ifndef SQLITE_OMIT_OR_UNION_TRANSFORM
+
+/* The function prefix

[sqlite] Hello from Sqlite Latino America

2007-12-22 Thread gerardo cabero
Hello Community Sqlite
We Gerardo Antonio Cabero and Daniel Maldonado, administrators Sqlite
http://sqlite-latino.blogspot.com/ Latin America --
Sqlite Latin America?
A site for the communities of Sqlite Hablahispana, which are available
examples, documentation
We are the Latino community sqlite
. Gerardo Antonio and Daniel Maldonado

http://sqlite-latino.blogspot.com/


Re: [sqlite] Re: Re: Any Ideas to speed up CAST

2007-12-22 Thread Mag. Wilhelm Braun
Thanks excellent thought - so simple and I did not think about it. What 
a shame.


THANKS once again and


'MERRY X-MASS'

W.Braun






Igor Tandetnik wrote:

Mag. Wilhelm Braun
<[EMAIL PROTECTED]> wrote:

I would be interesting about your point of keeping the precision in
floating point values without storing it as strings.


Usually, when you want to do this, it's because you are working with 
monetary values. In this case, it is better to store them as integers, 
scaled, say, by a factor of 1000 or whatever accuracy you need.


Igor Tandetnik

- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 







-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Re: Re: Any Ideas to speed up CAST

2007-12-22 Thread Igor Tandetnik

Mag. Wilhelm Braun
<[EMAIL PROTECTED]> wrote:

I would be interesting about your point of keeping the precision in
floating point values without storing it as strings.


Usually, when you want to do this, it's because you are working with 
monetary values. In this case, it is better to store them as integers, 
scaled, say, by a factor of 1000 or whatever accuracy you need.


Igor Tandetnik 



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Any Ideas to speed up CAST

2007-12-22 Thread Mag. Wilhelm Braun

Thanks Igor

that makes sense about integers.

I would be interesting about your point of keeping the precision in 
floating point values without storing it as strings.


would be a great help as I have a few cases like that.

Thanks so much for your kind help

W.Braun


Igor Tandetnik wrote:

Mag. Wilhelm Braun
<[EMAIL PROTECTED]> wrote:

because of accuracy I use everywhere text entries even for numerical
entries.


You seem to be storing integer values as text. What kind of accuracy 
improvement do you expect from this? Integers are stored losslessly 
already.


I could remotely understand storing floating point values as strings 
(though there are better ways to preserve precision) - but integers?


Igor Tandetnik

- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 







-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Re: Any Ideas to speed up CAST

2007-12-22 Thread Igor Tandetnik

Mag. Wilhelm Braun
<[EMAIL PROTECTED]> wrote:

because of accuracy I use everywhere text entries even for numerical
entries.


You seem to be storing integer values as text. What kind of accuracy 
improvement do you expect from this? Integers are stored losslessly 
already.


I could remotely understand storing floating point values as strings 
(though there are better ways to preserve precision) - but integers?


Igor Tandetnik 



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Any Ideas to speed up CAST

2007-12-22 Thread Mag. Wilhelm Braun

hello,

because of accuracy I use everywhere text entries even for numerical 
entries.


So if I have to select max or min items I use cast: e.g

I have exactly: 2310556   rows in the table

1. all CAST
select * from "SOME_table" where cast(UTCTimestamp AS INTEGER) = (select 
max(cast(UTCTimestamp AS INTEGER)) from "SOME_table")


takes about 8436  msec


2. Less CAST
select * from "SOME_table" where UTCTimestamp= (select 
max(cast(UTCTimestamp AS INTEGER)) from "SOME_table")


takes about 2823 msec

3. NO CAST
select * from "SOME_table" where UTCTimestamp= (select max(UTCTimestamp) 
from "SOME_table")


takes about 1 msec


in this particular case it gives me back all the same results but in 
other cases I need the Cast version.


NOW is there a way to speed this up except inserting the data as numeric.


Thanks W.Braun



for my first discussion on way I need text entry see topic: Is there a 
way to do comparison of text in a numerical way.




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] PATCH: WHERE clause OR to UNION optimization

2007-12-22 Thread drh
Joe Wilson <[EMAIL PROTECTED]> wrote:
> The attached patch implements the WHERE clause "OR to UNION" 
> optimization as described in this post:
> 

I just went thumbing through the firesafe and I do not think
I have a copyright release on file for you, Joe.  Please go
print out a copy of one of

   http://www.sqlite.org/copyright-release.html
   http://www.sqlite.org/copyright-release.pdf

Sign it, get your employer to sign it, then send it to me
by post to the address given at the bottom of

   http://www.sqlite.org/copyright.html

Thanks.

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Trac Account

2007-12-22 Thread drh
"Shawn Wilsher" <[EMAIL PROTECTED]> wrote:
> I believe that I can get e-mail
> notifications of changes to tickets, which is a heck of a lot better
> than bookmarking a ticket and checking back every few days.
> 

CVSTrac (http://www.cvstrac.org/) does not support email notification
of ticket chagnes.  But you can use RSS on the timeline to accomplish 
much the same thing.  http://www.sqlite.org/cvstrac/timeline.rss

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-