Let's recap ...

> time sqlite3 db 'select n1 from data where a <= 18234721' > /dev/null
> 26.15u 0.59s 0:27.00 99.0%
>
> time sqlite3 db 'select n1 from data where a <= 18234721' > /dev/null
> 26.04u 0.61s 0:26.91 99.0%
>
> time sqlite3 db 'select e from data where a <= 18234721' > /dev/null
> 12.22u 0.41s 0:12.67 99.6%
>
> time sqlite3 db 'select e from data where a <= 18234721' > /dev/null
> 12.01u 0.61s 0:12.64 99.8%
>
> so just by selecting two different rows (both next to each other). I
> get another 50% time difference? sure the column types 
> are different,
> and i can even understand that FLOAT might be 8 bytes and 
> INTEGER 4
> bytes, but 50% time difference? it just cant be that 
> linear can it?

By doing these same two queries using select count(), you've proven my
original theory that the time difference is due to the volume of bytes being
transmitted to dev/null and NOT because of the actual data types of the
columns or ANY OTHER FACTOR.  If you really want, change your pipe output to
two file names, I'm 100% sure you'll find that the 'select n1 ...' query
results in an output file significantly larger than the 'select e ...'
output file.  This is where the differing performance comes from -- the time
it is taking to parse and print your output.

The select count(*) does indeed use the index and hence the significant
performance difference.  

> fine, if i ask sqlite just to count the rows it wins hands-down, but i
> really want these rows. even more i 'd like to then "natural join"
> these rows with a couple of other tables to really do what the perl
> code currently does.
> 
> but, it takes 22 seconds to just to create a temp table with the
> required dataset
> 
> > time sqlite3 db 'create temp table foo as select * from 
> data where a <= 18234721' > /dev/null
> 21.93u 0.89s 0:22.95 99.4%
> 
> and i do not understand what i'm doing wrong here :(

Again, the only reason I suggested using count() in your timing test was to
ensure that the command-line sqlite3 program's output was consistent for
both tests and to eliminate dev/null printf's from factoring into the total
time.  In your application, you'll call select * (or whatever) without the
count to retrieve the rows -- but since you're not printf'ing them and
instead are doing your own thing with them, you will indeed see close to
identical times in your selects just like you did in the count() test.

As for the temp table ... I haven't tried this, but isn't "temp" a reserved
word in SQLite?  More importantly, you should be doing this statement inside
a transaction.  Transactions are critically important in SQLite.  I would go
so far as to say NEVER EVER perform ANY bulk write in SQLite outside a
transaction!  The performance difference is beyond phenomenal.

Robert


Reply via email to