Robert,

> > time sqlite3 db 'select n1 from data where a <= 18234721' > /dev/null
> > 26.15u 0.59s 0:27.00 99.0%
> >
> > time sqlite3 db 'select n1 from data where a <= 18234721' > /dev/null
> > 26.04u 0.61s 0:26.91 99.0%
> >
> > time sqlite3 db 'select e from data where a <= 18234721' > /dev/null
> > 12.22u 0.41s 0:12.67 99.6%
> >
> > time sqlite3 db 'select e from data where a <= 18234721' > /dev/null
> > 12.01u 0.61s 0:12.64 99.8%
> >
> > so just by selecting two different columns (right next to each
> > other), i get another 50% time difference? sure, the column types
> > are different, and i can even understand that FLOAT might be 8 bytes
> > and INTEGER 4 bytes, but a 50% time difference? it just can't be
> > that linear, can it?
> 
> By doing these same two queries using select count(), you've proven my
> original theory that the time difference is due to the volume of bytes being
> transmitted to /dev/null and NOT because of the actual data types of the
> columns or ANY OTHER FACTOR.  If you really want, change your pipe output to

ok, i agree, redirecting output to /dev/null might have an impact on the times. 

> two file names, I'm 100% sure you'll find that the 'select n1 ...' query
> results in an output file significantly larger than the 'select e ...'
> output file.  This is where the differing performance comes from -- the time
> it is taking to parse and print your output.

here you are actually wrong. 

> time sqlite3 db 'select e from data where a <= 18234721' > x
12.01u 0.64s 0:12.80 98.8%

> time sqlite3 db 'select n1 from data where a <= 18234721' > y
26.06u 0.62s 0:26.86 99.3%

the size of x is 1070681 bytes (output of column e) and the size of y
is 1004219 bytes (output of column n1), so the file sizes are about the
same; x is in fact 66462 bytes larger.  it may well be that printf() is
two times slower on floats.
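
that is easy enough to sanity-check outside sqlite. a quick awk
micro-benchmark (purely illustrative -- the loop count is mine, not
anything from these tests):

  # format a million integers, then a million floats
  time awk 'BEGIN { for (i = 0; i < 1000000; i++) printf "%d\n", i }' > /dev/null
  time awk 'BEGIN { for (i = 0; i < 1000000; i++) printf "%f\n", i / 3.0 }' > /dev/null

if the float run is noticeably slower, formatting overhead is at least
part of the difference.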

> The select count(*) does indeed use the index and hence the significant
> performance difference.

fine

> > fine, if i ask sqlite just to count the rows it wins hands-down, but i
> > really want these rows. even more, i'd like to then "natural join"
> > these rows with a couple of other tables to really do what the perl
> > code currently does.
> >
> > but, it takes 22 seconds just to create a temp table with the
> > required dataset
> >
> > > time sqlite3 db 'create temp table foo as select * from data where a <= 18234721' > /dev/null
> > 21.93u 0.89s 0:22.95 99.4%
> >
> > and i do not understand what i'm doing wrong here :(
> 
> Again, the only reason I suggested using count() in your timing test was to
> ensure that the command-line sqlite3 program's output was consistent for
> both tests and to eliminate /dev/null printf's from factoring into the total
> time.  In your application, you'll call select * (or whatever) without the
> count to retrieve the rows -- but since you're not printf'ing them and
> instead are doing your own thing with them, you will indeed see close to
> identical times in your selects just like you did in the count() test.

i said i print these rows to /dev/null in my perl code too. plus, the
perl code does some other things, such as joining these rows with
other hashes and summing the numbers.

> As for the temp table ... I haven't tried this, but isn't "temp" a reserved
> word in SQLite?  More importantly, you should be doing this statement inside

yes, it is. i really do want to create a temporary table in memory; i
was really hoping it would speed things up.
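
for what it's worth, sqlite has a pragma for exactly this -- a sketch
i haven't timed on this dataset, assuming the build honors temp_store:

  # ask sqlite to build temp tables in RAM instead of a temp file
  time sqlite3 db 'pragma temp_store = memory; create temp table foo as select * from data where a <= 18234721' > /dev/null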

> a transaction.  Transactions are critically important in SQLite.  I would go
> so far as to say NEVER EVER perform ANY bulk write in SQLite outside a
> transaction!  The performance difference is beyond phenomenal.

nope.

> time sqlite3 db 'begin; create temp table foo as select * from data where a <= 18234721; commit' > /dev/null
21.90u 0.77s 0:22.87 99.1%

still 22 seconds just to create a table with 300,000+ records, and
that is, unfortunately, too slow :(
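
to be fair, i can see how that advice pays off when a bulk write is
many separate statements; my create-table-as-select is already a single
statement, so there is nothing for begin/commit to batch. something
like this (hypothetical -- table t and db2 are made up) is where the
transaction should make a dramatic difference:

  # 100,000 single-row inserts batched into one transaction,
  # avoiding a disk sync per statement
  sqlite3 db2 'create table t (x integer)'
  ( echo 'begin;'
    awk 'BEGIN { for (i = 0; i < 100000; i++) printf "insert into t values (%d);\n", i }'
    echo 'commit;' ) | sqlite3 db2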

doing strace/truss on sqlite shows that it performs a huge number of seeks. 

so the original questions remain:

- what am i doing wrong here?

- is sqlite going to be slower on a fairly large indexed table
because it has to seek back and forth between the index and the data?
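
on the second question, a covering index might sidestep those seeks --
if the index carries every column the query needs, sqlite can answer
from the index b-tree without touching the table at all. a sketch (the
index name is mine, and whether this sqlite version actually exploits
covering indexes is worth verifying):

  # an index on (a, n1) can satisfy the n1 query by itself
  sqlite3 db 'create index data_a_n1 on data (a, n1)'
  time sqlite3 db 'select n1 from data where a <= 18234721' > /dev/null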

thanks,
max
