Robert,

> > time sqlite3 db 'select n1 from data where a <= 18234721' > /dev/null
> > 26.15u 0.59s 0:27.00 99.0%
> >
> > time sqlite3 db 'select n1 from data where a <= 18234721' > /dev/null
> > 26.04u 0.61s 0:26.91 99.0%
> >
> > time sqlite3 db 'select e from data where a <= 18234721' > /dev/null
> > 12.22u 0.41s 0:12.67 99.6%
> >
> > time sqlite3 db 'select e from data where a <= 18234721' > /dev/null
> > 12.01u 0.61s 0:12.64 99.8%
> >
> > so just by selecting two different columns (right next to each other)
> > i get another 50% time difference? sure, the column types are
> > different, and i can even understand that FLOAT might be 8 bytes and
> > INTEGER 4 bytes, but a 50% time difference? it just can't be that
> > linear, can it?
>
> By doing these same two queries using select count(), you've proven my
> original theory that the time difference is due to the volume of bytes
> being transmitted to /dev/null and NOT because of the actual data types
> of the columns or ANY OTHER FACTOR. If you really want, change your
> pipe output to
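one way to check the "volume of bytes" theory without the shell getting in the way is to format the rows yourself and count the bytes that would have reached /dev/null. a minimal Python sketch, with a made-up stand-in table (the schema and data distribution are assumptions, not from this thread):

```python
import sqlite3

# Toy stand-in for the real table: 'a' is the keyed column,
# 'e' an INTEGER column, 'n1' a FLOAT column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (a INTEGER, e INTEGER, n1 FLOAT)")
con.executemany("INSERT INTO data VALUES (?, ?, ?)",
                [(i, i % 100, i * 0.00137) for i in range(10000)])

def output_bytes(query):
    # Count the bytes the sqlite3 shell would print:
    # one formatted value plus a newline per row.
    return sum(len(str(row[0])) + 1 for row in con.execute(query))

int_out = output_bytes("SELECT e  FROM data WHERE a <= 5000")
flt_out = output_bytes("SELECT n1 FROM data WHERE a <= 5000")
print("integer column:", int_out, "bytes; float column:", flt_out, "bytes")
```

if the two byte counts come out roughly equal while the wall-clock times differ by 2x, output volume alone can't be the explanation.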
ok, i agree, redirecting output to /dev/null might have an impact on
times.

> two file names, I'm 100% sure you'll find that the 'select n1 ...'
> query results in an output file significantly larger than the
> 'select e ...' output file. This is where the differing performance
> comes from -- the time it is taking to parse and print your output.

here you are wrong, actually.

> time sqlite3 db 'select e from data where a <= 18234721' > x
12.01u 0.64s 0:12.80 98.8%
> time sqlite3 db 'select n1 from data where a <= 18234721' > y
26.06u 0.62s 0:26.86 99.3%

the size of x is 1070681 bytes (output of column e) and the size of y is
1004219 bytes (output of column n1), so the file sizes are about the
same -- x is actually 66462 bytes larger. it is probably possible that
printf() is 2 times slower on floats.

> The select count(*) does indeed use the index and hence the significant
> performance difference.

fine

> > fine, if i ask sqlite just to count the rows it wins hands-down, but
> > i really want these rows. even more i'd like to then "natural join"
> > these rows with a couple of other tables to really do what the perl
> > code currently does.
> >
> > but, it takes 22 seconds just to create a temp table with the
> > required dataset
> >
> > > time sqlite3 db 'create temp table foo as select * from data where a <= 18234721' > /dev/null
> > 21.93u 0.89s 0:22.95 99.4%
> >
> > and i do not understand what i'm doing wrong here :(
>
> Again, the only reason I suggested using count() in your timing test
> was to ensure that the command-line sqlite3 program's output was
> consistent for both tests and to eliminate /dev/null printf's from
> factoring into the total time. In your application, you'll call
> select * (or whatever) without the count to retrieve the rows -- but
> since you're not printf'ing them and instead are doing your own thing
> with them, you will indeed see close to identical times in your selects
> just like you did in the count() test.
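the guess that printf() is about twice as slow on floats is easy to sanity-check in isolation. this sketch times plain number-to-string conversion (so it measures formatting cost only, not SQLite); the row count and values are arbitrary, and the exact ratio will vary by platform and runtime:

```python
import timeit

ints = list(range(300000))            # roughly the row count in question
floats = [i * 0.00137 for i in ints]  # arbitrary float values

# Time converting every value to its printed text form.
t_int = timeit.timeit(lambda: [str(v) for v in ints], number=3)
t_flt = timeit.timeit(lambda: [str(v) for v in floats], number=3)
print(f"int formatting: {t_int:.3f}s, float formatting: {t_flt:.3f}s")
```

if float formatting does dominate on your box, that would explain why the n1 query takes twice as long even though its output file is slightly smaller.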
i said i print these rows to /dev/null in my perl code too. plus the
perl code does some other things, such as joining these rows with other
hashes and summing the numbers.

> As for the temp table ... I haven't tried this, but isn't "temp" a
> reserved word in SQLite? More importantly, you should be doing this
> statement inside

yes, it is. i really want to create a 'temporary table' in memory. i was
really hoping it would speed things up.

> a transaction. Transactions are critically important in SQLite. I would
> go so far as to say NEVER EVER perform ANY bulk write in SQLite outside
> a transaction! The performance difference is beyond phenomenal.

nope.

> time sqlite3 db 'begin; create temp table foo as select * from data where a <= 18234721; commit' > /dev/null
21.90u 0.77s 0:22.87 99.1%

still 22 seconds just to create a table with 300,000+ records, and that
is, unfortunately, too slow :(

doing strace/truss on sqlite shows that it performs a huge amount of
seeks. so the original questions stand:

- what am i doing wrong here?
- is sqlite going to be not as fast on a fairly large indexed table
  because it has to seek back and forth between the index and the data?

thanks,
max
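one thing worth trying for the temp-table case: `PRAGMA temp_store = MEMORY` tells SQLite to keep temporary tables in RAM instead of a temp file on disk. a minimal Python sketch of the experiment, with invented table/column names and a much smaller data set than the real one (this is a sketch of the technique, not a reproduction of the original timings):

```python
import os
import sqlite3
import tempfile
import time

# A small on-disk stand-in for the real database file.
path = os.path.join(tempfile.mkdtemp(), "db")
con = sqlite3.connect(path, isolation_level=None)  # manual transactions
con.execute("PRAGMA temp_store = MEMORY")  # keep temp tables in RAM

con.execute("CREATE TABLE data (a INTEGER, e INTEGER, n1 FLOAT)")
con.execute("CREATE INDEX data_a ON data(a)")

con.execute("BEGIN")  # bulk insert inside one transaction
con.executemany("INSERT INTO data VALUES (?, ?, ?)",
                [(i, i % 100, i * 0.00137) for i in range(50000)])
con.execute("COMMIT")

t0 = time.perf_counter()
con.execute("BEGIN")
con.execute("CREATE TEMP TABLE foo AS "
            "SELECT * FROM data WHERE a <= 25000")
con.execute("COMMIT")
elapsed = time.perf_counter() - t0

rows = con.execute("SELECT count(*) FROM foo").fetchone()[0]
print(f"copied {rows} rows into an in-memory temp table in {elapsed:.2f}s")
```

if the time is still dominated by seeking between the index and the table, it may also be worth checking whether the index helps at all for a range that matches a large fraction of the rows -- a plain sequential table scan can beat per-row index lookups in that case.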