Gé, thanks for the suggestion. Unfortunately, it did not make any difference :( The results are below. As you can see, it takes 7+ seconds to GROUP BY over 333,392 records, and I'm grouping by a column on which I have an index. Again, I'm not a database guy, but that seems slow to me. Perhaps someone can comment on whether this is the best SQLite can do?
sqlite> .schema data
CREATE TABLE data(
  a INTEGER,
  b INTEGER,
  c CHAR,
  d INTEGER,
  e INTEGER,
  n1 FLOAT,
  n2 FLOAT
);
CREATE INDEX data_by_a on data (a);
sqlite> select count(*) from data;
333392

test-1.sql
------------
PRAGMA cache_size = 300000;
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);

test-2.sql
------------
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);

> time sqlite3 db < test-1.sql
300000
1024
1417
6.89u 0.33s 0:07.55 95.6%

> time sqlite3 db < test-2.sql
2000
1024
1417
6.91u 0.19s 0:07.39 96.0%

> time sqlite3 db2048 < test-1.sql
300000
2048
1417
6.80u 0.08s 0:07.32 93.9%

> time sqlite3 db2048 < test-2.sql
2000
2048
1417
6.77u 0.12s 0:07.10 97.0%

> time sqlite3 db4096 < test-1.sql
300000
4096
1417
6.80u 0.15s 0:07.21 96.3%

> time sqlite3 db4096 < test-2.sql
2000
4096
1417
6.79u 0.15s 0:07.15 97.0%

> time sqlite3 db8192 < test-1.sql
300000
8192
1417
6.70u 0.11s 0:07.01 97.1%

> time sqlite3 db8192 < test-2.sql
2000
8192
1417
6.73u 0.09s 0:07.01 97.2%

thanks,
max

On Apr 12, 2005 7:10 AM, Gé Weijers <[EMAIL PROTECTED]> wrote:
> Maksim,
>
> Some things you could try:
>
> 1) increase cache memory
>
> You may be causing a lot of cache misses if the size of the query result
> is very large compared to the size of the cache. Index-based searches
> can cause multiple reloads of the same page because of a lack of
> locality in the cache. An index-less search will just load each page once.
>
> As an experiment, try 'PRAGMA cache_size = 300000' before you run the
> query. 1 GB of RAM should be able to support 300 MB of cache.
>
> 2) use 8192-byte pages
>
> Larger pages seem to improve performance quite a bit, in my experience.
>
> Do 'PRAGMA page_size = 8192' before you create the database.
>
> Doing both may cause excessive memory use (200000 * 8K = ...). I've
> never tried that.
>
> Gé
>
> Maksim Yevmenkin wrote:
>
> > Robert,
> >
> > > [snip]
> > >
> > > > i said i print these rows to /dev/null too in my perl code. plus the
> > > > perl code does some other things such as joining these rows with
> > > > other hashes and summing the numbers.
> > >
> > > That's fine. I was merely trying to account for the 50% speed
> > > difference between the two differing column tests, which has been
> > > accomplished.
> > >
> > > > > As for the temp table ... I haven't tried this, but isn't
> > > > > "temp" a reserved word in SQLite? More importantly, you should
> > > > > be doing this statement inside
> > > >
> > > > yes, it is. i really want to create a 'temporary table' in memory.
> > > > i was really hoping it would speed things up.
> > >
> > > I misread the statement, so ignore me on that part. However, 339,000
> > > rows into a temporary in-memory table ... I tried some experiments
> > > locally here and none of them took more than 2 seconds to execute.
> > > Are you sure you're not using up all available memory, which is
> > > causing the system to hit the swapfile? What does this same query
> > > look like when you drop the "temp" from the query?
> >
> > the system has 1G of ram. i was "monitoring" sqlite3 memory usage with
> > 'top'. the SIZE and RES did not exceed 30M. so i do not think memory
> > is the issue here.
> >
> > > time sqlite3 db 'create table foo as select * from data where a <=
> > > 18234721' > /dev/null
> >
> > 22.06u 1.39s 0:27.75 84.5%
> >
> > so pretty much the same time without 'temp'.
> >
> > i'm starting to suspect the disk. here is what i did. i created a
> > separate database with only one table. this table contains a subset of
> > 333392 rows from the original data table. it also has the same index
> > on the "a" column, i.e. i did
> >
> > > sqlite3 db1
> >
> > sqlite> attach db as s;
> > sqlite> create table data as select * from s.data where a <= 18234721;
> > sqlite> create index data_by_a on data (a);
> >
> > full scan:
> >
> > > time sqlite3 db1 'select n1 from data' > /dev/null
> >
> > 17.19u 0.55s 0:19.06 93.0%
> >
> > "bad" index scan, because it is guaranteed that the table only has
> > keys that match the "where" clause:
> >
> > > time sqlite3 db1 'select n1 from data where a <= 18234721' > /dev/null
> >
> > 25.73u 0.59s 0:28.37 92.7%
> >
> > +10 seconds! is this the overhead of an "indexed" scan? is this what
> > it really takes to seek back and forth between the index and the data?
> > what am i missing here?
> >
> > thanks,
> > max
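[Editor's note] For anyone wanting to reproduce this kind of experiment, note that 'PRAGMA page_size' only takes effect if it is issued before the schema is created, while 'PRAGMA cache_size' applies per connection. A minimal sketch using Python's sqlite3 module — the schema matches the thread, but the data set and row count here are made up for illustration, not the original 333,392-row table:

```python
import sqlite3
import time

# use a file path instead of :memory: to measure on-disk behavior
con = sqlite3.connect(":memory:")
con.execute("PRAGMA page_size = 8192")    # must run before any table exists
con.execute("PRAGMA cache_size = 300000")  # per-connection cache setting

con.execute("CREATE TABLE data (a INTEGER, b INTEGER, c CHAR, d INTEGER, "
            "e INTEGER, n1 FLOAT, n2 FLOAT)")
con.execute("CREATE INDEX data_by_a ON data (a)")

# small synthetic data set: 10,000 rows, 1,000 distinct values of a
con.executemany("INSERT INTO data (a, n1) VALUES (?, ?)",
                [(i % 1000, float(i)) for i in range(10000)])
con.commit()

t0 = time.time()
(count,) = con.execute(
    "SELECT count(*) FROM (SELECT a FROM data "
    "WHERE a <= 18234721 GROUP BY a)").fetchone()
print(count, "distinct keys in %.3f s" % (time.time() - t0))
```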
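[Editor's note] On the +10 seconds for the indexed scan: an index on "a" alone means every matching index entry costs an extra seek back into the table to fetch n1. One thing that may help (a suggestion, not something tested in this thread) is a covering index containing both the filter column and the selected column, so the query can be answered from the index alone. A rough sketch with toy data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (a INTEGER, n1 FLOAT)")
con.executemany("INSERT INTO data VALUES (?, ?)",
                [(i, i * 0.5) for i in range(1000)])

# index on a alone: each hit still needs a seek into the table to fetch n1
con.execute("CREATE INDEX data_by_a ON data (a)")
# covering index: both the WHERE column and the SELECTed column are in the index
con.execute("CREATE INDEX data_by_a_n1 ON data (a, n1)")

# recent SQLite versions typically report a COVERING INDEX in this plan
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT n1 FROM data WHERE a <= 500").fetchall()
print(plan)

rows = con.execute("SELECT n1 FROM data WHERE a <= 500").fetchall()
print(len(rows))
```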