Gé,

thanks for the suggestion. unfortunately it did not make any
difference :( below are the results. as you can see it takes 7+ seconds
to "group by" 333,392 records, and i'm grouping by a column on which i
have an index. again, i'm not a database guy, but i think that is slow.
perhaps someone can comment on whether that is the best sqlite can do?
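
btw, to double-check that the index is actually being used for the
"group by", i suppose i could run a plain EXPLAIN on the query and look
at the vdbe listing to see whether it walks data_by_a or falls back to
a full table scan (just a sanity check, i have not dug through that
output yet):

sqlite> EXPLAIN
   ...> SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);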


sqlite> .schema data
CREATE TABLE data(
  a INTEGER,
  b INTEGER,
  c CHAR,
  d INTEGER,
  e INTEGER,
  n1 FLOAT,
  n2 FLOAT
);
CREATE INDEX data_by_a on data (a);



sqlite> select count(*) from data;
333392



test-1.sql
------------
PRAGMA cache_size = 300000;
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);



test-2.sql
------------
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);



> time sqlite3 db < test-1.sql
300000
1024
1417
6.89u 0.33s 0:07.55 95.6%

> time sqlite3 db < test-2.sql
2000
1024
1417
6.91u 0.19s 0:07.39 96.0%
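
(fwiw, the db2048/db4096/db8192 files below are just copies of the same
data rebuilt with a larger page size; roughly like this, starting from a
fresh database file so the page_size pragma takes effect:)

sqlite> PRAGMA page_size = 8192;
sqlite> ATTACH 'db' AS old;
sqlite> CREATE TABLE data AS SELECT * FROM old.data;
sqlite> CREATE INDEX data_by_a ON data (a);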



> time sqlite3 db2048 < test-1.sql
300000
2048
1417
6.80u 0.08s 0:07.32 93.9%

> time sqlite3 db2048 < test-2.sql
2000
2048
1417
6.77u 0.12s 0:07.10 97.0%



> time sqlite3 db4096 < test-1.sql
300000
4096
1417
6.80u 0.15s 0:07.21 96.3%

> time sqlite3 db4096 < test-2.sql
2000
4096
1417
6.79u 0.15s 0:07.15 97.0%



> time sqlite3 db8192 < test-1.sql
300000
8192
1417
6.70u 0.11s 0:07.01 97.1%

> time sqlite3 db8192 < test-2.sql
2000
8192
1417
6.73u 0.09s 0:07.01 97.2%
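
one more thing i may try next (just an untested rewrite, no timings
yet): replacing the "group by" with a plain DISTINCT in the subquery,
in case sqlite handles it differently:

SELECT count(*) FROM (SELECT DISTINCT a FROM data WHERE a <= 18234721);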

thanks,
max


On Apr 12, 2005 7:10 AM, Gé Weijers <[EMAIL PROTECTED]> wrote:
> Maksim,
> 
> Some things you could try:
> 
> 1) increase cache memory
> 
> You may be causing a lot of cache misses if the size of the query result
> is very large compared to the size of the cache. Index-based searches
> can cause multiple reloads of the same page because of a lack of
> locality in the cache. An index-less search will just load each page once.
> 
> As an experiment, try 'PRAGMA cache_size = 300000' before you run the
> query. 1 GB of RAM should be able to support 300 MB of cache.
> 
> 2) use 8192-byte pages
> 
> Larger pages seem to improve performance quite a bit, in my experience.
> 
> Do 'PRAGMA page_size = 8192' before you create the database.
> 
> Doing both may cause excessive memory use (300000 * 8K = 2.4 GB). I've
> never tried that.
> 
> 
> Gé
> 
> 
> Maksim Yevmenkin wrote:
> 
> >Robert,
> >
> >
> >
> >>[snip]
> >>
> >>
> >>
> >>>i said i print these rows to /dev/null too in my perl code. plus the
> >>>perl code does some other things such as joining these rows with other
> >>>hashes and summing the numbers.
> >>>
> >>>
> >>That's fine.  I was merely trying to account for the 50% speed difference
> >>between the two differing column tests, which has been accomplished.
> >>
> >>
> >>
> >>>>As for the temp table ... I haven't tried this, but isn't "temp" a
> >>>>reserved word in SQLite?  More importantly, you should be doing this
> >>>>statement inside
> >>>>
> >>>yes, it is. i really want to create 'temporary table' in memory. i was
> >>>really hoping it would speed things up.
> >>>
> >>>
> >>I misread the statement, so ignore me on that part.  However, 339,000 rows
> >>into a temporary in-memory table ... I tried some experiments locally here
> >>and none of them took more than 2 seconds to execute.  Are you sure you're
> >>not using up all available memory, which is causing the system to hit the
> >>swapfile?  What does this same query look like when you drop the "temp" from
> >>the query?
> >>
> >>
> >
> >the system has 1G of ram. i was "monitoring" sqlite3 memory usage with
> >'top'. the SIZE and RES did not exceed 30M. so i do not think the
> >memory is the issue here.
> >
> >
> >
> >>time sqlite3 db 'create table foo as select * from data where a <= 18234721' > /dev/null
> >>
> >>
> >22.06u 1.39s 0:27.75 84.5%
> >
> >so pretty much the same time without 'temp'.
> >
> >i'm starting to suspect the disk. here is what i did. i created a separate
> >database with only one table. this table contains a subset of 333,392
> >rows from the original data table. it also has the same index on the "a"
> >column, i.e. i did
> >
> >
> >
> >>sqlite3 db1
> >>
> >>
> >sqlite> attach db as s;
> >sqlite> create table data as select * from s.data where a <= 18234721;
> >sqlite> create index data_by_a on data (a);
> >
> >full scan
> >
> >
> >
> >>time sqlite3 db1 'select n1 from data' > /dev/null
> >>
> >>
> >17.19u 0.55s 0:19.06 93.0%
> >
> >"bad" index scan, because it is guaranteed then the table only has
> >keys that match "where"
> >
> >
> >
> >>time sqlite3 db1 'select n1 from data where a <= 18234721' > /dev/null
> >>
> >>
> >25.73u 0.59s 0:28.37 92.7%
> >
> >+10 seconds! is this the overhead of an "indexed" scan? is this what
> >it really takes to seek back and forth between the index and the data?
> >what am i missing here?
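> >
> >(if i remember right, prefixing the column with a unary "+" keeps
> >sqlite from using the index on that term; i have not timed it, but the
> >query below should then force a full table scan over the same table
> >and make the comparison more direct)
> >
> >time sqlite3 db1 'select n1 from data where +a <= 18234721' > /dev/null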
> >
> >thanks,
> >max
> >
> >
> 
>
