Hi all-

   Thanks for your many suggestions.  I have tried a number of them, and
found the following:


   1. If I create the database with the rows ordered by depth from least to
   greatest, and then create the index, the searches run an order of magnitude
   faster than without an index: the search for matching entries in the
   database finishes in about 9 seconds instead of the 90 seconds it took
   before.  (A sketch of this approach is below the list.)  Thanks to
   "developir" for this suggestion!
   2. Searching via "rowid" makes no difference.
   3. Running VACUUM makes no difference, since this is a write-once,
   read-many database.
   4. The ulimit on my machine is unlimited, so this was not important for me.
   5. Cache size seems to make no difference -- and the SQLite documentation
   says it should only be important for DELETEs and UPDATEs.
   6. I like the idea of using the R*Tree module; however, I don't want each
   of the students who use this database to have to recompile the version of
   SQLite that comes with their version of Python.  (A sketch of what that
   would look like is also below.)
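
For anyone who wants to replicate item 1, here is a rough sketch of the
sorted-load-then-index approach using Python's sqlite3 module.  It is only a
sketch: the file name "hydro.db" and the in-memory list "rows" are
placeholders, and in practice the rows would come from whatever source files
you have.

    # Sketch only: "hydro.db" and the list "rows" are placeholder names.
    import sqlite3

    conn = sqlite3.connect("hydro.db")
    cur = conn.cursor()
    cur.execute("""CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER,
        month INTEGER, day INTEGER, time REAL, cast_id INTEGER,
        depth REAL, T REAL, S REAL, water_depth REAL)""")

    # Insert the rows sorted by depth, so that rows with nearby depths end
    # up in nearby pages of the database file.
    rows.sort(key=lambda r: r[7])   # element 7 of each tuple is depth
    cur.executemany(
        "INSERT INTO hydro VALUES (?,?,?,?,?,?,?,?,?,?,?)", rows)

    # Create the index only after the sorted bulk load is finished.
    cur.execute("CREATE INDEX hydro_indx ON hydro (depth)")
    conn.commit()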

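And for completeness, this is roughly what the R*Tree idea in item 6 would
look like if SQLite were built with the R*Tree module (SQLITE_ENABLE_RTREE);
I have not run this myself, which is exactly the recompilation problem I
want to avoid for the students.  It reuses the cursor from the sketch above,
and for point data like depth the same value is stored as both ends of a
one-dimensional interval.

    # Untested sketch: requires an SQLite build with the R*Tree module.
    cur.execute("CREATE VIRTUAL TABLE hydro_rtree "
                "USING rtree(id, min_depth, max_depth)")
    # Point data: store depth as both ends of the interval, keyed by rowid.
    cur.execute("INSERT INTO hydro_rtree SELECT rowid, depth, depth FROM hydro")
    # Range query through the R*Tree, joined back to the main table.
    cur.execute("""SELECT h.lat, h.lon, h.depth
                   FROM hydro_rtree r JOIN hydro h ON h.rowid = r.id
                   WHERE r.min_depth >= 49.01 AND r.max_depth <= 50.99""")
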
Several of you, in particular Ken, suggested that I run SQL queries whose
output would help in figuring out what was going on.  The database is now
more than fast enough for me, so I won't do this on my own.  However, if
others on the mailing list would like to see this output in order to make
SQLite better (or just out of curiosity), please feel free to let me know,
and I can make the runs.
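
For reference, the easiest diagnostic output to produce comes from EXPLAIN
QUERY PLAN and ANALYZE (the exact queries Ken had in mind may well differ);
a minimal sketch, again reusing the cursor from above:

    # Show how SQLite plans to execute the slow query; the output indicates
    # whether hydro_indx or a full table scan is used.
    for row in cur.execute("""EXPLAIN QUERY PLAN
                              SELECT lat, lon, depth FROM hydro
                              WHERE depth >= 49.01 AND depth <= 50.99"""):
        print(row)

    # ANALYZE gathers table and index statistics for the query planner.
    cur.execute("ANALYZE")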

I have a small subset of the data (450 MB) which exhibits the same
behaviour.  The data is public (it is from the National Ocean Database), so
if anyone wants to see it I would be happy to put it on my web server.

Cheers,
and thanks to everyone who helped me!
Jamie Pringle

On Wed, Oct 8, 2008 at 6:50 PM, James Pringle <[EMAIL PROTECTED]> wrote:
> Hi-
>
>  I am new to SQLite, and am having a puzzling problem.  I have read
> that adding an INDEX to an SQLite table can never make a SELECT
> statement slower.  However, in my case, it seems to be making the
> SELECT statement an order of magnitude slower, which is not what I
> wanted!  What could be going on?
>
>  I am calling SQLite from Python 2.5 (from Fink), using the sqlite3
> module, on OS X 10.5 with 2 GB of memory.  My database contains 32
> million lines of ocean hydrographic data, stored on disk, with the
> table created with the following statement:
>
>    CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER,
>    month INTEGER, day INTEGER, time REAL, cast_id INTEGER,
>    depth REAL, T REAL, S REAL, water_depth REAL)
>
> When I perform a SELECT that returns about 0.6% of the data in the
> database (about 200,000 records)
>
>    SELECT lat, lon, depth FROM hydro WHERE depth >= 49.01 AND depth <= 50.99
>
> It selects and returns the records in 82 seconds.  I wished to improve
> on this time, so I created an index with the following command:
>
>    CREATE INDEX hydro_indx ON hydro (depth)
>
> I then tried the SELECT command again, and read the results into
> memory, and it took 717 seconds!?!
>
> The "depth" field contains many distinct numeric values, with no
> specific value matching more than 0.5% of the database.  When I DROP
> the index with the command
>
>    DROP INDEX hydro_indx
>
> The SELECT time returns to about 80 seconds, confirming that it is the
> index which is slowing things down.  What is going on?  I have
> repeated and confirmed these timings.
>
> I have listened for disk chatter and monitored the system, and it does
> not seem to be thrashing swap, or otherwise becoming unresponsive.
>
> I have two questions:
>
>    1) Why is the index making things slower?
>    2) How can I make my SELECT statement faster?  The primary
> selection will be done
>       on the "depth" and "water_depth" keys.
>
> I thank you for your time.
>
> Cheers,
> Jamie Pringle
>
