Found the bottleneck! First, the results with the "correction" --

Db remains on disk:
-----------------------------
[06:01 PM] ~/Data/carbonmodel$perl carbonmodel.pl
Prepare load testing
timethis 1000:  3 wallclock secs ( 0.23 usr +  0.41 sys =  0.64 CPU) @
1562.50/s (n=1000)

or 0.64 ms per SELECT

Db is transferred into RAM:
---------------------------------------
[05:51 PM] ~/Data/carbonmodel$perl carbonmodel.pl
Creating in memory tables... done.
Transferring data to memory... done. Took: 90 wallclock secs (76.05
usr +  8.33 sys = 84.38 CPU)
Creating indexes... done. Took: 36 wallclock secs (23.98 usr + 11.29
sys = 35.27 CPU)
Prepare load testing
timethis 1000:  0 wallclock secs ( 0.07 usr +  0.00 sys =  0.07 CPU) @
14285.71/s (n=1000)
            (warning: too few iterations for a reliable count)

or 0.07 ms per SELECT.
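
For anyone curious how the transfer-to-RAM step looks, here is a
minimal sketch -- in Python's stdlib sqlite3 rather than Perl DBI,
purely for illustration; "weather.db" and the column list are made up:

```python
import sqlite3

# Copy an on-disk database wholesale into a :memory: connection.
# sqlite3.Connection.backup (Python 3.7+) copies tables *and* indexes.
disk = sqlite3.connect("weather.db")
disk.execute("CREATE TABLE IF NOT EXISTS temporal (met_id INTEGER, t_max REAL)")
disk.execute("CREATE INDEX IF NOT EXISTS temporal_met ON temporal (met_id)")
disk.commit()

mem = sqlite3.connect(":memory:")
disk.backup(mem)   # one call; no per-table INSERT ... SELECT needed
disk.close()
# every subsequent SELECT against `mem` runs without disk I/O
```

(In Perl DBI the equivalent is ATTACHing a :memory: database, doing
INSERT ... SELECT per table, then recreating the indexes -- presumably
what the "Transferring data" and "Creating indexes" timings above
reflect.)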

====
OK, so what is going on? Well, the most excellent Devel::NYTProf (if
you haven't seen it in action, you have no idea what you are missing)
showed that retrieving the weather data was chewing up all the time.
See

http://carbonmodel.org/nytprof/carbonmodel.pl-sub.html#901

Taking that out causes the SELECT rate to jump up.

Of course, that is not a solution, because I do need the weather data,
but there is a lot of it... basically, each cell has 7300 rows by 9
columns of data (all INTEGER and REAL).

The query I have for retrieving the weather data is

SELECT year, y_day, t_max, t_min, t_ave, precip, vpd, s_rad, day_len
FROM temporal
WHERE met_id IN (:list_of_met_ids)
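
To make the parameterized IN (:list_of_met_ids) concrete, here is a
sketch in Python's sqlite3 (illustrative only -- the column list is
trimmed and the data invented); the placeholder string has to be built
to match the length of the list:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temporal (met_id INTEGER, year INTEGER, t_max REAL)")
con.executemany("INSERT INTO temporal VALUES (?, ?, ?)",
                [(3, 1990, 21.5), (3, 1991, 22.0), (7, 1990, 19.8)])

met_ids = [3]  # in this schema a cell maps to exactly one met_id
placeholders = ",".join("?" for _ in met_ids)  # one "?" per id
rows = con.execute(
    f"SELECT year, t_max FROM temporal "
    f"WHERE met_id IN ({placeholders}) ORDER BY year",
    met_ids).fetchall()
# rows == [(1990, 21.5), (1991, 22.0)]
```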

There is an index on met_id, but whether I have

WHERE met_id IN (3)

or

WHERE met_id = 3

I get the same result... about 28-30 ms retrieving all 20 years'
worth of daily weather data. So, now I can focus my energy on
streamlining that particular part of the problem. Maybe I simply have
to rethink my table. As is, the weather table looks like this:

table: temporal
----------------------
temporal_id INTEGER PRIMARY KEY
bunch of other columns
met_id INTEGER

There are 400 sets of 7300 rows (hence, 400 unique met_ids in 2.92
million rows). Each cell is related to one and only one met_id, while
each met_id, of course, can refer to many cells. Once I retrieve a
cell, I know its met_id from the cells table. Then I can look up the
7300 rows in the temporal table that belong to that cell.
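
That cell -> met_id -> temporal lookup can be sketched like so (again
Python sqlite3 for illustration; the table contents are invented, and
the real temporal table has many more columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cell (cell_id INTEGER PRIMARY KEY, met_id INTEGER);
    CREATE TABLE temporal (temporal_id INTEGER PRIMARY KEY,
                           met_id INTEGER, y_day INTEGER, t_max REAL);
    CREATE INDEX temporal_met ON temporal (met_id);
    INSERT INTO cell VALUES (1, 3);
    INSERT INTO temporal (met_id, y_day, t_max)
        VALUES (3, 1, 20.0), (3, 2, 21.0), (9, 1, 15.0);
""")

# step 1: the cell row tells us which met series it uses
(met_id,) = con.execute(
    "SELECT met_id FROM cell WHERE cell_id = ?", (1,)).fetchone()

# step 2: fetch that cell's weather rows via the met_id index
weather = con.execute(
    "SELECT y_day, t_max FROM temporal WHERE met_id = ? ORDER BY y_day",
    (met_id,)).fetchall()
# weather == [(1, 20.0), (2, 21.0)]
```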

Of course, I could have 400 different temporal tables, but I would
still need to retrieve those. Might be worth a shot to see if that
helps, but, oh, how messy.

But, read on for specific responses to Stefan's post...


On Sat, Mar 21, 2009 at 5:42 PM, Stefan Evert <stefan.ev...@uos.de> wrote:
>
> On 21 Mar 2009, at 15:31, P Kishor wrote:
>
>> I did some benchmarking with the above schema using Perl DBI, and I
>> get about 30 transactions per second as long as I am returning the data
>> to memory.
>
> Even for Perl/DBI, that seems pretty slow.  Depends on how much data each of
> these transactions returns, though -- if there are thousands of rows in lc
> or dist for each cell_id, then you can't expect much better performance.
>  Even though DBI and DBD::SQLite are written in C, they have to allocate
> fairly complex data structures to return the data (in the best case, an
> anonymous array with 40 to 60 entries for each data row in the result set),
> and these operations are relatively expensive in Perl (I know because I've
> written some XS code recently that does this kind of thing).
>

Stefan, you were right about "Depends on how much data" part, but it
applied to the weather data, not the lookup tables for lc (or
landcover) or dist (or disturbance).

Also, "Even for Perl/DBI" makes it seem there is something wrong with
Perl/DBI... no, no... Perl/DBI is an industry-tested, extremely
capable interface. I wouldn't trade it for anything. The problem was
with my data size and my approach, and I have to correct that.
DBD::SQLite/DBD::SQLite::Amalgamation/Perl DBI are the greatest
things since SQLite.

> Another thing to keep in mind is that the SQLite version included in the
> DBD::SQLite distribution is fairly old (3.4.0 on my Mac -- I doubt there's a
> more recent version of DBD::SQLite around), and AFAIK there have been some
> speed improvements in SQLite recently.

I am using DBD::SQLite::Amalgamation with SQLite 3.6.11. I am not sure
there is anything wrong with DBD::SQLite 1.14 at all (other than its
use of the older SQLite code, which is easily corrected). One might
get better bang by ensuring that the latest version of DBI is
installed, which Tim Bunce and company are probably always
fine-tuning.



>
> (Darren, any news from the maintainer of DBD::SQLite?  I would be very
> delighted and grateful to be able to use an up-to-date SQLite version in my
> Perl scripts.)
>
>>
>> [1] First retrieve all data from cell table
>> SELECT * FROM cell WHERE cell_id = :cell_id
>>
>> [2] Now retrieve the related lc, dist and met
>> SELECT lc.*
>> FROM lc l JOIN cell_lc c on l.lc_id = c.lc_id
>> WHERE c.cell_id = :cell_id
>
> Just a shot in the dark, but people on this list have occasionally pointed
> out that SQLite isn't very good at optimising JOIN queries, so it's often
> better to use a sub-select clause.  (SQLite isn't very good at optimisation
> in general, at least the version 3.4.0 that comes with Perl, and I've found
> that sometimes it helps a lot to give a few subtle hints to the optimiser
> ...)
>
> Have you tried rephrasing queries [2] and [3] in this form?
>
>> SELECT * FROM lc WHERE lc_id IN (SELECT lc_id FROM cell_lc WHERE cell_id =
>> :cell_id)
>
> I've had good experiences with this approach, although my definition of good
> performance is rather along the lines of "completes within less than 5
> seconds". :-)
>
>
> Best regards,
> Stefan Evert
>
> [ stefan.ev...@uos.de | http://purl.org/stefan.evert ]
>
>
>
>
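
Stefan's JOIN-vs-subselect suggestion is easy to try; here is a sketch
(Python sqlite3 for illustration, toy data) checking that the two
forms return the same rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE lc (lc_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cell_lc (cell_id INTEGER, lc_id INTEGER);
    INSERT INTO lc VALUES (1, 'forest'), (2, 'crop'), (3, 'urban');
    INSERT INTO cell_lc VALUES (42, 1), (42, 3), (99, 2);
""")

join_q = """SELECT l.* FROM lc l JOIN cell_lc c ON l.lc_id = c.lc_id
            WHERE c.cell_id = ? ORDER BY l.lc_id"""
sub_q  = """SELECT * FROM lc WHERE lc_id IN
            (SELECT lc_id FROM cell_lc WHERE cell_id = ?)
            ORDER BY lc_id"""

same = (con.execute(join_q, (42,)).fetchall()
        == con.execute(sub_q, (42,)).fetchall())
# same == True; which form is faster depends on the optimiser and data
```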



-- 
Puneet Kishor http://www.punkish.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Carbon Model http://carbonmodel.org/
Open Source Geospatial Foundation http://www.osgeo.org/
Sent from: Madison WI United States.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
