Puneet,

As you suggested I have supplied a brief background re: the problem:

Background:
I'm very new to sql (x2 weeks). I have a database with two tables one  
with -say (max)- 12k rows of data, and the other with more. The first  
table (lets calls it A) has the following columns:

source_id, x_pos, y_pos, magnitude, time_stamp, bin_num

(source_id) is unique and I have indexes on all columns - individual  
indexes.

The second table (let's call it B) has the following columns:

source_id, time_stamp, bin_num

No column is unique and I have indexes on all columns - individual  
indexes.

I create/update the database without a problem using, sql_prepare,  
sql_bind and sql_step. I use begin/commit to bundle transactions for  
the updates. The updating has decent timing.

I query the database with a query to extract x_pos, y_pos from table A  
for instances (rows) that match a particular bin_num(s) in B provided  
the source_ids are the same in both tables. The query take ~30.00  
seconds when run about 7k times  in a loop. Each select statement is  
distinct. The timing isn't acceptable. Obviously the query is  
inefficient and/or the database isn't organized optimally etc. etc.  
The program is part of data reduction pipeline system for an astronomy  
project. I use the C-interface to sqlite3.

Here's the query:

select * from A a, B b where b.bin=? and a.soruce_id=b.source_id
or
elect * from A a, B b where b.bin in (?, ?, ?, ?, ?)  and  
a.soruce_id=b.source_id

the "?" is filled in by a bind via values calculated at run time.  
Similar to the updates
I use sql_prepare, sql_bind and sql_step to run the query. I then loop  
through the resulting
rows I retrieve from the database with sqlite3_column_* . Doing this  
say 7k times for each
run of the program (and then repeatedly in the pipeline) is hugely  
costly.  How can I optimize
my query/database for better performance?

Sample data (table A)
            source_id               x_pos                
y_pos                 mag                band        
fr_time_stamp         pix_bin_num
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
= 
========================================================================
   50275a003-000002-3             382.836              
235.303               6.162                   3           
1260978065                  23
   50275a003-000003-3             166.883               
567.99               6.032                   3           
1260978065                  51
   50275a003-000004-3            1009.492                
753.4               6.243                   3           
1260978065                  80
   50275a003-000005-3              10.083              
153.815               7.672                   3           
1260978065                  10
   50275a003-000006-3             332.153               
411.88                7.65                   3           
1260978065                  43
   50275a003-000007-3             888.086              
135.478               7.589                   3           
1260978065                  18
   50275a003-000009-3             208.277              
292.152               8.127                   3           
1260978065                  22
   50275a003-000013-3             788.648              
829.213               8.424                   3           
1260978065                  87
   50275a003-000014-3             277.768               
19.981               8.335                   3           
1260978065                   2
   50275a003-000017-3             665.116              
624.767               8.807                   3           
1260978065                  66
   50275a003-000018-3             170.859              
855.147               8.734                   3           
1260978065                  81
   50275a003-000019-3             694.634              
210.285               8.787                   3           
1260978065                  26
   50275a003-000020-3             293.737               
11.928               9.144                   3           
1260978065                   2
   50275a003-000023-3              311.53              
729.644               9.237                   3           
1260978065                  73
   50275a003-000024-3             284.052              
947.095               9.632                   3           
1260978065                  92



Sample data (table B)
         pix_bin_num           source_id       fr_time_stamp
============================================================
                   21  50275a003-000002-3          1260978065
                   11  50275a003-000002-3          1260978065
                   31  50275a003-000002-3          1260978065
                   12  50275a003-000002-3          1260978065
                   22  50275a003-000002-3          1260978065
                   32  50275a003-000002-3          1260978065
                    2  50275a003-000002-3          1260978065
                   42  50275a003-000002-3          1260978065
                    3  50275a003-000002-3          1260978065
                  42  50275a003-007106-3          1260978065
                    0  50275a003-000002-4          1260978065
                    5  50275a003-000002-4          1260978065
                   10  50275a003-000002-4          1260978065
                    1  50275a003-000002-4          1260978065
                    6  50275a003-000002-4          1260978065
                   11  50275a003-000002-4          1260978065
                   16  50275a003-000002-4          1260978065
                    2  50275a003-000002-4          1260978065
                    7  50275a003-000002-4          1260978065
                   12  50275a003-000002-4          1260978065
                   17  50275a003-000002-4          1260978065


Result from "explain query plan"
sqlite> explain query plan select * from latent_parents a, pix_bins b  
where b.pix_bin_num=0 and a.source_id=b.source_id;
0|1|TABLE pix_bins AS b WITH INDEX pix_bin_num_index_pix_bin_tbl
1|0|TABLE latent_parents AS a WITH INDEX source_id_index_lp_tbl

Many thanks,
rosemary.

On Apr 7, 2009, at 1:57 PM, P Kishor wrote:

> On Tue, Apr 7, 2009 at 3:45 PM, Rosemary Alles  
> <al...@ipac.caltech.edu> wrote:
>> Hullo Puneet,
>>
>> Many thanks for your response.
>>
>> My understanding of a sqlite3 "transaction" is probably poor. From  
>> your
>> response
>> (if you are correct) I see that only UPDATES and INSERTS can be  
>> speeded up
>> via bundling many numbers of them in a Being/Commit block?
>
> Not that it is any standard, but search for the word "transaction" at
>
> http://developer.postgresql.org/pgdocs/postgres/sql-select.html
>
> You will see, Pg recommends using SELECTs inside a TRANSACTION for
> just the reason I mentioned in my email... ensuring that you retrieve
> something dependable that is not changed on you midstream, not for
> speed.
>
>> Leading me to
>> ask:
>> Is there no difference in behavior between a SINGLE select and  
>> several
>> of them within the context of transaction?
>
> What do you mean by "behavior"? Do you mean what you will get back?
> No, it shouldn't be different. Do you mean how fast you will get it
> back? Dunno, but you can tell for sure by writing a trivial
> benchmarking script on your data.
>
>
>>
>> And yes, each of the many SELECTS have a different WHERE clause.
>
> Don't mean to preempt your application, but bunching SELECTs with
> different WHERE clause makes little sense. I mean, if you are doing
>
> SELECT .. FROM .. WHERE color = 'blue'
> SELECT .. FROM .. WHERE color = 'red'
> SELECT .. FROM .. WHERE color = 'green'
>
> you can just as well do
>
> SELECT .. FROM .. WHERE color IN ('blue','red','green')
>
> On the other hand, if you are doing
>
> SELECT .. FROM .. WHERE color = 'blue'
> SELECT .. FROM .. WHERE taste = 'bitter'
> SELECT .. FROM .. WHERE type = 'pill'
>
> That doesn't make much sense, but can also be accomplished with a
> single SELECT and a bunch of ORs
>
> Maybe you should explain your actual problem a bit more. What exactly
> are you trying to accomplish? What does your db look like? Provide
> some sample data, and perhaps example of your multiple but different
> SELECT queries that you want to wrap in a transaction.
>
> Once again, if only speed is your aim, benchmark it.
>
>
>>
>> -rosemary.
>>
>> On Apr 7, 2009, at 12:38 PM, P Kishor wrote:
>>
>>> On Tue, Apr 7, 2009 at 2:36 PM, Rosemary Alles <al...@ipac.caltech.edu 
>>> >
>>> wrote:
>>>>
>>>> Hullo Igor,
>>>>
>>>> Many thanks for your response: I believe I didn't phrase my  
>>>> question
>>>> correctly:
>>>>
>>>> 1) If I were to bundle several thousand SELECT statements in a  
>>>> single
>>>> transaction - why would it not run faster?
>>>
>>> as far as I understand, transactions matter only in the context of
>>> UPDATEs and INSERTs for the purpose of speed (keep in mind, I am not
>>> addressing data integrity -- the ACID part here, but only speed). A
>>> transaction speeds this multiple UPDATEs and INSERTs by decreasing  
>>> the
>>> number of times your program interacts with slowest part of your
>>> computer, the hard disk.
>>>
>>> Multiple SELECTs in a transaction might help with the integrity, but
>>> ensuring that you don't end up getting data changed in mid-stream,  
>>> but
>>> won't speed up the query.
>>>
>>> Are all your thousands of SELECTs based on different WHERE  
>>> criterion?
>>> If not, they would really be just one SELECT.
>>>
>>>> 2) This is precisely the problem though - each of those statements
>>>> will yield rows of results to be parsed with
>>>> sqlite3_column - in the context of the user's (my) program. If many
>>>> SELECT statements are issued within the context
>>>> of a single transaction (repeatedly), how does one deal with the
>>>> results without a callback (if using sql_step)? Yes,
>>>> sql_exec is touted to be a wrapper around sql_prepare, bind, step.
>>>> However, is does (also - additionally) offer the
>>>> option of a user supplied calleback routine which sql_prepare  
>>>> etc. do
>>>> not.
>>>>
>>>> Essentially, my question is about context. if many many SELECTS are
>>>> bundled in a single transaction using prepare,
>>>> bind and step. In what context does one parse the results? Do we  
>>>> not
>>>> have synchronizing issue here?
>>>>
>>>> Thanks again,
>>>> rosemary
>>>>
>>>> On Apr 6, 2009, at 8:03 PM, Igor Tandetnik wrote:
>>>>
>>>>> "Rosemary Alles" <al...@ipac.caltech.edu> wrote
>>>>> in message news:20a6b796-613b-4f5d-bfca-359d6b9fa...@ipac.caltech.edu
>>>>>>
>>>>>> I want to speed up my app. Can I run SELECT statements within the
>>>>>> context of a transaction.
>>>>>
>>>>> Yes, but it's unlikely to make it run any faster.
>>>>>
>>>>>> If so, how does one handle the query
>>>>>> results?
>>>>>
>>>>> The same way as when running it without an explicit transaction.
>>>>>
>>>>>> I would assume this cannot be done with sql_prepare,
>>>>>> sql_bind, sql_step?
>>>>>
>>>>> Of course it can. See sqlite3_column_*
>>>>>
>>>>>> Would I *have* to use sql_exec
>>>>>
>>>>> No. And if you look at the implementation of sqlite3_exec, it uses
>>>>> sqlite3_prepare and sqlite3_step internally anyway. It's  
>>>>> maintained
>>>>> mostly for backward compatibility.
>>>>>
>>>>>> What am I giving up
>>>>>> by using sql_exec vs sql_prepare, sql_bind and sql_step?
>>>>>
>>>>> Off the top of my head: 1) strong typing (you get all data as  
>>>>> strings,
>>>>> so that, say, an integer is converted to string and then you'll  
>>>>> have
>>>>> to
>>>>> convert it back); 2) streaming (with sqlite3_exec, the whole  
>>>>> resultset
>>>>> must be present in memory at the same time; with sqlite3_step,  
>>>>> only
>>>>> one
>>>>> row's worth of data needs to be present in memory; makes a huge
>>>>> difference for very large resultsets).
>>>>>
>>>>> Igor Tandetnik
>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Puneet Kishor http://www.punkish.org/
>>> Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
>>> Carbon Model http://carbonmodel.org/
>>> Open Source Geospatial Foundation http://www.osgeo.org/
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to