Thanks Puneet. Those suggestions really help.

-rosemary.


On Apr 7, 2009, at 5:52 PM, P Kishor wrote:

> On Tue, Apr 7, 2009 at 5:18 PM, Rosemary Alles  
> <al...@ipac.caltech.edu> wrote:
>> Puneet,
>>
>> As you suggested, I have supplied a brief background on the problem:
>>
>> Background:
>> I'm very new to SQL (about two weeks). I have a database with two
>> tables, one with at most ~12k rows of data and the other with more.
>> The first table (let's call it A) has the following columns:
>>
>> source_id, x_pos, y_pos, magnitude, time_stamp, bin_num
>>
>> (source_id) is unique and I have indexes on all columns - individual
>> indexes.
>>
>> The second table (let's call it B) has the following columns:
>>
>> source_id, time_stamp, bin_num
>>
>> No column is unique and I have indexes on all columns - individual
>> indexes.
>>
>> I create/update the database without a problem using sql_prepare,
>> sql_bind and sql_step. I use BEGIN/COMMIT to bundle the updates into
>> transactions, and the updates run acceptably fast.
>>
>> I query the database to extract x_pos, y_pos from table A for rows
>> that match a particular bin_num (or set of bin_nums) in B, provided
>> the source_ids are the same in both tables. The query takes ~30
>> seconds when run about 7k times in a loop.
>
> So, each SELECT is taking about 4 ms (30,000 ms / 7,000 queries). You
> might try loading the entire db into memory and working with that. Your
> db is really tiny; at least one of the tables is only 12k rows. You
> don't say how big table B is, but if you can load the entire db into
> memory, you should get much faster times.
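>
> A minimal sketch of one way to do that (untested; it needs SQLite
> 3.6.11 or later for the backup API, error handling is trimmed, and the
> function and variable names are just placeholders) would be to copy
> the disk file into a ":memory:" database at startup:
>
>     #include <sqlite3.h>
>
>     /* Copy an on-disk database into a fresh in-memory one.
>        Returns the in-memory handle, or NULL on failure. */
>     static sqlite3 *load_into_memory(const char *disk_path)
>     {
>         sqlite3 *disk = 0, *mem = 0;
>         if (sqlite3_open(disk_path, &disk) != SQLITE_OK) return 0;
>         if (sqlite3_open(":memory:", &mem) != SQLITE_OK) {
>             sqlite3_close(disk);
>             return 0;
>         }
>
>         sqlite3_backup *bk = sqlite3_backup_init(mem, "main", disk, "main");
>         if (bk) {
>             sqlite3_backup_step(bk, -1);   /* copy every page in one go */
>             sqlite3_backup_finish(bk);
>         }
>         sqlite3_close(disk);
>         return mem;
>     }
>
> and then run all the queries against the handle it returns.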
>
> Try other SELECTs as well...
>
> SELECT x_pos, y_pos
> FROM a
> WHERE source_id IN (SELECT source_id FROM b WHERE bin_num = ?)
>
> or
>
> SELECT x_pos, y_pos
> FROM a
> WHERE source_id IN (SELECT source_id FROM b WHERE bin_num IN (?, ?, ?, ?))
>
> Perhaps others on the list can suggest something.
>
>
>> Each SELECT statement is distinct. The timing isn't acceptable.
>> Obviously the query is inefficient and/or the database isn't organized
>> optimally, etc. The program is part of a data reduction pipeline
>> system for an astronomy project. I use the C interface to sqlite3.
>>
>> Here's the query:
>>
>> select * from A a, B b where b.bin=? and a.source_id=b.source_id
>> or
>> select * from A a, B b where b.bin in (?, ?, ?, ?, ?) and
>> a.source_id=b.source_id
>>
>> the "?" is filled in by a bind via values calculated at run time.
>> Similar to the updates
>> I use sql_prepare, sql_bind and sql_step to run the query. I then  
>> loop
>> through the resulting
>> rows I retrieve from the database with sqlite3_column_* . Doing this
>> say 7k times for each
>> run of the program (and then repeatedly in the pipeline) is hugely
>> costly.  How can I optimize
>> my query/database for better performance?
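>>
>> In rough outline, each run does something like this (a simplified
>> sketch, not my exact code; the db handle, the bin list and the column
>> positions are placeholders):
>>
>>     sqlite3_stmt *stmt = 0;
>>     const char *sql =
>>         "SELECT a.x_pos, a.y_pos FROM A a, B b "
>>         "WHERE b.bin = ? AND a.source_id = b.source_id";
>>
>>     sqlite3_prepare_v2(db, sql, -1, &stmt, 0);   /* prepared once */
>>
>>     for (int i = 0; i < n_bins; i++) {           /* ~7k iterations */
>>         sqlite3_bind_int(stmt, 1, bins[i]);      /* fill in the '?' */
>>         while (sqlite3_step(stmt) == SQLITE_ROW) {
>>             double x = sqlite3_column_double(stmt, 0);
>>             double y = sqlite3_column_double(stmt, 1);
>>             /* ... use x and y ... */
>>         }
>>         sqlite3_reset(stmt);                     /* reuse the statement */
>>     }
>>     sqlite3_finalize(stmt);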
>>
>> Sample data (table A)
>> source_id              x_pos     y_pos    mag  band  fr_time_stamp  pix_bin_num
>> ================================================================================
>> 50275a003-000002-3   382.836   235.303  6.162     3     1260978065           23
>> 50275a003-000003-3   166.883    567.99  6.032     3     1260978065           51
>> 50275a003-000004-3  1009.492     753.4  6.243     3     1260978065           80
>> 50275a003-000005-3    10.083   153.815  7.672     3     1260978065           10
>> 50275a003-000006-3   332.153    411.88   7.65     3     1260978065           43
>> 50275a003-000007-3   888.086   135.478  7.589     3     1260978065           18
>> 50275a003-000009-3   208.277   292.152  8.127     3     1260978065           22
>> 50275a003-000013-3   788.648   829.213  8.424     3     1260978065           87
>> 50275a003-000014-3   277.768    19.981  8.335     3     1260978065            2
>> 50275a003-000017-3   665.116   624.767  8.807     3     1260978065           66
>> 50275a003-000018-3   170.859   855.147  8.734     3     1260978065           81
>> 50275a003-000019-3   694.634   210.285  8.787     3     1260978065           26
>> 50275a003-000020-3   293.737    11.928  9.144     3     1260978065            2
>> 50275a003-000023-3    311.53   729.644  9.237     3     1260978065           73
>> 50275a003-000024-3   284.052   947.095  9.632     3     1260978065           92
>>
>>
>>
>> Sample data (table B)
>>         pix_bin_num           source_id       fr_time_stamp
>> ============================================================
>>                   21  50275a003-000002-3          1260978065
>>                   11  50275a003-000002-3          1260978065
>>                   31  50275a003-000002-3          1260978065
>>                   12  50275a003-000002-3          1260978065
>>                   22  50275a003-000002-3          1260978065
>>                   32  50275a003-000002-3          1260978065
>>                    2  50275a003-000002-3          1260978065
>>                   42  50275a003-000002-3          1260978065
>>                    3  50275a003-000002-3          1260978065
>>                   42  50275a003-007106-3          1260978065
>>                    0  50275a003-000002-4          1260978065
>>                    5  50275a003-000002-4          1260978065
>>                   10  50275a003-000002-4          1260978065
>>                    1  50275a003-000002-4          1260978065
>>                    6  50275a003-000002-4          1260978065
>>                   11  50275a003-000002-4          1260978065
>>                   16  50275a003-000002-4          1260978065
>>                    2  50275a003-000002-4          1260978065
>>                    7  50275a003-000002-4          1260978065
>>                   12  50275a003-000002-4          1260978065
>>                   17  50275a003-000002-4          1260978065
>>
>>
>> Result from "explain query plan"
>> sqlite> explain query plan select * from latent_parents a, pix_bins b
>> where b.pix_bin_num=0 and a.source_id=b.source_id;
>> 0|1|TABLE pix_bins AS b WITH INDEX pix_bin_num_index_pix_bin_tbl
>> 1|0|TABLE latent_parents AS a WITH INDEX source_id_index_lp_tbl
>>
>> Many thanks,
>> rosemary.
>>
>> On Apr 7, 2009, at 1:57 PM, P Kishor wrote:
>>
>>> On Tue, Apr 7, 2009 at 3:45 PM, Rosemary Alles
>>> <al...@ipac.caltech.edu> wrote:
>>>> Hullo Puneet,
>>>>
>>>> Many thanks for your response.
>>>>
>>>> My understanding of an sqlite3 "transaction" is probably poor. From
>>>> your response (if you are correct) I see that only UPDATEs and
>>>> INSERTs can be sped up by bundling many of them in a BEGIN/COMMIT
>>>> block?
>>>
>>> Not that it is any standard, but search for the word "transaction" at
>>>
>>> http://developer.postgresql.org/pgdocs/postgres/sql-select.html
>>>
>>> You will see that Pg recommends using SELECTs inside a TRANSACTION for
>>> just the reason I mentioned in my email: to ensure that you retrieve
>>> something dependable that doesn't change out from under you midstream,
>>> not for speed.
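>>>
>>> For what it's worth, doing that from the C API is just a matter of
>>> issuing BEGIN and COMMIT around the reads -- a minimal sketch (error
>>> handling omitted; "db" is your open handle):
>>>
>>>     sqlite3_exec(db, "BEGIN", 0, 0, 0);
>>>
>>>     /* run the prepared SELECTs here (bind, step, read columns);
>>>        they all see the same snapshot of the data */
>>>
>>>     sqlite3_exec(db, "COMMIT", 0, 0, 0);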
>>>
>>>> Leading me to ask: is there no difference in behavior between a
>>>> SINGLE SELECT and several of them within the context of a
>>>> transaction?
>>>
>>> What do you mean by "behavior"? Do you mean what you will get back?
>>> No, it shouldn't be different. Do you mean how fast you will get it
>>> back? Dunno, but you can tell for sure by writing a trivial
>>> benchmarking script on your data.
>>>
>>>
>>>>
>>>> And yes, each of the many SELECTs has a different WHERE clause.
>>>
>>> Don't mean to preempt your application, but bunching SELECTs with
>>> different WHERE clauses makes little sense. I mean, if you are doing
>>>
>>> SELECT .. FROM .. WHERE color = 'blue'
>>> SELECT .. FROM .. WHERE color = 'red'
>>> SELECT .. FROM .. WHERE color = 'green'
>>>
>>> you can just as well do
>>>
>>> SELECT .. FROM .. WHERE color IN ('blue','red','green')
>>>
>>> On the other hand, if you are doing
>>>
>>> SELECT .. FROM .. WHERE color = 'blue'
>>> SELECT .. FROM .. WHERE taste = 'bitter'
>>> SELECT .. FROM .. WHERE type = 'pill'
>>>
>>> That doesn't make much sense, but can also be accomplished with a
>>> single SELECT and a bunch of ORs
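>>>
>>> for example
>>>
>>> SELECT .. FROM .. WHERE color = 'blue' OR taste = 'bitter' OR type = 'pill'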
>>>
>>> Maybe you should explain your actual problem a bit more. What exactly
>>> are you trying to accomplish? What does your db look like? Provide
>>> some sample data, and perhaps an example of the multiple but different
>>> SELECT queries that you want to wrap in a transaction.
>>>
>>> Once again, if only speed is your aim, benchmark it.
>>>
>>>
>>>>
>>>> -rosemary.
>>>>
>>>> On Apr 7, 2009, at 12:38 PM, P Kishor wrote:
>>>>
>>>>> On Tue, Apr 7, 2009 at 2:36 PM, Rosemary Alles
>>>>> <al...@ipac.caltech.edu> wrote:
>>>>>>
>>>>>> Hullo Igor,
>>>>>>
>>>>>> Many thanks for your response: I believe I didn't phrase my
>>>>>> question correctly:
>>>>>>
>>>>>> 1) If I were to bundle several thousand SELECT statements in a
>>>>>> single transaction - why would it not run faster?
>>>>>
>>>>> As far as I understand, transactions matter only in the context of
>>>>> UPDATEs and INSERTs for the purpose of speed (keep in mind, I am not
>>>>> addressing data integrity -- the ACID part -- but only speed). A
>>>>> transaction speeds up multiple UPDATEs and INSERTs by decreasing the
>>>>> number of times your program interacts with the slowest part of your
>>>>> computer, the hard disk.
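>>>>>
>>>>> Schematically, something like this (a rough sketch only; the handle,
>>>>> the table/column names and the value arrays are placeholders):
>>>>>
>>>>>     sqlite3_exec(db, "BEGIN", 0, 0, 0);
>>>>>
>>>>>     sqlite3_stmt *ins = 0;
>>>>>     sqlite3_prepare_v2(db,
>>>>>         "INSERT INTO b (pix_bin_num, source_id, fr_time_stamp) "
>>>>>         "VALUES (?, ?, ?)", -1, &ins, 0);
>>>>>
>>>>>     for (int i = 0; i < n_rows; i++) {
>>>>>         sqlite3_bind_int (ins, 1, bins[i]);
>>>>>         sqlite3_bind_text(ins, 2, ids[i], -1, SQLITE_TRANSIENT);
>>>>>         sqlite3_bind_int (ins, 3, stamps[i]);
>>>>>         sqlite3_step(ins);
>>>>>         sqlite3_reset(ins);
>>>>>     }
>>>>>     sqlite3_finalize(ins);
>>>>>
>>>>>     /* everything is written to disk here, in one go */
>>>>>     sqlite3_exec(db, "COMMIT", 0, 0, 0);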
>>>>>
>>>>> Multiple SELECTs in a transaction might help with integrity, by
>>>>> ensuring that you don't end up getting data changed in mid-stream,
>>>>> but it won't speed up the queries.
>>>>>
>>>>> Are all your thousands of SELECTs based on different WHERE criteria?
>>>>> If not, they would really be just one SELECT.
>>>>>
>>>>>> 2) This is precisely the problem, though - each of those statements
>>>>>> will yield rows of results to be parsed with sqlite3_column_* in
>>>>>> the context of the user's (my) program. If many SELECT statements
>>>>>> are issued within the context of a single transaction (repeatedly),
>>>>>> how does one deal with the results without a callback (if using
>>>>>> sql_step)? Yes, sql_exec is touted to be a wrapper around
>>>>>> sql_prepare, bind, step. However, it does additionally offer the
>>>>>> option of a user-supplied callback routine, which sql_prepare etc.
>>>>>> do not.
>>>>>>
>>>>>> Essentially, my question is about context. If many, many SELECTs
>>>>>> are bundled in a single transaction using prepare, bind and step,
>>>>>> in what context does one parse the results? Do we not have a
>>>>>> synchronization issue here?
>>>>>>
>>>>>> Thanks again,
>>>>>> rosemary
>>>>>>
>>>>>> On Apr 6, 2009, at 8:03 PM, Igor Tandetnik wrote:
>>>>>>
>>>>>>> "Rosemary Alles" <al...@ipac.caltech.edu> wrote
>>>>>>> in message news:20a6b796-613b-4f5d-bfca-359d6b9fa...@ipac.caltech.edu
>>>>>>>>
>>>>>>>> I want to speed up my app. Can I run SELECT statements within the
>>>>>>>> context of a transaction?
>>>>>>>
>>>>>>> Yes, but it's unlikely to make it run any faster.
>>>>>>>
>>>>>>>> If so, how does one handle the query
>>>>>>>> results?
>>>>>>>
>>>>>>> The same way as when running it without an explicit transaction.
>>>>>>>
>>>>>>>> I would assume this cannot be done with sql_prepare,
>>>>>>>> sql_bind, sql_step?
>>>>>>>
>>>>>>> Of course it can. See sqlite3_column_*
>>>>>>>
>>>>>>>> Would I *have* to use sql_exec
>>>>>>>
>>>>>>> No. And if you look at the implementation of sqlite3_exec, it  
>>>>>>> uses
>>>>>>> sqlite3_prepare and sqlite3_step internally anyway. It's
>>>>>>> maintained
>>>>>>> mostly for backward compatibility.
>>>>>>>
>>>>>>>> What am I giving up
>>>>>>>> by using sql_exec vs sql_prepare, sql_bind and sql_step?
>>>>>>>
>>>>>>> Off the top of my head: 1) strong typing (you get all data as
>>>>>>> strings, so that, say, an integer is converted to a string and
>>>>>>> then you'll have to convert it back); 2) streaming (with
>>>>>>> sqlite3_exec, the whole resultset must be present in memory at the
>>>>>>> same time; with sqlite3_step, only one row's worth of data needs
>>>>>>> to be present in memory; makes a huge difference for very large
>>>>>>> resultsets).
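>>>>>>>
>>>>>>> Roughly, the difference looks like this (just a sketch, untested;
>>>>>>> "db" and the query are placeholders, and you'd need <sqlite3.h>
>>>>>>> and <stdlib.h>):
>>>>>>>
>>>>>>>     /* sqlite3_exec: rows are handed to a callback, as strings */
>>>>>>>     static int cb(void *arg, int ncol, char **vals, char **names)
>>>>>>>     {
>>>>>>>         (void)arg; (void)ncol; (void)names;   /* unused here */
>>>>>>>         double x = atof(vals[0]);   /* convert back from text */
>>>>>>>         /* ... use x ... */
>>>>>>>         return 0;                   /* non-zero aborts the query */
>>>>>>>     }
>>>>>>>     /* ... */
>>>>>>>     sqlite3_exec(db, "SELECT x_pos, y_pos FROM A", cb, 0, 0);
>>>>>>>
>>>>>>>     /* prepare/step: typed access, pulled one row at a time */
>>>>>>>     sqlite3_stmt *st = 0;
>>>>>>>     sqlite3_prepare_v2(db, "SELECT x_pos, y_pos FROM A", -1, &st, 0);
>>>>>>>     while (sqlite3_step(st) == SQLITE_ROW) {
>>>>>>>         double x = sqlite3_column_double(st, 0);
>>>>>>>         /* ... use x ... */
>>>>>>>     }
>>>>>>>     sqlite3_finalize(st);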
>>>>>>>
>>>>>>> Igor Tandetnik
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>
>
>
> -- 
> Puneet Kishor http://www.punkish.org/
> Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
> Carbon Model http://carbonmodel.org/
> Open Source Geospatial Foundation http://www.osgeo.org/
> Sent from Madison, WI, United States

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
