Thanks Puneet. Those suggestions really help. -rosemary.
On Apr 7, 2009, at 5:52 PM, P Kishor wrote:

> On Tue, Apr 7, 2009 at 5:18 PM, Rosemary Alles <al...@ipac.caltech.edu> wrote:
>> Puneet,
>>
>> As you suggested, I have supplied a brief background re: the problem.
>>
>> Background:
>> I'm very new to SQL (two weeks). I have a database with two tables, one
>> with, say, at most 12k rows of data, and the other with more. The first
>> table (let's call it A) has the following columns:
>>
>> source_id, x_pos, y_pos, magnitude, time_stamp, bin_num
>>
>> source_id is unique, and I have individual indexes on all columns.
>>
>> The second table (let's call it B) has the following columns:
>>
>> source_id, time_stamp, bin_num
>>
>> No column is unique, and I have individual indexes on all columns.
>>
>> I create/update the database without a problem using sqlite3_prepare,
>> sqlite3_bind and sqlite3_step, and I use BEGIN/COMMIT to bundle the
>> updates into transactions. The updating has decent timing.
>>
>> I query the database to extract x_pos, y_pos from table A for rows
>> that match particular bin_num(s) in B, provided the source_ids are
>> the same in both tables. The queries take ~30 seconds when run about
>> 7k times in a loop.
>
> So each SELECT is taking about 4 ms (30,000 ms / 7,000). You might try
> loading the entire db into memory and working with that. Your db is
> really tiny; at least one of the tables is only 12k rows. You don't say
> how big table B is, but if you can load the entire db into memory, you
> should get much faster times.
>
> Try other SELECTs as well...
>
> SELECT x_pos, y_pos
> FROM a
> WHERE source_id = (SELECT source_id FROM b WHERE bin_num = ?)
>
> or
>
> SELECT x_pos, y_pos
> FROM a
> WHERE source_id IN (SELECT source_id FROM b WHERE bin_num IN (?, ?, ?, ?))
>
> Perhaps others on the list can suggest something.
>
>> Each select statement is distinct. The timing isn't acceptable.
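[Editor's note: Puneet's two suggestions -- copy the whole database into memory, and replace thousands of per-bin lookups with one IN-subquery SELECT -- can be sketched with Python's sqlite3 module, which is a thin wrapper over the same C API. The schema and sample values are taken from later in this thread; the single demo row is only an illustration, not the poster's actual data.]

```python
import sqlite3

# Build a small source database (":memory:" here stands in for the on-disk file).
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE a (source_id TEXT PRIMARY KEY, x_pos REAL, y_pos REAL,
                    mag REAL, band INTEGER, fr_time_stamp INTEGER,
                    pix_bin_num INTEGER);
    CREATE TABLE b (pix_bin_num INTEGER, source_id TEXT, fr_time_stamp INTEGER);
    CREATE INDEX b_bin ON b (pix_bin_num);
    CREATE INDEX b_src ON b (source_id);
""")
src.execute("INSERT INTO a VALUES "
            "('50275a003-000002-3', 382.836, 235.303, 6.162, 3, 1260978065, 23)")
src.execute("INSERT INTO b VALUES (21, '50275a003-000002-3', 1260978065)")
src.execute("INSERT INTO b VALUES (2,  '50275a003-000002-3', 1260978065)")

# Copy the entire database into a second, in-memory connection, as Puneet
# suggests; the backup API does this in one call.
mem = sqlite3.connect(":memory:")
src.backup(mem)

# One IN-subquery SELECT instead of thousands of individual lookups.
rows = mem.execute(
    "SELECT x_pos, y_pos FROM a "
    "WHERE source_id IN (SELECT source_id FROM b WHERE pix_bin_num IN (?, ?))",
    (21, 2),
).fetchall()
print(rows)  # [(382.836, 235.303)]
```

In the C API the equivalent of `backup()` is the sqlite3_backup_init/step/finish family, copying the disk database into a connection opened on ":memory:".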
>> Obviously the query is inefficient and/or the database isn't organized
>> optimally, etc. The program is part of a data reduction pipeline for
>> an astronomy project. I use the C interface to sqlite3.
>>
>> Here's the query:
>>
>> select * from A a, B b where b.bin=? and a.source_id=b.source_id
>>
>> or
>>
>> select * from A a, B b where b.bin in (?, ?, ?, ?, ?) and a.source_id=b.source_id
>>
>> The "?" is filled in by a bind, with values calculated at run time,
>> similar to the updates. I use sqlite3_prepare, sqlite3_bind and
>> sqlite3_step to run the query, then loop through the resulting rows
>> with sqlite3_column_*. Doing this, say, 7k times for each run of the
>> program (and then repeatedly in the pipeline) is hugely costly. How
>> can I optimize my query/database for better performance?
>>
>> Sample data (table A)
>> source_id           x_pos     y_pos    mag    band  fr_time_stamp  pix_bin_num
>> ==============================================================================
>> 50275a003-000002-3   382.836  235.303  6.162  3     1260978065     23
>> 50275a003-000003-3   166.883  567.99   6.032  3     1260978065     51
>> 50275a003-000004-3  1009.492  753.4    6.243  3     1260978065     80
>> 50275a003-000005-3    10.083  153.815  7.672  3     1260978065     10
>> 50275a003-000006-3   332.153  411.88   7.65   3     1260978065     43
>> 50275a003-000007-3   888.086  135.478  7.589  3     1260978065     18
>> 50275a003-000009-3   208.277  292.152  8.127  3     1260978065     22
>> 50275a003-000013-3   788.648  829.213  8.424  3     1260978065     87
>> 50275a003-000014-3   277.768   19.981  8.335  3     1260978065      2
>> 50275a003-000017-3   665.116  624.767  8.807  3     1260978065     66
>> 50275a003-000018-3   170.859  855.147  8.734  3     1260978065     81
>> 50275a003-000019-3   694.634  210.285  8.787  3     1260978065     26
>> 50275a003-000020-3   293.737   11.928  9.144  3     1260978065      2
>> 50275a003-000023-3   311.53   729.644  9.237  3     1260978065     73
>> 50275a003-000024-3   284.052  947.095  9.632  3     1260978065     92
>>
>> Sample data (table B)
>> pix_bin_num  source_id           fr_time_stamp
>> ==============================================
>> 21           50275a003-000002-3  1260978065
>> 11           50275a003-000002-3  1260978065
>> 31           50275a003-000002-3  1260978065
>> 12           50275a003-000002-3  1260978065
>> 22           50275a003-000002-3  1260978065
>> 32           50275a003-000002-3  1260978065
>>  2           50275a003-000002-3  1260978065
>> 42           50275a003-000002-3  1260978065
>>  3           50275a003-000002-3  1260978065
>> 42           50275a003-007106-3  1260978065
>>  0           50275a003-000002-4  1260978065
>>  5           50275a003-000002-4  1260978065
>> 10           50275a003-000002-4  1260978065
>>  1           50275a003-000002-4  1260978065
>>  6           50275a003-000002-4  1260978065
>> 11           50275a003-000002-4  1260978065
>> 16           50275a003-000002-4  1260978065
>>  2           50275a003-000002-4  1260978065
>>  7           50275a003-000002-4  1260978065
>> 12           50275a003-000002-4  1260978065
>> 17           50275a003-000002-4  1260978065
>>
>> Result from "explain query plan":
>>
>> sqlite> explain query plan select * from latent_parents a, pix_bins b
>>    ...> where b.pix_bin_num=0 and a.source_id=b.source_id;
>> 0|1|TABLE pix_bins AS b WITH INDEX pix_bin_num_index_pix_bin_tbl
>> 1|0|TABLE latent_parents AS a WITH INDEX source_id_index_lp_tbl
>>
>> Many thanks,
>> rosemary.
>>
>> On Apr 7, 2009, at 1:57 PM, P Kishor wrote:
>>
>>> On Tue, Apr 7, 2009 at 3:45 PM, Rosemary Alles <al...@ipac.caltech.edu> wrote:
>>>> Hullo Puneet,
>>>>
>>>> Many thanks for your response.
>>>>
>>>> My understanding of a sqlite3 "transaction" is probably poor.
>>>> From your response (if you are correct), I see that only UPDATEs and
>>>> INSERTs can be sped up by bundling many of them in a BEGIN/COMMIT
>>>> block?
>>>
>>> Not that it is any standard, but search for the word "transaction" at
>>>
>>> http://developer.postgresql.org/pgdocs/postgres/sql-select.html
>>>
>>> You will see Pg recommends using SELECTs inside a TRANSACTION for
>>> just the reason I mentioned in my email... ensuring that you retrieve
>>> something dependable that is not changed on you midstream, not for
>>> speed.
>>>
>>>> Leading me to ask: is there no difference in behavior between a
>>>> SINGLE SELECT and several of them within the context of a
>>>> transaction?
>>>
>>> What do you mean by "behavior"? Do you mean what you will get back?
>>> No, it shouldn't be different. Do you mean how fast you will get it
>>> back? Dunno, but you can tell for sure by writing a trivial
>>> benchmarking script on your data.
>>>
>>>> And yes, each of the many SELECTs has a different WHERE clause.
>>>
>>> Don't mean to preempt your application, but bunching SELECTs with
>>> different WHERE clauses makes little sense. I mean, if you are doing
>>>
>>> SELECT .. FROM .. WHERE color = 'blue'
>>> SELECT .. FROM .. WHERE color = 'red'
>>> SELECT .. FROM .. WHERE color = 'green'
>>>
>>> you can just as well do
>>>
>>> SELECT .. FROM .. WHERE color IN ('blue', 'red', 'green')
>>>
>>> On the other hand, if you are doing
>>>
>>> SELECT .. FROM .. WHERE color = 'blue'
>>> SELECT .. FROM .. WHERE taste = 'bitter'
>>> SELECT .. FROM .. WHERE type = 'pill'
>>>
>>> that makes less sense, but can also be accomplished with a single
>>> SELECT and a bunch of ORs.
>>>
>>> Maybe you should explain your actual problem a bit more. What exactly
>>> are you trying to accomplish? What does your db look like?
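[Editor's note: Puneet's point about collapsing same-column SELECTs into one IN query can be shown concretely. The fruit/color table below is hypothetical, invented purely for the demonstration; Python's sqlite3 module stands in for the C API.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fruit (name TEXT, color TEXT)")
con.executemany("INSERT INTO fruit VALUES (?, ?)",
                [("plum", "blue"), ("cherry", "red"),
                 ("lime", "green"), ("banana", "yellow")])

# Three separate statements, one per WHERE value...
separate = [con.execute("SELECT name FROM fruit WHERE color = ?", (c,)).fetchall()
            for c in ("blue", "red", "green")]

# ...versus a single SELECT with IN: same rows, one statement, one scan.
combined = con.execute("SELECT name FROM fruit WHERE color IN (?, ?, ?)",
                       ("blue", "red", "green")).fetchall()

# Both approaches return the same set of names.
assert sorted(n for rs in separate for (n,) in rs) == sorted(n for (n,) in combined)
```

The same collapsing applies to the thread's actual query: binding five bin numbers into `WHERE bin_num IN (?, ?, ?, ?, ?)` replaces five separate prepared-statement executions.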
>>> Provide some sample data, and perhaps examples of your multiple but
>>> different SELECT queries that you want to wrap in a transaction.
>>>
>>> Once again, if only speed is your aim, benchmark it.
>>>
>>>> -rosemary.
>>>>
>>>> On Apr 7, 2009, at 12:38 PM, P Kishor wrote:
>>>>
>>>>> On Tue, Apr 7, 2009 at 2:36 PM, Rosemary Alles <al...@ipac.caltech.edu> wrote:
>>>>>>
>>>>>> Hullo Igor,
>>>>>>
>>>>>> Many thanks for your response. I believe I didn't phrase my
>>>>>> question correctly:
>>>>>>
>>>>>> 1) If I were to bundle several thousand SELECT statements in a
>>>>>> single transaction, why would it not run faster?
>>>>>
>>>>> As far as I understand, transactions matter only in the context of
>>>>> UPDATEs and INSERTs for the purpose of speed (keep in mind, I am
>>>>> not addressing data integrity -- the ACID part -- here, but only
>>>>> speed). A transaction speeds up multiple UPDATEs and INSERTs by
>>>>> decreasing the number of times your program interacts with the
>>>>> slowest part of your computer, the hard disk.
>>>>>
>>>>> Multiple SELECTs in a transaction might help with integrity, by
>>>>> ensuring that you don't end up getting data changed in mid-stream,
>>>>> but won't speed up the query.
>>>>>
>>>>> Are all your thousands of SELECTs based on different WHERE
>>>>> criteria? If not, they would really be just one SELECT.
>>>>>
>>>>>> 2) This is precisely the problem though: each of those statements
>>>>>> will yield rows of results to be parsed with sqlite3_column in
>>>>>> the context of the user's (my) program. If many SELECT statements
>>>>>> are issued within the context of a single transaction
>>>>>> (repeatedly), how does one deal with the results without a
>>>>>> callback (if using sqlite3_step)? Yes, sqlite3_exec is touted to
>>>>>> be a wrapper around sqlite3_prepare, bind and step.
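[Editor's note: the INSERT batching Puneet describes -- one transaction around many statements, so the disk is synced once at COMMIT rather than once per statement -- looks like this in Python's sqlite3. A sketch only; the table name `t` is invented, and the speed difference is dramatic on a real disk file, not on ":memory:".]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit; we manage txns
con.execute("CREATE TABLE t (n INTEGER)")

rows = [(i,) for i in range(5000)]

# One explicit transaction around all the INSERTs: without it, each INSERT
# would be its own transaction with its own fsync -- the slow case.
con.execute("BEGIN")
con.executemany("INSERT INTO t VALUES (?)", rows)
con.execute("COMMIT")

print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 5000
```

Wrapping SELECTs the same way changes nothing about their speed, which is Puneet's point: a read does not wait on an fsync in the first place.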
>>>>>> However, it does (also, additionally) offer the option of a
>>>>>> user-supplied callback routine, which sqlite3_prepare etc. do
>>>>>> not.
>>>>>>
>>>>>> Essentially, my question is about context: if many, many SELECTs
>>>>>> are bundled in a single transaction using prepare, bind and step,
>>>>>> in what context does one parse the results? Do we not have a
>>>>>> synchronization issue here?
>>>>>>
>>>>>> Thanks again,
>>>>>> rosemary
>>>>>>
>>>>>> On Apr 6, 2009, at 8:03 PM, Igor Tandetnik wrote:
>>>>>>
>>>>>>> "Rosemary Alles" <al...@ipac.caltech.edu> wrote
>>>>>>> in message news:20a6b796-613b-4f5d-bfca-359d6b9fa...@ipac.caltech.edu
>>>>>>>>
>>>>>>>> I want to speed up my app. Can I run SELECT statements within
>>>>>>>> the context of a transaction?
>>>>>>>
>>>>>>> Yes, but it's unlikely to make it run any faster.
>>>>>>>
>>>>>>>> If so, how does one handle the query results?
>>>>>>>
>>>>>>> The same way as when running it without an explicit transaction.
>>>>>>>
>>>>>>>> I would assume this cannot be done with sqlite3_prepare,
>>>>>>>> sqlite3_bind, sqlite3_step?
>>>>>>>
>>>>>>> Of course it can. See sqlite3_column_*.
>>>>>>>
>>>>>>>> Would I *have* to use sqlite3_exec?
>>>>>>>
>>>>>>> No. And if you look at the implementation of sqlite3_exec, it
>>>>>>> uses sqlite3_prepare and sqlite3_step internally anyway. It's
>>>>>>> maintained mostly for backward compatibility.
>>>>>>>
>>>>>>>> What am I giving up by using sqlite3_exec vs sqlite3_prepare,
>>>>>>>> sqlite3_bind and sqlite3_step?
>>>>>>>
>>>>>>> Off the top of my head: 1) strong typing (you get all data back
>>>>>>> as strings, so that, say, an integer is converted to a string
>>>>>>> and then you'll have to convert it back); 2) streaming (with
>>>>>>> sqlite3_exec, the whole resultset must be present in memory at
>>>>>>> the same time; with sqlite3_step, only one row's worth of data
>>>>>>> needs to be present in memory, which makes a huge difference for
>>>>>>> very large resultsets).
>>>>>>>
>>>>>>> Igor Tandetnik

> --
> Puneet Kishor http://www.punkish.org/
> Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
> Carbon Model http://carbonmodel.org/
> Open Source Geospatial Foundation http://www.osgeo.org/
> Sent from Madison, WI, United States
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
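[Editor's note: Igor's two trade-offs map onto Python's sqlite3 roughly as follows. This is illustrative only -- the Python wrapper always goes through prepare/step underneath -- but it shows the difference between materializing a whole resultset (sqlite3_exec's model, with its string-only `char **argv`) and pulling one typed row at a time (sqlite3_step's model). Table `t` is invented for the demo.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3)])

# sqlite3_exec-style: the whole resultset in memory at once, and every
# value delivered as a string (what exec's callback argv gives you).
all_rows = con.execute("SELECT n FROM t").fetchall()
as_strings = [str(n) for (n,) in all_rows]

# sqlite3_step-style: iterate one typed row at a time; only one row is
# ever held by the loop, which is what matters for very large resultsets.
total = 0
for (n,) in con.execute("SELECT n FROM t"):
    assert isinstance(n, int)  # sqlite3_column_int preserves the type
    total += n
print(total)  # 3
```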