Re: [sqlite] indexing rows from a query
On May 16, 2008, at 5:04 PM, Jay A. Kreibich wrote:

> On Fri, May 16, 2008 at 04:44:10PM -0700, [EMAIL PROTECTED] scratched on
> the wall:
>> Sorry if this is a silly question - I don't have much experience with
>> databases.
>>
>> Say I have a table with many (millions+) of rows and I have a query:
>>
>> SELECT * FROM mytable WHERE some_condition ORDER BY rowid
>>
>> First, I'm assuming that in addition to whatever time some_condition
>> takes, I'll see an overhead of O(N log N) for the sort in the worst
>> case, but probably much less (O(N) or O(1)?) because it's probably
>> sorted by rowid anyway. Is that correct?
>
> If "some_condition" doesn't trigger the use of another index, then
> yes... the table will be scanned sequentially by rowid, so after the
> condition is assessed there shouldn't be any sorting overhead.
>
> If some_condition causes the use of a different index, then the
> sorting will be post-processed and the sort costs will be higher --
> but your conditional costs might be much lower.
>
> Which is better depends a lot on the percentage of total rows
> returned. If you have a large table but only return a dozen or so
> rows, you want to make the condition efficient. If you're returning
> 20% of the table, you want the sort to be cheap.

Gotcha. Thanks. In my application, I expect the common case to be
returning a large fraction of the table, so I'm most concerned with the
sort overhead.

>> My real question is if there is an efficient way to index the results
>> of such a query. In other words, I'm looking for rows N through N+100
>> of the result. Can I do much better than just executing the query and
>> throwing away the first N rows? I thought of making an auxiliary
>> table to map rowid in the table to row number of the query for large
>> chunks of the table, but that can get to be a big memory footprint if
>> some_condition changes often.
>
> If you scan things in order -- e.g. 1-100, 101-200, etc. -- just
> remember the rowid of the last item in the last result set and add a
> "rowid > [last]" condition to the next query. Use a LIMIT clause to
> define the size of your window. The big thing is that you'd like to
> avoid using an OFFSET.
>
> Have a look here for more ideas:
> http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

This makes sense. I am, in fact, using this to send data to a
scrollable window, and it seems like it would work great to scroll down
to the next page. The real problem, though, is if the user grabs the
scrollbar and drags it somewhere far away (or tries to scroll up). It
sounds like I'm just stuck using OFFSET in that case, right?

Thanks,
Jeff

> -j
>
> --
> Jay A. Kreibich < J A Y @ K R E I B I.C H >
>
> "'People who live in bamboo houses should not throw pandas.' Jesus
> said that."
> - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
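[Editor's note: one common answer to the scrollbar-jump problem is a hybrid: keyset ("rowid > last") paging for incremental scrolling, with OFFSET only as the fallback for a long jump. A minimal sketch with Python's standard sqlite3 module; the table, data, and function names are invented for the demo, and the WHERE filter from the thread is omitted for brevity:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (val INTEGER)")
con.executemany("INSERT INTO mytable (val) VALUES (?)",
                [(i,) for i in range(1000)])

PAGE = 100

def page_after(last_rowid):
    # Cheap path: keyset scan forward from the last row already shown.
    # SQLite seeks directly to last_rowid and reads PAGE rows.
    return con.execute(
        "SELECT rowid, val FROM mytable WHERE rowid > ? "
        "ORDER BY rowid LIMIT ?",
        (last_rowid, PAGE)).fetchall()

def page_at(n):
    # Expensive path for a scrollbar jump: SQLite generates and then
    # discards the first n rows, so the cost grows with n.
    return con.execute(
        "SELECT rowid, val FROM mytable ORDER BY rowid LIMIT ? OFFSET ?",
        (PAGE, n)).fetchall()

rows = page_at(500)             # user drags the scrollbar to row 500
rows = page_after(rows[-1][0])  # subsequent "scroll down" stays cheap
```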
Re: [sqlite] indexing rows from a query
On Fri, May 16, 2008 at 04:44:10PM -0700, [EMAIL PROTECTED] scratched on
the wall:
> Sorry if this is a silly question - I don't have much experience with
> databases.
>
> Say I have a table with many (millions+) of rows and I have a query:
>
> SELECT * FROM mytable WHERE some_condition ORDER BY rowid
>
> First, I'm assuming that in addition to whatever time some_condition
> takes, I'll see an overhead of O(N log N) for the sort in the worst
> case, but probably much less (O(N) or O(1)?) because it's probably
> sorted by rowid anyway. Is that correct?

If "some_condition" doesn't trigger the use of another index, then
yes... the table will be scanned sequentially by rowid, so after the
condition is assessed there shouldn't be any sorting overhead.

If some_condition causes the use of a different index, then the sorting
will be post-processed and the sort costs will be higher -- but your
conditional costs might be much lower.

Which is better depends a lot on the percentage of total rows returned.
If you have a large table but only return a dozen or so rows, you want
to make the condition efficient. If you're returning 20% of the table,
you want the sort to be cheap.

> My real question is if there is an efficient way to index the results
> of such a query. In other words, I'm looking for rows N through N+100
> of the result. Can I do much better than just executing the query and
> throwing away the first N rows? I thought of making an auxiliary
> table to map rowid in the table to row number of the query for large
> chunks of the table, but that can get to be a big memory footprint if
> some_condition changes often.

If you scan things in order -- e.g. 1-100, 101-200, etc. -- just
remember the rowid of the last item in the last result set and add a
"rowid > [last]" condition to the next query. Use a LIMIT clause to
define the size of your window. The big thing is that you'd like to
avoid using an OFFSET.

Have a look here for more ideas:
http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus
said that."
- "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
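[Editor's note: the "rowid > [last]" technique above can be sketched with Python's standard sqlite3 module. The table, data, and the even-number filter are placeholders standing in for the poster's real schema and some_condition:]

```python
import sqlite3

# In-memory demo table; rowids are assigned 1, 2, 3, ... on insert.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (val INTEGER)")
con.executemany("INSERT INTO mytable (val) VALUES (?)",
                [(i,) for i in range(1000)])

def next_page(last_rowid, page_size=100):
    # Keyset (seek) pagination: resume the scan just past the last
    # rowid already seen.  Because the scan follows rowid order, no
    # separate sort step is needed and no rows are read and discarded.
    return con.execute(
        "SELECT rowid, val FROM mytable "
        "WHERE val % 2 = 0 AND rowid > ? "
        "ORDER BY rowid LIMIT ?",
        (last_rowid, page_size)).fetchall()

page1 = next_page(0)             # first page of the filtered result
page2 = next_page(page1[-1][0])  # continue after the last row seen
```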
Re: [sqlite] indexing rows from a query
<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> My real question is if there is an efficient way to index the results
> of such a query. In other words, I'm looking for rows N through N+100
> of the result. Can I do much better than just executing the query and
> throwing away the first N rows?

In general, no. You can use LIMIT and OFFSET clauses, but all that does
is instruct SQLite to throw away the first N rows so you don't have to
do it manually.

More often than not, what you really want is not "get 100 rows at
position N" but "get 100 rows following the rows I've retrieved last
time". This can in fact be implemented more efficiently: see
http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

Igor Tandetnik
Re: [sqlite] indexing rows from a query
[EMAIL PROTECTED] wrote:
> Sorry if this is a silly question - I don't have much experience with
> databases.
>
> Say I have a table with many (millions+) of rows and I have a query:
>
> SELECT * FROM mytable WHERE some_condition ORDER BY rowid
>
> First, I'm assuming that in addition to whatever time some_condition
> takes, I'll see an overhead of O(N log N) for the sort in the worst
> case, but probably much less (O(N) or O(1)?) because it's probably
> sorted by rowid anyway. Is that correct?
>
> My real question is if there is an efficient way to index the results
> of such a query. In other words, I'm looking for rows N through N+100
> of the result. Can I do much better than just executing the query and
> throwing away the first N rows? I thought of making an auxiliary
> table to map rowid in the table to row number of the query for large
> chunks of the table, but that can get to be a big memory footprint if
> some_condition changes often.

Can't you just do:

SELECT * FROM mytable WHERE some_condition ORDER BY rowid
LIMIT 100 OFFSET 0;

to get the first 100 rows?

--
Scott Baker - Canby Telcom
RHCE - System Administrator - 503.266.8253
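[Editor's note: a runnable sketch of the LIMIT/OFFSET approach, using Python's standard sqlite3 module; the table, data, and the "val < 500" filter are invented for the demo. Note the comment in the code: this works, but deep pages pay for every skipped row:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (val INTEGER)")
con.executemany("INSERT INTO mytable (val) VALUES (?)",
                [(i,) for i in range(1000)])

def page(k, size=100):
    # Page k of the filtered, rowid-ordered result.  OFFSET makes
    # SQLite produce and discard k*size rows first, so each deeper
    # page is more expensive than the last.
    return con.execute(
        "SELECT val FROM mytable WHERE val < 500 "
        "ORDER BY rowid LIMIT ? OFFSET ?",
        (size, k * size)).fetchall()

first = page(0)   # rows 1-100 of the result
third = page(2)   # rows 201-300
```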
[sqlite] indexing rows from a query
Sorry if this is a silly question - I don't have much experience with
databases.

Say I have a table with many (millions+) of rows and I have a query:

SELECT * FROM mytable WHERE some_condition ORDER BY rowid

First, I'm assuming that in addition to whatever time some_condition
takes, I'll see an overhead of O(N log N) for the sort in the worst
case, but probably much less (O(N) or O(1)?) because it's probably
sorted by rowid anyway. Is that correct?

My real question is if there is an efficient way to index the results
of such a query. In other words, I'm looking for rows N through N+100
of the result. Can I do much better than just executing the query and
throwing away the first N rows? I thought of making an auxiliary table
to map rowid in the table to row number of the query for large chunks
of the table, but that can get to be a big memory footprint if
some_condition changes often.

Does anyone have any suggestions?

Thanks,
Jeff
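[Editor's note: the "no extra sort when ordering by rowid" assumption can be checked directly with EXPLAIN QUERY PLAN, which reports an explicit sort as a temp B-tree step. A sketch in Python; the exact plan wording varies across SQLite versions, so treat the quoted strings as an assumption for recent releases:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")

def plan(sql):
    # Join the "detail" column of each EXPLAIN QUERY PLAN output row.
    return " | ".join(row[-1] for row in
                      con.execute("EXPLAIN QUERY PLAN " + sql))

# Scanning the table in rowid order: the result is already sorted,
# so no sort step should appear in the plan.
p_rowid = plan("SELECT * FROM mytable WHERE a > 5 ORDER BY rowid")

# Ordering by an unindexed column forces an explicit sort, reported
# as "USE TEMP B-TREE FOR ORDER BY" in recent SQLite versions.
p_other = plan("SELECT * FROM mytable WHERE a > 5 ORDER BY b")

print(p_rowid)
print(p_other)
```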