Re: [sqlite] indexing rows from a query

2008-05-16 Thread jsg72

On May 16, 2008, at 5:04 PM, Jay A. Kreibich wrote:

> On Fri, May 16, 2008 at 04:44:10PM -0700, [EMAIL PROTECTED] scratched on  
> the wall:
>> Sorry if this is a silly question - I don't have much experience with
>> databases.
>>
>> Say I have a table with many (millions+) of rows and I have a query:
>>
>> SELECT * FROM mytable WHERE some_condition ORDER BY rowid
>>
>> First, I'm assuming that in addition to whatever time some_condition
>> takes, I'll see an overhead of O( N log N ) for the sort in the worst
>> case, but probably much less (O(N) or O(1)?) because it's probably
>> already sorted by rowid anyway.  Is that correct?
>
>  If "some_condition" doesn't trigger the use of another index, then
>  yes... the table will be scanned sequentially by rowid, so after the
>  condition is assessed there shouldn't be any sorting overhead.
>
>  If some_condition causes the use of a different index, then the
>  sorting will be post-processed and the sort costs will be higher--
>  but your conditional costs might be much lower.
>
>  Which is better depends a lot on the percentage of total rows
>  returned.  If you have a large table, but only return a dozen or so
>  rows, you want to make the condition efficient.  If you're returning
>  20% of the table, you want to sort to be cheap.

Gotcha.  Thanks.  In my application, I expect the common case to be  
returning a large fraction of the table, so I'm most concerned with  
the sort overhead.


>
>
>> My real question is if there is an efficient way to index the results
>> of such a query.  In other words, I'm looking for rows N through N+100
>> of the result.  Can I do much better than just executing the query and
>> throwing away the first N rows?  I thought of making an auxiliary
>> table to map rowid in the table with row number of the query for large
>> chunks of the table, but that can get to be a big memory footprint if
>> some_condition changes often.
>
>  If you scan things in order-- e.g. 1-100, 101-200, etc., just remember
>  the row id of the last item in the last result set and add a "rowid >
>  [last]" condition to the next query.  Use a LIMIT clause to define
>  the size of your window.  The big thing is that you'd like to avoid
>  using an OFFSET.
>
>  Have a look here for more ideas:
>  http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

This makes sense.  I am, in fact, using this to send data to a  
scrollable window, and it seems like this would work great to scroll  
down to the next page.  The real problem, though, is if the user grabs  
the scrollbar and drags it to somewhere far away (or tries to scroll  
up).  It sounds like I'm just stuck using OFFSET in that case, right?
Thanks,
Jeff
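For the random-jump case the thread ends up at: pay the OFFSET cost once for the jump, then go back to cheap rowid-based scrolling from the new position. A minimal sketch of that pattern using Python's built-in sqlite3 module; the table, column, and filter condition are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO mytable (val) VALUES (?)",
                 [(i,) for i in range(500)])

def jump_to(conn, offset, page_size):
    # Random jump: OFFSET is hard to avoid here; SQLite still steps
    # over the first `offset` matching rows internally.
    return conn.execute(
        "SELECT rowid, val FROM mytable WHERE val % 2 = 0 "
        "ORDER BY rowid LIMIT ? OFFSET ?",
        (page_size, offset)).fetchall()

def scroll_down(conn, last_rowid, page_size):
    # Subsequent scrolling from a known position: cheap rowid seek,
    # no rows skipped.
    return conn.execute(
        "SELECT rowid, val FROM mytable WHERE val % 2 = 0 AND rowid > ? "
        "ORDER BY rowid LIMIT ?",
        (last_rowid, page_size)).fetchall()

page = jump_to(conn, 100, 10)                    # scrollbar dragged far away
following = scroll_down(conn, page[-1][0], 10)   # then one page down
```

Only the jump itself pays the skip cost; every subsequent page is a seek on rowid.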




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] indexing rows from a query

2008-05-16 Thread Jay A. Kreibich
On Fri, May 16, 2008 at 04:44:10PM -0700, [EMAIL PROTECTED] scratched on the 
wall:
> Sorry if this is a silly question - I don't have much experience with  
> databases.
> 
> Say I have a table with many (millions+) of rows and I have a query:
> 
> SELECT * FROM mytable WHERE some_condition ORDER BY rowid
> 
> First, I'm assuming that in addition to whatever time some_condition  
> takes, I'll see an overhead of O( N log N ) for the sort in the worst  
> case, but probably much less (O(N) or O(1)?) because it's probably
> already sorted by rowid anyway.  Is that correct?

  If "some_condition" doesn't trigger the use of another index, then
  yes... the table will be scanned sequentially by rowid, so after the
  condition is assessed there shouldn't be any sorting overhead.

  If some_condition causes the use of a different index, then the
  sorting will be post-processed and the sort costs will be higher--
  but your conditional costs might be much lower.

  Which is better depends a lot on the percentage of total rows
  returned.  If you have a large table, but only return a dozen or so
  rows, you want to make the condition efficient.  If you're returning
  20% of the table, you want to sort to be cheap.

> My real question is if there is an efficient way to index the results  
> of such a query.  In other words, I'm looking for rows N through N+100  
> of the result.  Can I do much better than just executing the query and  
> throwing away the first N rows?  I thought of making an auxiliary  
> table to map rowid in the table with row number of the query for large  
> chunks of the table, but that can get to be a big memory footprint if  
> some_condition changes often.

  If you scan things in order-- e.g. 1-100, 101-200, etc., just remember
  the row id of the last item in the last result set and add a "rowid >
  [last]" condition to the next query.  Use a LIMIT clause to define
  the size of your window.  The big thing is that you'd like to avoid
  using an OFFSET.

  Have a look here for more ideas:
  http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor
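The "remember the last rowid" approach can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the table, column, and the `val = 3` stand-in for some_condition are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO mytable (val) VALUES (?)",
                 [(i % 7,) for i in range(1000)])

def next_page(conn, last_rowid, page_size):
    """Fetch the next window of matching rows after last_rowid."""
    return conn.execute(
        "SELECT rowid, val FROM mytable "
        "WHERE val = 3 AND rowid > ? "   # some_condition AND rowid > [last]
        "ORDER BY rowid LIMIT ?",
        (last_rowid, page_size)).fetchall()

# Walk the result set window by window; no OFFSET anywhere.
last = 0
pages = []
while True:
    window = next_page(conn, last, 50)
    if not window:
        break
    pages.append(window)
    last = window[-1][0]  # remember the last rowid of this window
```

Each query seeks directly to the remembered rowid instead of re-scanning everything before it.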

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"


Re: [sqlite] indexing rows from a query

2008-05-16 Thread Igor Tandetnik
<[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> My real question is if there is an efficient way to index the results
> of such a query.  In other words, I'm looking for rows N through N+100
> of the result.  Can I do much better than just executing the query and
> throwing away the first N rows?

In general, no. You can use LIMIT and OFFSET clauses, but all that does 
is instruct SQLite to throw away the first N rows so you don't have to 
do it manually.

More often than not, what you really want is not "get 100 rows at 
position N", but "get 100 rows following the rows I've retrieved last 
time". This can in fact be implemented more efficiently: see

http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

Igor Tandetnik 





Re: [sqlite] indexing rows from a query

2008-05-16 Thread Scott Baker
[EMAIL PROTECTED] wrote:
> Sorry if this is a silly question - I don't have much experience with  
> databases.
> 
> Say I have a table with many (millions+) of rows and I have a query:
> 
> SELECT * FROM mytable WHERE some_condition ORDER BY rowid
> 
> First, I'm assuming that in addition to whatever time some_condition  
> takes, I'll see an overhead of O( N log N ) for the sort in the worst  
> case, but probably much less (O(N) or O(1)?) because it's probably
> already sorted by rowid anyway.  Is that correct?
> 
> My real question is if there is an efficient way to index the results  
> of such a query.  In other words, I'm looking for rows N through N+100  
> of the result.  Can I do much better than just executing the query and  
> throwing away the first N rows?  I thought of making an auxiliary  
> table to map rowid in the table with row number of the query for large  
> chunks of the table, but that can get to be a big memory footprint if  
> some_condition changes often.

Can't you just do:

SELECT * FROM mytable WHERE some_condition ORDER BY rowid LIMIT 100 OFFSET 0;

To get the first 100 rows?
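A quick check that LIMIT/OFFSET does return the requested window, sketched with Python's sqlite3 module (table, column, and condition invented for illustration). Note the caveat raised elsewhere in the thread: SQLite still walks past the first N matching rows to honor the OFFSET, so the cost grows with N:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO mytable (val) VALUES (?)",
                 [(i,) for i in range(300)])

def fetch_page(conn, n, size):
    # Rows n+1..n+size of the filtered, rowid-ordered result.
    return conn.execute(
        "SELECT rowid FROM mytable WHERE val >= 50 "
        "ORDER BY rowid LIMIT ? OFFSET ?",
        (size, n)).fetchall()

first = fetch_page(conn, 0, 100)     # OFFSET 0: the first 100 matches
second = fetch_page(conn, 100, 100)  # OFFSET 100: the next 100
```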


-- 
Scott Baker - Canby Telcom
RHCE - System Administrator - 503.266.8253


[sqlite] indexing rows from a query

2008-05-16 Thread jsg72
Sorry if this is a silly question - I don't have much experience with  
databases.

Say I have a table with many (millions+) of rows and I have a query:

SELECT * FROM mytable WHERE some_condition ORDER BY rowid

First, I'm assuming that in addition to whatever time some_condition  
takes, I'll see an overhead of O( N log N ) for the sort in the worst  
case, but probably much less (O(N) or O(1)?) because it's probably  
already sorted by rowid anyway.  Is that correct?

My real question is if there is an efficient way to index the results  
of such a query.  In other words, I'm looking for rows N through N+100  
of the result.  Can I do much better than just executing the query and  
throwing away the first N rows?  I thought of making an auxiliary  
table to map rowid in the table with row number of the query for large  
chunks of the table, but that can get to be a big memory footprint if  
some_condition changes often.

Does anyone have any suggestions?
Thanks,
Jeff