Re: Pagination strategies - a reasonably complete review (long)

Scott Eade Mon, 15 Jul 2002 20:13:55 -0700

Yes, some kind of combination may be the best way to go - read on...

A little more research has revealed:


1. That turbine 2.1 torque provides support for database server side:

    limit:
        MySQL
        PostgreSQL
        Sybase
    offset:
        MySQL
        PostgreSQL

However this is expanded in torque 3.0 which supports:

    limit:
        MySQL
        MSSQL
        Oracle
        PostgreSQL
        Sybase
    offset:
        MySQL
        PostgreSQL

Strangely the relevant code in BasePeer doesn't seem to use the limits for
MSSQL, Sybase - perhaps this is not yet fully implemented.

Offset can be supported in Oracle using the same mechanism as for limit -
i.e. The pseudo column "rownum" (use "where rownum < offset + limit" or
"where rownum between 1 and 11").  It looks to me like it would be
relatively straightforward to implement the offset functionality for oracle
(if anyone is interested in doing this the relevant classes are DBOracle and
BasePeer).

2. BasePeer provides services for limiting the number of rows that are
returned.  If the server does not implement the offset and limit, the values
are passed through to Village which I assume discards the unwanted data.
This is excellent as it means that setLimit() and setOffset() should work on
all database platforms - the trick is that it will be more efficient when
the support is provided by the database server.  Note that a non-index sort
requirement will still mess things up with this approach.

3. John McNally's class that I mentioned below is:

    org.apache.turbine.util.db.LargeSelect

Which has been moved to the attic in jakarta-turbine-2 recently in
preparation for turbine 2.2.  The good news is that this class is being
carried forward in jakarta-turbine-torque as:

    org.apache.torque.util.LargeSelect

According to the comments this is an as yet untested class.  It does however
appear to be an excellent implementation of the combination of the other two
solutions including a background thread that retrieves the data up to a
specified limit of rows.  In my case, with 7500 rows I could retrieve say
the middle 500 rows and page through them 50 at a time.  If I page off the
end of my 500 rows a new query will be executed to retrieve the next 500
rows, if I am go backwards past the beginning of the 500 rows the class
jumps back 2 times my page size (100) in anticipation that I may go back
another page and re-executes the query.  The query itself uses BasePeer, so
the comments mentioned above under items 1 and 2 apply.  If you are using a
non-index based sort criteria then all rows will be retrieved by the DB in
order to perform the sort each time the query is executed, however this is
likely to be as good a trade off as any.

While the comments indicate that some debugging may be required, this looks
like an excellent starting point - anyone interested in paging result sets
should be working on this.


With regards to your comment concerning the ability to reuse the data
available to the list query for the detail page, I would imagine that
LargeSelect could be enhanced to support this.  It would be worth figuring
out a way to do this that would also signal to the LargeSelect that it has
completed its processing and can thus release its resources - otherwise you
have a largish chunk of memory being consumed by the LargeSelect that the
user is no longer interested in.  Conversely, it is good SQL practice to
select only the columns you are interested in for your list page which might
impact your ability to reuse this data on a detail page.


Not sure when I am going to get to dive in and use this - hopefully within
the next month or so.  Mean time I hope the list benefits from my
investigations.

Cheers,

Scott
-- 
Scott Eade
Backstage Technologies Pty. Ltd.
Web: http://www.backstagetech.com.au


> From: "Colin Chalmers" <[EMAIL PROTECTED]>
> Reply-To: "Turbine Users List" <[EMAIL PROTECTED]>
> Date: Mon, 15 Jul 2002 17:11:33 +0200
> To: "Turbine Users List" <[EMAIL PROTECTED]>
> Subject: Re: Pagination strategies
> 
> This is also a problem I'm facing and I too am looking for an efficient
> maaner to do pagination without hitting the DB for every page, subsequently
> it would be handy if after clicking on one of the entries on the index page
> one sees all the info pertaining to that entry. Again without having to
> rehit the DB.
> 
> My concern is the validity of data and how long it should be kept in memory,
> whether that being a GlobalCache or user.temp. I haven't looked at the
> Scarab code, but when does the user.Temp info get cleared? If at all.
> 
> Would it not be wise to use a combination of both solutions? That you limit
> the result to say 500 rows (using Criteria setOffset() and setLimit()
> methods) which you could put into user.temp for pagination?
> 
> /Colin
> 
>> I have looked at how Scarab does this.  Scarab basically retrieves all
> rows,
>> stores a copy in user.temp and returns them to the user a page at a time
>> (where the number of rows per page is selected by the user).  While this
> is
>> going to be efficient from the perspective of the database (particularly
>> when sorting is involved) in that the main query is only executed once, it
>> has the potential to consume an awfully large amount of memory (I'm
> playing
>> around with a result set that is going to retrieve 7,500 records, enough
> to
>> cause the JVM running with standard memory settings to throw an
>> OutOfMemoryError).  Even if I expanded the amount of memory available to
>> cater for this, what happens when the user wanders off somewhere else in
> the
>> application leaving that large result set in user.temp - what does Scarab
> do
>> in this case?
>> 
>> An alternative approach will be to use the Criteria setOffset() and
>> setLimit() methods to instruct the database to only return a page worth of
>> rows.  The problem with this approach is that I will need to limit the
>> ordering of records to one where a database index exists, otherwise the
>> database will need to retrieve all rows in order to perform its sort
> before
>> it then throws away all but a page worth of rows.  On top of this I am not
>> sure if the setOffset() and setLimit() methods are fully supported in
> torque
>> (esp. In turbine 2.1) or for that matter the underlying database systems
>> (equivalent SQL appears to exist for MySQL and PostgreSQL, but I haven't
>> checked Oracle or any of the other alternatives).
>> 
>> Davis Kulis posted a good tip for the above solution which involves using
>> additional criteria to filter out records that have already been
> displayed.
>> This would be effective if the result set is sorted by an index value, but
>> perhaps not so useful if all rows need to be retrieved in order for a sort
>> to take place.
>> 
>> I am pretty sure it was John McNally that posted details of a prototype
>> class for dealing with pagination (this would have been months and months
>> ago).  I was sure I kept a reference to it at the time, but I can no
> longer
>> find it.  I seem to recall that this worked in the same way as described
> for
>> Scarab above, but If I can be reminded of the class I will gladly review
> it
>> again.
>> 
>> Anyway, this is turning into a very long message.  I guess I am interested
>> in some feedback as to how others have handled pagination of very large
>> result sets.  As I see it the only answer in this situation may be to go
>> with setOffset()/setLimit() with limited sort options.  Got a better
>> solution?
>> 
>> Cheers,
>> 
>> Scott
>> --
>> Scott Eade
>> Backstage Technologies Pty. Ltd.
>> Web: http://www.backstagetech.com.au
>> 
>> 
>> 
>> --
>> To unsubscribe, e-mail:
> <mailto:[EMAIL PROTECTED]>
>> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
>> 
>> 
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> 


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Re: Pagination strategies - a reasonably complete review (long)

Reply via email to