William, You make some excellent points. In my case I am not so concerned about the consistency of the result set however this should certainly be a conscious decision when designing your paging strategy.
You may see some advantage of combining your approach with the database server side implementation of setLimit()/setOffset() for Oracle that I mention in my "reasonably complete review [that skips some important issues raised by William ;-)]" message that I posted a few minutes ago. Cheers, Scott -- Scott Eade Backstage Technologies Pty. Ltd. Web: http://www.backstagetech.com.au > From: William Webber <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > Date: Tue, 16 Jul 2002 12:35:38 +1000 > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED], [EMAIL PROTECTED] > Subject: Re: Fwd: Pagination strategies > > Re: paginating of query results. > > Another issue is that the data can change while the user is > paginating through it. Earlier records being deleted can > mean that later records are skipped in paginating forward; > records being added eariler mean that later records are seen > on two different pages. So the naive setLimit()/setOffset() > approach has issues with this. > > As you say, the problem with loading all the data and > storing it in temporary storage is the memory consumption, > which could be prohibitive. Also, a large amount of time is > spent converting the SQL data to Torque objects, most of > which will never be used if the user does not go past the > first couple of pages (which is common enough user > behaviour). > > What we've done (following a suggestion dug up by Rodney on, > I think, an Oracle list) is write all the results matching > the query to a "results cache" table in the database, keyed > by a search id that the client holds as a reference. We > only write those fields that are actually going to be > displayed in the results. This is done with a single > "INSERT INTO Cache_Table SELECT ... FROM Original_Table" > statement, so nothing comes across the wire into javaland at > this stage. This result cache is then paginated through > using setLimit()/setOffset(). > > You've then got the issue of how to expire the cache. We > handle this by having a cache manager that is an > HttpBindingEventListener attach itself to the user's > session, which deletes the cached search when the user logs > out. We then clean up any cached results that are missed as > part of our regular maintenance scripts. > > Performance-wise, using HEAP tables under MySQL, time taken > is similar to loading all records into Torque objects > (except, of course, the cost is moved to the MySQL process). > The advantage is reduced memory usage. (The other > consideration for us was the requirement that the user be > able to re-sort the order of results at any stage, and we > preferred to have the database do this for us than have to > re-sort in Java.) > > The other approach we considered was to retrieve the list of > IDs of records meeting the search criteria, then as we > paginate through results only fetch the IDs on the current > "page". This reduces the initial start-up hit of either > loading all records into Java, or into a cache table in the > database, and it also deals with the issue of new records > being added during pagination (they won't turn up in the > result list). But it didn't solve the issue of records > being deleted during pagination. (For other reasons, we > no longer physically delete records, only mark them as > "deleted", so this would no longer be an issue for us.) > > In general, I'd suggest: > > 1.) if you're not too worried about results consistency, use > the naive setLimit()/setOffset() method. This is the > easiest, the most efficient (I think), and the least likely > to bring down your server through out-of-memory problems. > > 2.) if results consistency matters, but you're not too > worried about running out of memory, load all the results > into Java and store them in the user's temporary storage. > > 3.) if results consistency matters AND you're worried about > running out of memory, but processing time is not a huge > concern, write the results to a cache table. > > 4.) if you're worried about results consistency, memory > usage, and processing time, and you're willing to put a lot > of effort in (there are quite a few gotchas), then breath > deep, and think about retrieving a list of IDs and working > with them. > > Hope this is of some use. I would be interested to hear of > other approaches, too. > > William -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
