Re: paginating of query results.

Another issue is that the data can change while the user is
paginating through it.  Earlier records being deleted can
mean that later records are skipped in paginating forward;
records being added eariler mean that later records are seen
on two different pages.  So the naive setLimit()/setOffset()
approach has issues with this.

As you say, the problem with loading all the data and
storing it in temporary storage is the memory consumption,
which could be prohibitive.  Also, a large amount of time is
spent converting the SQL data to Torque objects, most of
which will never be used if the user does not go past the
first couple of pages (which is common enough user
behaviour).

What we've done (following a suggestion dug up by Rodney on,
I think, an Oracle list) is write all the results matching
the query to a "results cache" table in the database, keyed
by a search id that the client holds as a reference.  We
only write those fields that are actually going to be
displayed in the results.  This is done with a single
"INSERT INTO Cache_Table SELECT ... FROM Original_Table"
statement, so nothing comes across the wire into javaland at
this stage.  This result cache is then paginated through
using setLimit()/setOffset().

You've then got the issue of how to expire the cache.  We
handle this by having a cache manager that is an
HttpBindingEventListener attach itself to the user's
session, which deletes the cached search when the user logs
out.  We then clean up any cached results that are missed as
part of our regular maintenance scripts.

Performance-wise, using HEAP tables under MySQL, time taken
is similar to loading all records into Torque objects
(except, of course, the cost is moved to the MySQL process).
The advantage is reduced memory usage.  (The other
consideration for us was the requirement that the user be
able to re-sort the order of results at any stage, and we
preferred to have the database do this for us than have to
re-sort in Java.)

The other approach we considered was to retrieve the list of
IDs of records meeting the search criteria, then as we
paginate through results only fetch the IDs on the current
"page".  This reduces the initial start-up hit of either
loading all records into Java, or into a cache table in the
database, and it also deals with the issue of new records
being added during pagination (they won't turn up in the
result list).  But it didn't solve the issue of records
being deleted during pagination.  (For other reasons, we 
no longer physically delete records, only mark them as
"deleted", so this would no longer be an issue for us.)

In general, I'd suggest:

1.) if you're not too worried about results consistency, use
the naive setLimit()/setOffset() method.  This is the
easiest, the most efficient (I think), and the least likely
to bring down your server through out-of-memory problems.

2.) if results consistency matters, but you're not too
worried about running out of memory, load all the results
into Java and store them in the user's temporary storage.

3.) if results consistency matters AND you're worried about
running out of memory, but processing time is not a huge
concern, write the results to a cache table.

4.) if you're worried about results consistency, memory
usage, and processing time, and you're willing to put a lot
of effort in (there are quite a few gotchas), then breath
deep, and think about retrieving a list of IDs and working
with them.

Hope this is of some use.  I would be interested to hear of
other approaches, too.

William

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Reply via email to