Re: Pagination

2009-06-10 Thread Rob Heittman
Status code 303 (Status.REDIRECT_SEE_OTHER in Restlet) exists for the
POST/Redirect/GET case.  I'm not sure there's any single "right" way when it
comes to returning an entity directly from a POST -- certainly that's useful
too in straightforward cases.  P/R/G is really helpful when the result is
more complex and may really consist of multiple GETable resources, as in
pagination.

On Wed, Jun 10, 2009 at 11:54 AM, Dustin N. Jenkins <
dustin.jenk...@nrc-cnrc.gc.ca> wrote:

> I assume the POST/Redirect/GET pattern is the Client POSTing to the
> Resource, and instead of filling the Response's Entity with the
> Representation of the change, one simply redirects the Client to the GET
> representation.  Is this the desired behaviour?  I was under the
> impression that populating the Response's Entity in the POST was
> improper practice, but we often lazily do it.
>

--
http://restlet.tigris.org/ds/viewMessage.do?dsForumId=4447&dsMessageId=2360979

Re: Pagination

2009-06-10 Thread Dustin N. Jenkins
I assume the POST/Redirect/GET pattern is the Client POSTing to the 
Resource, and instead of filling the Response's Entity with the 
Representation of the change, one simply redirects the Client to the GET 
representation.  Is this the desired behaviour?  I was under the 
impression that populating the Response's Entity in the POST was 
improper practice, but we often lazily do it.

One more thing I thought of that might be handy is the ability to go 
directly to the First or Last page.  Going to the First page would be 
easy, I think, as one would simply omit the AFTER clause, but going to 
the Last page seems like a different story.  The type of Pagination 
we've been talking about is simply a "Go to Next Page" and "Go to 
Previous Page" design, which should be sufficient I guess.  Although I 
think users typically would like to see how many pages there are in the 
result set.  Having said that, though, I suppose if the Searching 
service were asked to simply return everything, and the Resource 
consumed only what it wanted based on, say, the User's preferred page 
size, then one could simply do the math on the Collection size.  Anyway, 
the scope of Pagination seems to be growing...
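
For what it's worth, "doing the math on the Collection size" could look 
roughly like the sketch below.  It is plain Java with no Restlet 
dependency; the class and method names are hypothetical, and it assumes 
1-based item positions as in the 9,995-hit example quoted earlier in 
the thread.

```java
// Hypothetical helper; the names are illustrative and not part of Restlet.
final class PageMath {
    private PageMath() {}

    /** Number of pages needed to show totalHits items at pageSize per page. */
    static int pageCount(int totalHits, int pageSize) {
        if (pageSize <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and totalHits >= 0");
        }
        // Ceiling division: a partially filled last page still counts as a page.
        return (totalHits + pageSize - 1) / pageSize;
    }

    /** 1-based position of the first item on the given 1-based page. */
    static int firstItemOnPage(int page, int pageSize) {
        return (page - 1) * pageSize + 1;
    }

    /** 1-based position of the first item on the last page, or 0 if there are no hits. */
    static int firstItemOnLastPage(int totalHits, int pageSize) {
        int pages = pageCount(totalHits, pageSize);
        return pages == 0 ? 0 : firstItemOnPage(pages, pageSize);
    }
}
```

With 9,995 hits and 10 per page this gives 1000 pages, the last one 
starting at item 9991.  Note that this only works with a start/offset 
style of paging; a pure "after" design has no cheap way to jump to the 
last page.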

Thanks again for your help Rob.  I really like this approach to RESTful 
Pagination.

Dustin



Re: Pagination

2009-06-09 Thread Rob Heittman
Yes, keeping the state bookmarkable for a complex search is also an
interesting challenge.  I know one web site (also a science app) that has
several hundred variables that can be incorporated in a query, more than a
bookmark would easily store.  They keep a permanent cache of search queries
in the database and return a "minified" URL, like bit.ly, using the
Post/Redirect/Get pattern.  They mark position in pagination using query
params:

http://{science-app}/search/afq1z?start=1564&extent=20

but I like the "after" approach better:

http://{science-app}/search/afq1z?after=Sula+Nebouxi,Ecuador&extent=20

There are also nice properties of the minified query identifier URL, in that
it lends itself to subsequent RESTful interrogation in other ways:

http://{science-app}/search/afq1z/sql -- retrieve SQL query definition
http://{science-app}/search/afq1z/export/csv -- dump entire data set to CSV

or fun using Variants ... etc ...
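
A rough sketch of the saved-query idea, in plain Java with no Restlet
dependency.  Everything here is an assumption for illustration: the class
and method names are made up, an in-memory map stands in for the database
table of saved queries, and the short key is just a base-36 counter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the "permanent cache of search queries".
final class SavedSearches {
    private final Map<String, String> queriesByKey = new LinkedHashMap<String, String>();
    private int nextId = 1;

    /** POST step: store the full query, return the Location for a 303 response. */
    String saveAndGetLocation(String fullQuery) {
        String key = Integer.toString(nextId++, 36); // short base-36 key
        queriesByKey.put(key, fullQuery);
        return "/search/" + key;
    }

    /** GET step: resolve a short key back to the stored query, or null if unknown. */
    String lookup(String key) {
        return queriesByKey.get(key);
    }

    /** Builds a bookmarkable page URL in the "after" style. */
    static String pageUrl(String location, String after, int extent) {
        try {
            return location + "?after=" + URLEncoder.encode(after, "UTF-8")
                    + "&extent=" + extent;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The POST handler would answer with 303 and the returned Location; every
later page is then a plain GET on that URL.  A sequential key is guessable,
so a real service would probably want something less predictable.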


Re: Pagination

2009-06-09 Thread Dustin N. Jenkins
Hi Rob,

Thank you very much for the detailed post.  It's very useful.

My Persistence Layer uses Hibernate, which in turn uses ehcache as the 
second-level cache, but I've always had it turned off, so now would be a 
good time to experiment with it I suppose.

A stable search result is not required in my case, and I would happily 
go back to the Persistence Layer each time as I deal with Scientific 
results that are updated all the time.  A user wouldn't necessarily get 
lost while moving from page to page.

In reference to Josh's solution, I really like the idea of going by the 
sorted results and asking for the data after the last known item.  I 
deal with a multi-field form; upwards of a dozen fields to search on, so 
passing data back and forth may not be viable all the time, especially 
with a GET given the known character limitation.  However, do users 
commonly bookmark a search result with a page number?  I could 
definitely see it.  Perhaps the bookmark would encapsulate the AFTER 
clause in the URL.
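
To make the AFTER idea concrete, here is a minimal sketch in plain Java.  
It assumes the results are already sorted ascending on a single unique 
String key; the class and method names are hypothetical, and an in-memory 
list stands in for whatever the Persistence Layer would return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "give me the items after the last one I saw".
final class AfterPager {
    private AfterPager() {}

    /**
     * Returns up to extent items that sort strictly after the given key.
     * A null key means "start from the first page".
     * sortedItems must be in ascending natural order.
     */
    static List<String> pageAfter(List<String> sortedItems, String after, int extent) {
        List<String> page = new ArrayList<String>();
        for (String item : sortedItems) {
            if (page.size() >= extent) {
                break; // page is full
            }
            if (after != null && item.compareTo(after) <= 0) {
                continue; // at or before the last known item
            }
            page.add(item);
        }
        return page;
    }
}
```

Because the position is a value rather than a row number, the next page 
still comes out right if the last known item has been deleted in the 
meantime.  Against a database the same thing would presumably be a 
"greater than the last key, ordered, limited" query rather than a loop.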

Thanks again, Rob.  It is an interesting problem.
Dustin



Re: Pagination

2009-06-05 Thread Rob Heittman
Ah, pagination.  One of the great programming tradeoffs  :-)  Have a look at
this comment thread from Ohloh a while back.

http://www.ohloh.net/forums/3491/topics/1056

Josh Triplett proposes a good solution that is lightweight for paging
non-critical data without server state.

You can guarantee a stable search result for the duration of the browse by
caching the entire result set server side and providing a means of moving
through it ... that might scale to hundreds of results, but not so much to
millions.  Still, that's the usual Session idiom.

Here's what I usually do ... send an HTML or XML representation with
sufficient information about how to repeat the search and page thru it, but
keep no server side state per se.  I just make sure the data layer is smart
enough about caching result sets to avoid unnecessary work.

Example: say I am exposing a fulltext search over a collection of 10,000
documents and someone searches on "the".

Client hits Resource (stateless, short lived) by POST to /search with
something like
   
the
   
Resource submits the search to a query service.
Query service hits fulltext index and gets 9,995 hits.  Caches this result,
"the"="{set of 9,995 hits}"
Resource consumes first 10 results from query service.  Sends something
like:
   
 9995
 1
 10
 The first of many
 ...
 The tenth of many
   
Let's say client wants the next page of results.  It immediately sends back:
   
 the
 11
   
Resource asks query service for "the" again.
Query service fetches "the"="{set of 9,995 hits}" out of cache.
Resource consumes results 11-20 from query service ... etc.
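
The flow above can be sketched in plain Java roughly as follows.  This is
only an illustration under assumptions: the class name is made up, a
substring scan over a list stands in for the fulltext index, and a plain
HashMap stands in for a real cache with expiry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query service that caches whole result sets by search term.
final class CachingQueryService {
    private final List<String> documents;
    private final Map<String, List<String>> cache = new HashMap<String, List<String>>();
    int indexScans = 0; // counts how often the "index" was really hit

    CachingQueryService(List<String> documents) {
        this.documents = documents;
    }

    /** Returns all hits for the term, scanning the index only on a cache miss. */
    List<String> search(String term) {
        List<String> hits = cache.get(term);
        if (hits == null) {
            indexScans++;
            hits = new ArrayList<String>();
            for (String doc : documents) {
                if (doc.contains(term)) {
                    hits.add(doc);
                }
            }
            cache.put(term, hits);
        }
        return hits;
    }

    /** What the stateless Resource does: consume count hits from a 1-based start. */
    List<String> page(String term, int start, int count) {
        List<String> hits = search(term);
        int from = Math.min(Math.max(start - 1, 0), hits.size());
        int to = Math.min(from + Math.max(count, 0), hits.size());
        return new ArrayList<String>(hits.subList(from, to));
    }
}
```

The Resource itself keeps nothing between requests; the client sends the
term and the start position each time, and only the first request pays for
the index scan.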

This approach is not guaranteed stable.  If the result set expires from
cache and also changes in between paginated queries you might end up missing
some results, ending up at 9,997 results instead of 9,995, etc.  When you do
soft stuff like Google searches, this happens all the time (wait, my blog
was on page 1 when I started ...)

But if you were searching for credit card transactions, the instability
would be unacceptable.  Here, your search result set must be stored uniquely
and guaranteed to iterate forward and backward to the same conclusion until
the user goes away from it.

So instead of just repeating the search for "the" I would need to return a
unique key for the result set, that the server saved for me:

Client hits Resource (stateless, short lived) by POST to /search with
something like
   
the
   
Resource submits the search to a query service.
Query service hits fulltext index and gets 9,995 hits.  Creates a new stable
ID, "abcdefg123".  Caches this result, "abcdefg123"="{set of 9,995 hits}"
Resource consumes first 10 results from query service.  Sends something
like:
   
 abcdefg123
 9995
 1
 10
 The first of many
 ...
 The tenth of many
   
Let's say client wants the next page of results.  It immediately sends back:
   
 the 
 abcdefg123
 11
   
Resource asks query service for "abcdefg123" again.
Query service fetches "abcdefg123"="{set of 9,995 hits}" out of cache.
Resource consumes results 11-20 from query service ... etc.

Here, if a second user queries on "the" they would get their own guaranteed
stable result set, unless you can be really smart about knowing which result
sets are identical and can share an ID.

If the client waits too long before changing pages, and abcdefg123 goes out
of cache, the server can either return some sort of error, or repeat the
search and send back the response with some kind of flag to indicate that
the result set has changed.  (I like this last behavior, along with the
above "really smart about knowing which result sets are identical")
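
A minimal sketch of the stable-ID variant, including the "repeat the search
and flag it" behavior.  Again this is plain Java under assumptions: the
names are hypothetical, a HashMap with an explicit evict() stands in for a
cache that expires entries, and the caller passes in the current hits
instead of the service running a real query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder of frozen result sets, keyed by a server-issued ID.
final class StableResultSets {

    /** One page of a result set, plus a flag saying whether it had to be rebuilt. */
    static final class Page {
        final String id;
        final int total;
        final List<String> items;
        final boolean rebuilt;

        Page(String id, int total, List<String> items, boolean rebuilt) {
            this.id = id;
            this.total = total;
            this.items = items;
            this.rebuilt = rebuilt;
        }
    }

    private final Map<String, List<String>> snapshots = new HashMap<>();
    private int counter = 0;

    /** First request: freeze the current hits under a new ID, return the first page. */
    Page open(List<String> currentHits, int count) {
        String id = "rs" + (++counter);
        snapshots.put(id, new ArrayList<>(currentHits));
        return slice(id, 1, count, false);
    }

    /** Later requests: page through the snapshot; if it is gone, rebuild and flag it. */
    Page page(String id, int start, int count, List<String> currentHits) {
        boolean rebuilt = !snapshots.containsKey(id);
        if (rebuilt) {
            snapshots.put(id, new ArrayList<>(currentHits));
        }
        return slice(id, start, count, rebuilt);
    }

    /** Simulates the result set falling out of cache. */
    void evict(String id) {
        snapshots.remove(id);
    }

    private Page slice(String id, int start, int count, boolean rebuilt) {
        List<String> hits = snapshots.get(id);
        int from = Math.min(Math.max(start - 1, 0), hits.size());
        int to = Math.min(from + Math.max(count, 0), hits.size());
        return new Page(id, hits.size(), new ArrayList<>(hits.subList(from, to)), rebuilt);
    }
}
```

The rebuilt flag only says the snapshot was re-created, not that its
contents actually differ; a client should treat it as "may have changed".
This sketch also rebuilds for an ID it never issued, which a real service
would more likely reject.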

I like ehcache a lot for all this.  I can trivially implement
memory-sensitive caches with disk backing to hold server-side resources
representing result sets and other goodies.  This can be done very close to
the Representation level to avoid duplicative work, and the general usage
doesn't vary much between different kinds of data layers -- relational
queries, Web service queries, Lucene queries, XML document searches, etc.

Finally, if you're dealing with "dumb" HTML clients (that work page by page
and can't be bothered to keep state like what they queried for in the first
place), instead of the XML example above, you can incorporate a form that
repeats the search (with the appropriate pagination) directly into your HTML
result.  This isn't much of a problem with AJAX or desktop clients that can
maintain their own state better.

Any of that useful?

- R


Pagination

2009-06-05 Thread Dustin N. Jenkins
I'm using the 2.0M3 version of Restlet with JDK 1.6 in a Fedora Core 8 
environment.  My View Layer uses the included FreeMarker.

I'd like to be able to paginate my search results in my Restlet Web 
Application, as a user's search can easily return hundreds of results.  
I've searched around and came across the Value List Handler Design 
Pattern, but I'm not sure it meets my needs.

The fundamental problem I see is caching the entire result set 
somewhere.  The alternative is to hit the database each time, which in 
my case is plausible since Hibernate has built-in pagination, but then 
I'm relying on my Persistence Layer to do pagination.

Has anyone successfully done RESTful pagination in Restlet, or Java in 
general?  I know Ruby on Rails has a way of caching it, but I don't know 
if it's the same as how, say, a Session would do it in J2EE.

Many thanks!
Dustin

-- 


Dustin N. Jenkins | Tel/Tél: 250.363.3101 | dustin.jenk...@nrc-cnrc.gc.ca

facsimile/télécopieur: (250) 363-0045

National Research Council Canada | 5071 West Saanich Rd, Victoria BC. 
V9E 2E7

Conseil national de recherches Canada | 5071, ch. West Saanich, Victoria 
(C.-B) V9E 2E7

Government of Canada | Gouvernement du Canada

--
http://restlet.tigris.org/ds/viewMessage.do?dsForumId=4447&dsMessageId=2359784