Really cool discussion so far. I would also prefer streaming over paging, as that approach gives both ends more of the control they need.
The server doesn't have to keep state over a long time (and implement timeouts and clearing of that state; keeping that state for lots of clients also adds up). The client can decide how much of the result it is interested in, whether that is just 1 entry or 100k, and then simply drop the connection. Streaming calls can also have a request timeout, so connections that stay open too long with no activity are closed automatically.

The server doesn't use up lots of memory for streaming either. One could even leverage the laziness of traversers (and indexes) to avoid executing/fetching results that are never going to be sent over the wire. This should accommodate every kind of client, from the mobile phone that only lists a few entries to the big machine that can eat a firehose of result data in milliseconds. For this kind of "look-ahead" support we could (and should) add an optional offset, so that a client can request data (whose order _it_ is sure hasn't changed) and have the server skip the first n entries (so they don't have to be serialized/put on the wire).

I also think that this streaming API could already address many of the pain points of the current REST API.

Perhaps we even want to provide a streaming interface in both directions, so that the client can, for instance, stream the creation of nodes and relationships and their indexing without restarting a connection for each operation. Whatever comes in on this stream could also be processed in one TX (or, with TX tokens embedded in the stream, the client could even control that).

The only question this poses for me is whether we want to put it on top of the existing REST API or rather create a more concise API/format for it (with the later option of even degrading the format to binary for high-bandwidth interaction). I'd prefer the latter.
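To make the laziness/offset point a bit more concrete, this is roughly what I have in mind, sketched as a JAX-RS resource using StreamingOutput (the kind of streaming binding Jim mentions below). All names, parameters and the wiring of the lazy result are made up for illustration; this is not the actual server code:

import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Iterator;

import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.StreamingOutput;

import org.neo4j.graphdb.Node;

@Path("/stream/traverse")
public class StreamingTraversalResource {

    // However the lazy result is produced (traverser, index hits, ...) --
    // the point is that it is only consumed while writing to the response.
    // How the resource gets hold of it is left out here.
    private final Iterable<Node> lazyResults;

    public StreamingTraversalResource(Iterable<Node> lazyResults) {
        this.lazyResults = lazyResults;
    }

    @GET
    @Produces("application/json")
    public Response stream(@QueryParam("offset") @DefaultValue("0") final int offset,
                           @QueryParam("limit") @DefaultValue("-1") final int limit) {
        final Iterator<Node> nodes = lazyResults.iterator();

        // Skip the first n entries without serializing them: the traverser
        // still walks them, but nothing is put on the wire.
        for (int i = 0; i < offset && nodes.hasNext(); i++) {
            nodes.next();
        }

        StreamingOutput body = new StreamingOutput() {
            public void write(OutputStream output) throws IOException {
                Writer writer = new OutputStreamWriter(output, "UTF-8");
                writer.write("[");
                int written = 0;
                while (nodes.hasNext() && (limit < 0 || written < limit)) {
                    Node node = nodes.next();
                    if (written++ > 0) {
                        writer.write(",");
                    }
                    // A real implementation would render a proper representation.
                    writer.write("{\"id\":" + node.getId() + "}");
                    writer.flush(); // chunked: entries leave as they are produced
                }
                writer.write("]");
                writer.flush();
            }
        };
        return Response.ok(body).build();
    }
}

A client that only wants a handful of entries just reads the first few objects and closes the connection; the iterator behind it is never advanced further, so neither the traverser nor the serializer does any work for results nobody asked for.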
Cheers
Michael

On 21.04.2011, at 21:09, Rick Bullotta wrote:

> Jim, we should schedule a group chat on this topic.
>
> ----- Reply message -----
> From: "Jim Webber" <j...@neotechnology.com>
> Date: Thu, Apr 21, 2011 11:01 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" <user@lists.neo4j.org>
>
> This is indeed a good dialogue. Pagination and streaming were things I'd
> previously had in my mind as orthogonal issues, but I like the direction
> this is going. Let's break it down to fundamentals:
>
> As a remote client, I want to be just as rich and performant as a local
> client. Unfortunately, Deutsch, Amdahl and Einstein are against me on that,
> and I don't think I am tough enough to defeat those guys.
>
> So what are my choices? I know I have to be more "granular" to try to
> alleviate some of the network penalty, so doing operations in bulk sounds
> great.
>
> Now what I need to decide is whether I control the rate at which those bulk
> operations occur or whether the server does. If I want to control those
> operations, then paging seems sensible. Otherwise a streamed (chunked)
> encoding scheme would make sense, if I'm happy for the server to throw
> results back at me at its own pace. Or indeed you can mix both, so that
> pages are streamed.
>
> In either case, if I get bored of those results, I'll stop paging or I'll
> terminate the connection.
>
> So what does this mean for implementation on the server? I guess this is
> important since it affects the likelihood of the Neo Tech team implementing
> it.
>
> If the server supports pagination, it means we need a paging controller in
> memory per paginated result set being created. If we assume that we'll only
> go forward in pages, that's effectively just a wrapper around the traversal
> that's been uploaded. The overhead should be modest, and apart from the
> paging controller and the traverser, it doesn't need much state. We would
> need to add some logic to the representation code to support "next" links,
> but that seems a modest task.
>
> If the server streams, we will need to decouple the representation
> generation from the existing representation logic, since that builds an
> in-memory representation which is then flushed. Instead we'll need a
> streaming representation implementation, which seems to be a reasonable
> amount of engineering. We'll also need a new streaming binding to the REST
> server in JAX-RS land.
>
> I'm still a bit concerned about how "rude" it is for a client to just drop a
> streaming connection. I've asked Mark Nottingham for his authoritative
> opinion on that. But still, this does seem popular and feasible.
>
> Jim
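On the paging route Jim describes above: the per-result-set state really can stay tiny. A forward-only paging controller is essentially just a wrapper around the traverser's iterator; a rough sketch (made-up names, nothing to do with the actual representation code) could look like this:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Forward-only paging controller: the only per-result-set state the
// server has to keep is the traverser's iterator and the page size.
public class PagedTraverser<T> {

    private final Iterator<T> results; // lazy iterator over the uploaded traversal
    private final int pageSize;

    public PagedTraverser(Iterator<T> results, int pageSize) {
        this.results = results;
        this.pageSize = pageSize;
    }

    // Materializes at most one page; nothing beyond it is pulled
    // from the traverser.
    public List<T> nextPage() {
        List<T> page = new ArrayList<T>(pageSize);
        while (page.size() < pageSize && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }

    // Tells the representation code whether to render a "next" link.
    public boolean hasMore() {
        return results.hasNext();
    }
}

The representation layer would only need to ask hasMore() to decide whether to emit the "next" link, and the whole controller can be dropped from whatever server-side registry holds it once its timeout expires.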