Half-baked thoughts from a Neo4j newbie hacker type on this topic:

1) I think it is very important, even with modern infrastructures, for the client to be able to optionally throttle the result set a query generates, as it sees fit -- and not just because of client memory and bandwidth limitations.
With regular old SQL databases, if you send a careless large query you can chew up significant system resources for a significant amount of time while it is being processed. At a minimum, a rowcount/pagination option lets you build something into your client that minimizes accidental denial-of-service queries. I'm not sure whether it is possible to construct a query against a large Neo4j database that would temporarily cripple it, but it wouldn't surprise me if you could.

2) Sometimes with regular old SQL databases I'll run a sanity-check count() with the query, just to get the size of the expected result set before I try to pull it back into my data structure. Many times count() is all I needed anyhow. Does Neo4j have a result-set size function? A client that really could only handle small result sets could run a count(), and then filter the search further, if necessary, until the count was small enough. (I guess it would depend on the problem domain...) In other words, when it really matters, it may be possible to implement pagination logic on the client side, if you don't mind running multiple queries for each set of data you get back.

3) If the result set were broken into pages, you could organize the pages on the server with a set of [temporary] graph nodes holding relationships to the results in the database -- one node for each page, plus a parent node for the result set. If the order of the pages is important, you could add directed relationships between the page nodes. If the order within a page is important, you could either apply a sequence number to each page-result relationship, or add temporary directed relationships between the results themselves. Subsequent page retrievals would then be new traversals over this search-result graph. In a sense you would be building a temporary graph index, I suppose.
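The count()-then-narrow idea from point 2 can be sketched in plain Python against an in-memory stand-in for the database. (Nothing here is Neo4j API -- `run_count`, `run_query`, and the predicate-as-filter are all illustrative names for whatever count and fetch calls your client actually has.)

```python
# Demo against an in-memory "database" (a list) standing in for Neo4j.
data = list(range(5000))

def run_count(pred):
    """Hypothetical cheap count() call: size only, no rows shipped."""
    return sum(1 for x in data if pred(x))

def run_query(pred):
    """Hypothetical full fetch: materializes the rows on the client."""
    return [x for x in data if pred(x)]

def fetch_if_small(pred, max_rows=1000):
    """Run count() first; only pull the rows if the set fits the client."""
    n = run_count(pred)
    if n <= max_rows:
        return run_query(pred)
    return None  # too big: caller should narrow the filter and retry

# Broad filter: 2500 matches, so nothing is pulled back to the client.
assert fetch_if_small(lambda x: x % 2 == 0) is None
# Narrower filter: 50 matches, so the rows actually come back.
rows = fetch_if_small(lambda x: x % 100 == 0)
```

The cost is one extra round trip per query, which is the "multiple queries for each set of data" trade-off the point describes.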
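The temporary "graph index" from point 3 can also be sketched with plain dicts: a parent node for the result set, one node per page in order, and a sequence number on each page-to-result link. (In a real implementation these would be Neo4j nodes and relationships; the names and shapes here are purely illustrative.)

```python
from itertools import islice

def paginate(result_ids, page_size):
    """Build the temporary result-set graph: parent node -> ordered pages,
    with a sequence number on each page->result 'relationship'."""
    it = iter(result_ids)
    pages = []
    while True:
        chunk = list(islice(it, page_size))
        if not chunk:
            break
        pages.append({
            "page_no": len(pages),  # page order (stands in for NEXT links)
            "results": [{"seq": i, "id": rid} for i, rid in enumerate(chunk)],
        })
    return {"type": "RESULT_SET", "pages": pages}

def result_ids(result_set):
    """A later 'traversal': walk the page graph back into the ordered ids."""
    return [r["id"] for p in result_set["pages"] for r in p["results"]]

# Because each stored result set is just references to nodes, set
# operations across stored result sets stay cheap:
a = paginate([1, 2, 3, 4, 5], page_size=2)
b = paginate([4, 5, 6], page_size=2)
both = set(result_ids(a)) & set(result_ids(b))  # intersection of two sets
```

Retrieving page 2 later is just a lookup of `a["pages"][1]` rather than a re-run of the original query, which is the point of persisting the pages server-side.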
An advantage of organizing search result sets this way is that you could then union and intersect result sets (and do other set operations) without a huge memory overhead. (Which means you could probably store millions of search results at one time, and persist them through restarts.)

4) In some HA architectures you may have multiple database copies behind a load balancer. Would the search-result pages be stored equally on all of them? Would the client require a "sticky" flag, so it always goes back to the same server instance for more pages? And depending on how fast writes propagate across the cluster (compared to requests for the next page), would creating nodes as described in (3) even work?

5) As for sorting: in my experience, if I need a result set sorted from a regular SQL database, I will usually sort it myself. Most databases I've worked with routinely have performance problems; you can minimize finger-pointing, and the risk of compounding those problems, by just asking the database for what you need and doing the rest back in the client. On the other hand, sometimes it is quicker and easier to let the database do the work (usually when I can only handle the data in small chunks on the client). What I'm trying to say is that I think sorting is going to matter more to clients who want paginated results (i.e., resource-limited clients) than to clients who grab large chunks of data at a time (and will want to "own" any post-query processing anyhow).

-- 
Rick Otten
rot...@windfish.net
O=='=+

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user