Unfortunately that’s pretty much what I’m doing now, and the results are large enough that pulling them all back and sorting them causes fairly dramatic GC issues. If I could get them in sorted order I would no longer need to retain them; I could just process them and discard them, eliminating my GC issues. I think the way I’ll end up working around this in the short term is to pull pages of data from a batch scanner, sort each page, then combine the paged results. That should be manageable.
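A minimal sketch of that paging workaround, under some loudly stated assumptions: the ranges are already sorted and non-overlapping, so every result of one page sorts before every result of the next, and "combining" reduces to processing the sorted pages in sequence. The names `PagedSortedScan`, `scanInOrder`, `fetchUnordered`, and `fakeFetch` are hypothetical and not part of Accumulo; `fetchUnordered` stands in for whatever code hands one page of ranges to a BatchScanner and collects the unordered results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Sketch only: nothing here is Accumulo API. fetchUnordered is a hypothetical
// stand-in for "give this page of ranges to a BatchScanner and collect results".
class PagedSortedScan {

    // Assumes `ranges` is sorted and non-overlapping, so all results of page i
    // sort before all results of page i + 1.
    static <R, E extends Comparable<? super E>> void scanInOrder(
            List<R> ranges,
            int rangesPerPage,
            Function<List<R>, List<E>> fetchUnordered,
            Consumer<E> process) {
        if (rangesPerPage <= 0) {
            throw new IllegalArgumentException("rangesPerPage must be positive");
        }
        for (int start = 0; start < ranges.size(); start += rangesPerPage) {
            int end = Math.min(start + rangesPerPage, ranges.size());
            // Copy so the page can be sorted even if the fetch result is immutable.
            List<E> page = new ArrayList<>(fetchUnordered.apply(ranges.subList(start, end)));
            Collections.sort(page);
            for (E entry : page) {
                process.accept(entry);
            }
            // The page is unreachable after this iteration, so only one page
            // of results is retained at a time.
        }
    }

    // Fake fetch for the demo: "range" n yields n*10, n*10+1, n*10+2,
    // deliberately returned out of order to mimic a batch scan.
    static List<Integer> fakeFetch(List<Integer> page) {
        List<Integer> out = new ArrayList<>();
        for (int i = page.size() - 1; i >= 0; i--) {
            int base = page.get(i) * 10;
            out.add(base + 2);
            out.add(base);
            out.add(base + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        PagedSortedScan.<Integer, Integer>scanInOrder(
                Arrays.asList(0, 1, 2, 3, 4), 2, PagedSortedScan::fakeFetch, seen::add);
        System.out.println(seen);
    }
}
```

The page size trades memory for round trips: a larger page means fewer batch scans but more entries held while sorting. If the pages could overlap in key space, concatenation would no longer be enough and a merge of the sorted pages would be needed instead.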
Rob Povey

From: Keith Turner <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, October 28, 2015 at 8:04 AM
To: "[email protected]" <[email protected]>
Subject: Re: Is there a sensible way to do this? Sequential Batch Scanner

Will the results always fit into memory? If so, you could put the results from the batch scanner into an ArrayList and sort it.

On Tue, Oct 27, 2015 at 6:21 PM, Rob Povey <[email protected]> wrote:

What I want is something that behaves like a BatchScanner (i.e. takes a collection of Ranges in a single RPC), but preserves the scan ordering. I understand this would greatly impact performance, but in my case I can manually partition my request on the client and send one request per tablet. I can’t use Scanners, because in some cases I have tens of thousands of non-consecutive ranges.

If I use a single-threaded BatchScanner and only request data from a single tablet, am I guaranteed ordering? This appears to work correctly in my small tests (albeit slower than a single 1 thread Batch scanner call), but I don’t really want to have to rely on it if the semantics aren’t guaranteed. If not, is there another “efficient” way to do this?

Thanks,
Rob Povey
