Rob, I would use something like an IteratorChain [1] and feed it Scanner.iterator() objects. If you setReadaheadThreshold(0) on the scanner then calling Scanner.iterator() is a fairly lightweight operation, and you'll be able to plop a bunch of iterators into the IteratorChain so that they are dynamically activated when you're ready for them. If you want higher throughput you will have to do something tricky with readahead thresholds, like writing your own iterator chain and reading ahead on only a few ScannerIterators at a time. You might not need that to get good enough performance, though.
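Here is a rough sketch of the "own iterator chain" idea in plain Java, so it runs without Accumulo or commons-collections on the classpath. `WindowedChain` is a hypothetical class, not part of either library. It takes one supplier per range, in range order, and only creates an iterator once it is within `window` positions of the one currently being consumed. With `window = 0` it behaves like a lazily populated IteratorChain. With a small positive `window`, a few upcoming iterators already exist while the current one drains. The assumption here is that creating an iterator is what kicks off its fetch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical chain: concatenates iterators in order, creating each one only
// when it is within `window` positions of the iterator being consumed.
//   window = 0: the next iterator is created only after the current one is exhausted.
//   window = k: up to k iterators beyond the current one are created ahead of time.
class WindowedChain<T> implements Iterator<T> {
    private final Iterator<Supplier<Iterator<T>>> pending;        // sources not yet created
    private final Deque<Iterator<T>> active = new ArrayDeque<>(); // created, in source order
    private final int window;

    WindowedChain(List<Supplier<Iterator<T>>> sources, int window) {
        if (window < 0) throw new IllegalArgumentException("window must be >= 0");
        this.pending = sources.iterator();
        this.window = window;
        fill();
    }

    // Create iterators until we hold the current one plus `window` look-ahead ones.
    private void fill() {
        while (active.size() < window + 1 && pending.hasNext()) {
            active.addLast(pending.next().get());
        }
    }

    @Override
    public boolean hasNext() {
        while (!active.isEmpty()) {
            if (active.peekFirst().hasNext()) return true;
            active.removeFirst(); // current source exhausted: drop it
            fill();               // and top the look-ahead window back up
        }
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return active.peekFirst().next();
    }
}
```

In Accumulo terms, each supplier would presumably create a `Scanner`, call `setRange(...)` and `setReadaheadThreshold(...)`, and return `scanner.iterator()`. The chained output is only globally sorted if the ranges are supplied in sorted, non-overlapping order, because each scanner returns just its own range in key order.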
[1] https://commons.apache.org/proper/commons-collections/javadocs/api-2.1.1/org/apache/commons/collections/iterators/IteratorChain.html

Adam

On Wed, Oct 28, 2015 at 4:00 PM, Rob Povey <[email protected]> wrote:

> Unfortunately that’s pretty much what I’m doing now, and the results are
> large enough that pulling them back and sorting them causes fairly dramatic
> GC issues.
> If I could get them in sorted order I would no longer need to retain them;
> I could just process them and discard them, eliminating my GC issues.
> I think the way I’ll end up working around this in the short term is to
> pull pages of data from a batch scanner, sort those, then combine the paged
> results. That should be manageable.
>
> Rob Povey
>
> From: Keith Turner <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, October 28, 2015 at 8:04 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Is there a sensible way to do this? Sequential Batch Scanner
>
> Will the results always fit into memory? If so, you could put results from
> the batch scanner into an ArrayList and sort it.
>
> On Tue, Oct 27, 2015 at 6:21 PM, Rob Povey <[email protected]> wrote:
>
>> What I want is something that behaves like a BatchScanner (i.e. takes a
>> collection of Ranges in a single RPC), but preserves the scan ordering.
>> I understand this would greatly impact performance, but in my case I can
>> manually partition my request on the client, and send one request per
>> tablet.
>> I can’t use scanners, because in some cases I have 10’s of thousands of
>> non-consecutive ranges.
>> If I use a single-threaded BatchScanner, and only request data from a
>> single Tablet, am I guaranteed ordering?
>> This appears to work correctly in my small tests (albeit slower than a
>> single 1 thread Batch scanner call), but I don’t really want to have to
>> rely on it if the semantic isn’t guaranteed.
>> If not, is there another “efficient” way to do this?
>>
>> Thanks
>>
>> Rob Povey
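Rob's short-term workaround (sort each page pulled from the batch scanner, then combine the paged results) amounts to a k-way merge of sorted runs. What follows is a minimal sketch using a `PriorityQueue` that holds one head element per page. `SortedRunMerge` is a hypothetical helper, not an Accumulo API. It streams merged elements to a `Consumer` instead of building another full list, which matters given the GC concern. Note that every page still has to stay in memory until it has been drained.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical helper: k-way merge of runs that are each already sorted.
final class SortedRunMerge {
    private SortedRunMerge() {}

    // The smallest not-yet-emitted element of one run, plus the rest of that run.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Emits every element of every run to `sink` in `cmp` order.
    // The heap holds at most one element per run; ties between runs come out in arbitrary order.
    static <T> void merge(List<List<T>> runs, Comparator<? super T> cmp, Consumer<? super T> sink) {
        PriorityQueue<Head<T>> heap =
                new PriorityQueue<Head<T>>((a, b) -> cmp.compare(a.value, b.value));
        for (List<T> run : runs) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) heap.add(new Head<>(it.next(), it));
        }
        while (!heap.isEmpty()) {
            Head<T> h = heap.poll();           // globally smallest remaining element
            sink.accept(h.value);
            if (h.rest.hasNext()) heap.add(new Head<>(h.rest.next(), h.rest));
        }
    }
}
```

For Accumulo results, the runs would presumably be lists of `Map.Entry<Key, Value>`, and the comparator could be `Map.Entry.comparingByKey()`, since `Key` implements `Comparable<Key>`.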
