I took a quick look at the code. Setting aside the threading issue, a major conceptual difference is that BatchScannerWithScanners seems to do an RPC round trip for each range. The TabletServerBatchReader sends all of the ranges that a tablet server needs to look up in one RPC.
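
To make the round trip difference concrete, here is a toy sketch in plain Java. It is not Accumulo code: the class, serverFor, the server names, and the row-to-server split are all made up for illustration, and a range is modeled as just its start row. It only counts requests under the assumption that each range lives on exactly one server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these names are Accumulo classes.
class RpcCountSketch {

    // Hypothetical placement rule (an assumption for illustration):
    // start rows sorting before "n" live on server1, the rest on server2.
    static String serverFor(String startRow) {
        return startRow.compareTo("n") < 0 ? "server1" : "server2";
    }

    // Scanner-per-range approach: one round trip for every range.
    static int perRangeRpcs(List<String> ranges) {
        return ranges.size();
    }

    // Batch approach: group the ranges by the server that hosts them.
    static Map<String, List<String>> binByServer(List<String> ranges) {
        Map<String, List<String>> bins = new TreeMap<>();
        for (String range : ranges) {
            bins.computeIfAbsent(serverFor(range), k -> new ArrayList<>()).add(range);
        }
        return bins;
    }

    // One request per server, carrying all of that server's ranges.
    static int batchedRpcs(List<String> ranges) {
        return binByServer(ranges).size();
    }

    public static void main(String[] args) {
        List<String> ranges = Arrays.asList("apple", "cherry", "kiwi", "peach", "zebra");
        System.out.println("per-range RPCs: " + perRangeRpcs(ranges));
        System.out.println("batched RPCs:   " + batchedRpcs(ranges));
    }
}
```

With many ranges and few servers, the per-range count grows with the number of ranges while the batched count is bounded by the number of servers, which is the gap I am pointing at.
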
Instead of creating a BatchScannerWithScanners, maybe you could create a batch scanner with just one thread when resources are exceeded? This would be similar to what you are doing now, except that a single background thread would do the work of fetching data and the client thread would wait on it. It does, however, allow results to be processed concurrently with the fetching of data, which BatchScannerWithScanners would not allow.

Something to be aware of: the regular scanner will spin up a read-ahead thread if you read a lot of data through it. It does not do this immediately, only after fetching a few batches of key-value pairs from the tablet server. If this happens, you could have one thread fetching data while the client thread processes results.

Do you think we should open a ticket about giving users control over threads created by client code? Maybe users could pass in their own thread pool to a batch scanner?

Keith

On Thu, Mar 28, 2013 at 11:00 AM, <[email protected]> wrote:
> In some of my projects, we needed to control the number of threads spun up
> with the use of multiple batch scanners. We created a utility to control the
> number of threads, and if the max threads has been reached, return a batch
> scanner that is actually backed by Scanners. Wanted to get any feedback on
> the code. Seems like such a simple thing to do, I bet someone already has
> this. Thanks!
>
> https://github.com/calrissian/mango/tree/master/accumulo
