Thanks! I like the idea of passing my own thread pool to the batch scanner; 
that would definitely be the better solution.

Yeah, I thought about creating a batch scanner with only one thread, but I was 
not sure whether that creates a separate thread (outside of the current one) or 
uses the current one. At the time I did not want a new thread to be created at 
all. I also didn't realize the Scanner could spin up a thread of its own; I 
thought it did all of its work in the calling thread. 

To mitigate the separate RPC call per range, would it make more sense to do a 
"binRanges" pass that groups the ranges by the tablets they fall in, to reduce 
the number of ranges?
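
To make the binning idea concrete, here is a rough sketch in plain Java. It is 
not the actual Accumulo client code: RangeBinner, RowRange and binByServer are 
names I made up for illustration, rows are plain Strings with inclusive bounds, 
and the tablet layout is passed in as a sorted map from tablet end row to 
server name (plus the server holding the last tablet) rather than looked up 
from the metadata table. Each range is placed in the bin of every server that 
hosts a tablet it overlaps, so the number of requests depends on the number of 
servers instead of the number of ranges.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Accumulo.
final class RangeBinner {

    // Simplified stand-in for a range: inclusive start and end rows.
    static final class RowRange {
        final String startRow;
        final String endRow;

        RowRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // Groups ranges by the server(s) hosting the tablets they overlap.
    // serverByTabletEndRow maps each split point (a tablet's inclusive end row)
    // to the server hosting that tablet; lastTabletServer hosts the tablet
    // that follows the final split point.
    static Map<String, List<RowRange>> binByServer(
            TreeMap<String, String> serverByTabletEndRow,
            String lastTabletServer,
            List<RowRange> ranges) {
        Map<String, List<RowRange>> binned = new LinkedHashMap<>();
        for (RowRange range : ranges) {
            // A range may overlap several tablets on one server; count it once.
            Set<String> servers = new LinkedHashSet<>();
            boolean covered = false;
            // Walk tablets starting at the first one whose end row >= start row.
            for (Map.Entry<String, String> tablet
                    : serverByTabletEndRow.tailMap(range.startRow, true).entrySet()) {
                servers.add(tablet.getValue());
                if (tablet.getKey().compareTo(range.endRow) >= 0) {
                    covered = true; // this tablet contains the end of the range
                    break;
                }
            }
            if (!covered) {
                // The range runs past the final split point into the last tablet.
                servers.add(lastTabletServer);
            }
            for (String server : servers) {
                binned.computeIfAbsent(server, s -> new ArrayList<>()).add(range);
            }
        }
        return binned;
    }
}
```

With something like this, a fallback scanner could issue one lookup per entry 
in the returned map instead of one per range, which I think is closer to what 
the TabletServerBatchReader does.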

On Mar 28, 2013, at 11:55 AM, Keith Turner <[email protected]> wrote:

> I took a quick look at the code. Excluding the threading issue, a
> major conceptual difference is that BatchScannerWithScanners seems to
> do an RPC round trip for each range.  The TabletServerBatchReader
> sends all of the ranges that a tablet server needs to look up in one
> RPC.
> 
> Instead of creating a BatchScannerWithScanners, maybe you could create
> a batch scanner with just one thread when resources are exceeded?
> This will be similar to what you are doing now, just one thread will
> be doing work fetching data.  The client thread would just be waiting
> on this background thread.  This does, however, allow the processing
> of results to happen concurrently with the fetching of data.  Using
> BatchScannerWithScanners would not allow this.
> 
> Something to be aware of, the regular scanner will spin up a read
> ahead thread if you read a lot of data through it.  It does not do
> this immediately, only after fetching a few batches of key value pairs
> from the tablet server.  If this happens you could have one thread
> fetching data while the client thread processes results.
> 
> Do you think we should open a ticket about giving users control over
> threads created by client code?  Maybe users could pass in their own
> thread pool to a batch scanner?
> 
> 
> Keith
> 
> On Thu, Mar 28, 2013 at 11:00 AM,  <[email protected]> wrote:
>> In some of my projects, we needed to control the number of threads spun up 
>> when using multiple batch scanners. We created a utility to control the 
>> number of threads; if the maximum number of threads has been reached, it 
>> returns a batch scanner that is actually backed by Scanners. I wanted to get 
>> any feedback on the code. It seems like such a simple thing to do that I bet 
>> someone already has this. Thanks!
>> 
>> https://github.com/calrissian/mango/tree/master/accumulo
