Re: Accumulo Utilities

Keith Turner Thu, 28 Mar 2013 08:56:08 -0700

I took a quick look at the code. Excluding the threading issue, a
major conceptual difference is that BatchScannerWithScanners seems to
do an RPC round trip for each range. The TabletServerBatchReader
sends all of the ranges that a tablet server needs to look up in one
RPC.
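
To make the difference concrete, here is a rough sketch in plain Java.
It is not the actual TabletServerBatchReader code; LocatedRange and
both helper methods are hypothetical names made up for illustration,
and the server names are invented. It only counts round trips: one per
range in the first approach, one per tablet server in the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RpcBatchingSketch {
    // Hypothetical stand-in for a range already located on a tablet server.
    static final class LocatedRange {
        final String server;
        final String startRow;
        final String endRow;

        LocatedRange(String server, String startRow, String endRow) {
            this.server = server;
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // One request per range: round trips equal the number of ranges.
    static int roundTripsPerRange(List<LocatedRange> ranges) {
        return ranges.size();
    }

    // Group ranges by server so that each server receives a single
    // request carrying all of the ranges it has to look up.
    static Map<String, List<LocatedRange>> groupByServer(List<LocatedRange> ranges) {
        Map<String, List<LocatedRange>> byServer = new LinkedHashMap<>();
        for (LocatedRange r : ranges) {
            byServer.computeIfAbsent(r.server, s -> new ArrayList<>()).add(r);
        }
        return byServer;
    }

    public static void main(String[] args) {
        List<LocatedRange> ranges = Arrays.asList(
            new LocatedRange("tserver1", "a", "c"),
            new LocatedRange("tserver1", "f", "h"),
            new LocatedRange("tserver2", "m", "p"),
            new LocatedRange("tserver1", "s", "t"));
        System.out.println("per-range round trips: " + roundTripsPerRange(ranges));
        System.out.println("batched round trips:   " + groupByServer(ranges).size());
    }
}
```

With many ranges spread over a few servers, the gap between the two
counts grows with the number of ranges per server.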

Instead of creating a BatchScannerWithScanners, maybe you could create
a batch scanner with just one thread when resources are exceeded?
This would be similar to what you are doing now, in that only one
thread would be doing the work of fetching data, and the client
thread would just be waiting on this background thread. It would,
however, allow the processing of results to happen concurrently with
the fetching of data. Using BatchScannerWithScanners would not allow
this.
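
As far as I know, in the client API this amounts to passing 1 as the
numQueryThreads argument of Connector.createBatchScanner. The overlap
between fetching and processing can be illustrated without Accumulo at
all. The sketch below is plain Java with hypothetical names
(SingleFetcherSketch, scan, the END sentinel); a list of strings
stands in for data coming back from tablet servers, and upper-casing
stands in for client-side processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SingleFetcherSketch {
    // Sentinel marking the end of results; compared by identity.
    private static final String END = new String("<end>");

    // One background thread "fetches" entries into a bounded queue
    // while the calling thread processes them as they arrive.
    static List<String> scan(List<String> source, int queueCapacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        Thread fetcher = new Thread(() -> {
            try {
                for (String entry : source) {
                    queue.put(entry); // blocks when the client falls behind
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "single-fetcher");
        fetcher.setDaemon(true);
        fetcher.start();

        List<String> processed = new ArrayList<>();
        try {
            while (true) {
                String entry = queue.take(); // client thread waits on the fetcher
                if (entry == END) {
                    break;
                }
                processed.add(entry.toUpperCase()); // placeholder for real processing
            }
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(scan(Arrays.asList("row1", "row2", "row3"), 2));
    }
}
```

The bounded queue keeps the fetcher from running arbitrarily far ahead
of the client, so memory use stays limited even for large scans.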

Something to be aware of: the regular scanner will spin up a
read-ahead thread if you read a lot of data through it. It does not do
this immediately, only after fetching a few batches of key value pairs
from the tablet server. If this happens you could have one thread
fetching data while the client thread processes results.

Do you think we should open a ticket about giving users control over
threads created by client code? Maybe users could pass in their own
thread pool to a batch scanner?
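
Roughly what I have in mind, as a sketch only. Nothing like this
exists in the client API today; SharedPoolSketch and lookupAll are
hypothetical names, and the lookup itself is a placeholder. The point
is that the scanner-like helper creates no threads of its own, so the
size of the caller's pool caps the total across all scanners sharing
it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedPoolSketch {
    // Hypothetical scanner-like helper: all lookups run on the
    // pool supplied by the caller, never on threads created here.
    static List<String> lookupAll(ExecutorService callerPool, List<String> ranges) {
        List<Future<String>> pending = new ArrayList<>();
        for (String range : ranges) {
            Callable<String> task = () -> "result for " + range; // placeholder lookup
            pending.add(callerPool.submit(task));
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> f : pending) {
                results.add(f.get()); // results come back in submission order
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return results;
    }

    public static void main(String[] args) {
        // Two threads shared by every caller of lookupAll.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(lookupAll(pool, Arrays.asList("a-c", "f-h", "m-p")));
        } finally {
            pool.shutdown();
        }
    }
}
```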

Keith

On Thu, Mar 28, 2013 at 11:00 AM,  <[email protected]> wrote:
> In some of my projects, we needed to control the number of threads spun up 
> with the use of multiple batch scanners. We created a utility to control the 
> number of threads, and if the max threads has been reached, return a batch 
> scanner that is actually backed by Scanners. Wanted to get any feedback on 
> the code. Seems like such a simple thing to do, I bet someone already has 
> this. Thanks!
>
> https://github.com/calrissian/mango/tree/master/accumulo
