Thanks! I like the idea of passing my own thread pool to the batch scanner; that would definitely be the better solution.
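To make the thread pool idea concrete, here is a rough sketch of what I have in mind. Everything in it is made up for illustration: `PooledLookup`, `fetchAll`, and the `lookup` function are hypothetical names, not anything in the Accumulo client API today, and the `lookup` function is only a stand-in for whatever call actually fetches the data for a range. The point is just that the caller owns the `ExecutorService`, so the caller decides how many threads exist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, NOT part of the Accumulo client API.
// It runs one lookup task per range on a pool supplied by the caller,
// so it never creates threads of its own.
class PooledLookup<R, V> {
    private final ExecutorService pool;         // owned and sized by the caller
    private final Function<R, List<V>> lookup;  // stand-in for the real fetch call

    PooledLookup(ExecutorService pool, Function<R, List<V>> lookup) {
        this.pool = pool;
        this.lookup = lookup;
    }

    // Submits every range to the shared pool, then collects the results
    // in the same order the ranges were given.
    List<V> fetchAll(List<R> ranges) {
        List<Future<List<V>>> pending = new ArrayList<>();
        for (R range : ranges) {
            Callable<List<V>> task = () -> lookup.apply(range);
            pending.add(pool.submit(task));
        }
        List<V> results = new ArrayList<>();
        try {
            for (Future<List<V>> f : pending) {
                results.addAll(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting for lookups", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("lookup failed", e.getCause());
        }
        return results;
    }
}
```

With something like this, several readers could share one pool created with `Executors.newFixedThreadPool(n)`, and the total number of fetch threads would stay capped at `n` no matter how many readers are open.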
Yeah, I thought about creating a batch scanner with only one thread, but I was not sure whether that creates a separate thread (outside of the current one) or uses the current one. At the time I did not want a new thread to be created at all. I also didn't realize the Scanner spins up a thread of its own; I thought that was all in process.

To mitigate the separate RPC call per range, would it make more sense to do a "binRanges" based on the ranges at the tablets to reduce the number of ranges?

On Mar 28, 2013, at 11:55 AM, Keith Turner <[email protected]> wrote:

> I took a quick look at the code. Excluding the threading issue, a
> major conceptual difference is that BatchScannerWithScanners seems to
> do an RPC round trip for each range. The TabletServerBatchReader
> sends all of the ranges that a tablet server needs to look up in one
> RPC.
>
> Instead of creating a BatchScannerWithScanners, maybe you could create
> a batch scanner with just one thread when resources are exceeded?
> This will be similar to what you are doing now, just one thread will
> be doing work fetching data. The client thread would just be waiting
> on this background thread. Although this does allow the processing
> of results to happen concurrently with fetching of data. Using
> BatchScannerWithScanners would not allow this.
>
> Something to be aware of, the regular scanner will spin up a read
> ahead thread if you read a lot of data through it. It does not do
> this immediately, only after fetching a few batches of key value pairs
> from the tablet server. If this happens you could have one thread
> fetching data while the client thread processes results.
>
> Do you think we should open a ticket about giving users control over
> threads created by client code? Maybe users could pass in their own
> thread pool to a batch scanner?
>
> Keith
>
> On Thu, Mar 28, 2013 at 11:00 AM, <[email protected]> wrote:
>> In some of my projects, we needed to control the number of threads spun up
>> with the use of multiple batch scanners. We created a utility to control the
>> number of threads, and if the max threads has been reached, return a batch
>> scanner that is actually backed by Scanners. Wanted to get any feedback on
>> the code. Seems like such a simple thing to do, I bet someone already has
>> this. Thanks!
>>
>> https://github.com/calrissian/mango/tree/master/accumulo
