I have 194 tablets. Currently I am passing 20 as the number of threads when creating the BatchScanner with the createBatchScanner method.

On 12-May-2015 11:19 pm, "Keith Turner" <[email protected]> wrote:
> How many tablets do you have? The batch scanner does not parallelize
> operations within a tablet.
>
> If you give the batch scanner more threads than there are tservers, it
> will make multiple parallel rpc calls to each tserver if the tserver has
> multiple tablets. Each rpc may include multiple tablets and ranges for
> each tablet.
>
> If the batch scanner has fewer threads than tservers, it will make one rpc
> per tserver per thread. Each rpc call will include all tablets and
> associated ranges for that tserver.
>
> Keith
>
> On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal <[email protected]> wrote:
>
>> Hi,
>>
>> I am using BatchScanner to scan rows from an Accumulo table. The table has
>> around 187m entries and I am using a 3 node cluster which has Accumulo
>> 1.6.1.
>>
>> I have passed 10000 ids which are stored as row id in my table as a list
>> in the setRanges() method.
>>
>> This whole process takes around 50 secs (from adding the ids in the list
>> to scanning the whole table using the BatchScanner).
>>
>> I tried switching on bloom filters but that didn't work.
>>
>> Also if anyone could briefly explain how a BatchScanner works, how it
>> does parallel scanning it would help me understand what I am doing better.
>>
>> Thanks
>> Vaibhav
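
The threads-versus-tservers rule Keith describes can be turned into rough arithmetic for the numbers in this thread (194 tablets, 3 tservers, 20 threads). What follows is a minimal sketch of that rule, not the Accumulo client code. It assumes tablets are balanced evenly across tservers, that every tablet receives at least one of the requested ranges, and that threads are shared evenly between tservers. The class and method names (BatchScannerFanOut, spread, parallelRpcs, describe) are hypothetical and do not exist in Accumulo.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the fan-out described in the thread.
// Hypothetical helper; this is NOT the Accumulo client implementation.
class BatchScannerFanOut {

    // Share `total` items across `buckets` as evenly as possible and return
    // the share of bucket `index` (the first `total % buckets` get one extra).
    static int spread(int total, int buckets, int index) {
        int base = total / buckets;
        int extra = total % buckets;
        return base + (index < extra ? 1 : 0);
    }

    // Estimated number of rpcs that can be in flight to one tserver at once.
    // Assumption: threads are split evenly over tservers. The result is capped
    // by the tablet count because work inside a tablet is not parallelized.
    static int parallelRpcs(int threads, int tservers, int index, int tabletsOnServer) {
        if (tabletsOnServer == 0) {
            return 0; // nothing to scan on this tserver
        }
        // With fewer threads than tservers a server still gets one rpc,
        // it just may have to wait for a free thread.
        int threadShare = Math.max(1, spread(threads, tservers, index));
        return Math.min(threadShare, tabletsOnServer);
    }

    // One summary line per tserver for the given cluster shape.
    static List<String> describe(int tablets, int tservers, int threads) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tservers; i++) {
            int onServer = spread(tablets, tservers, i);
            int rpcs = parallelRpcs(threads, tservers, i, onServer);
            lines.add("tserver " + i + ": " + onServer + " tablets, up to "
                    + rpcs + " parallel rpcs");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Numbers from this thread: 194 tablets, 3 tservers, 20 threads.
        for (String line : describe(194, 3, 20)) {
            System.out.println(line);
        }
    }
}
```

Under these assumptions each tserver holds about 65 tablets and serves 6 to 7 rpcs at a time, so the tablet count does not cap parallelism here. In this model the 20 threads already exceed the 3 tservers, and the only cap on concurrent rpcs per tserver is the number of tablets it hosts.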
