How many tablets do you have? The batch scanner does not parallelize operations within a tablet.
If you give the batch scanner more threads than there are tservers, it will make multiple parallel RPC calls to each tserver if the tserver has multiple tablets. Each RPC may include multiple tablets and the ranges for each tablet.

If the batch scanner has fewer threads than tservers, it will make one RPC per tserver per thread. Each RPC call will include all tablets and associated ranges for that tserver.

Keith

On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal <[email protected]> wrote:

> Hi,
>
> I am using BatchScanner to scan rows from an Accumulo table. The table has
> around 187m entries and I am using a 3 node cluster which has Accumulo
> 1.6.1.
>
> I have passed 10000 ids, which are stored as row ids in my table, as a list
> in the setRanges() method.
>
> This whole process takes around 50 secs (from adding the ids to the list to
> scanning the whole table using the BatchScanner).
>
> I tried switching on bloom filters but that didn't work.
>
> Also, if anyone could briefly explain how a BatchScanner works and how it does
> parallel scanning, it would help me understand what I am doing better.
>
> Thanks
> Vaibhav
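
The RPC grouping described in the reply above can be pictured with a small toy model. This is only a sketch in plain Java, not Accumulo's actual client code: the class name `BatchScanRpcModel`, the methods `planRpcs` and `countRpcs`, the tserver and tablet names, and the exact rule for how tablets are split once there are more threads than tservers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of how a batch scan might group tablets into RPCs.
public class BatchScanRpcModel {

    // tabletsByServer: tserver name -> tablets on that tserver that have ranges to scan.
    // Returns, per tserver, the list of RPCs, where each RPC is a list of tablets.
    public static Map<String, List<List<String>>> planRpcs(
            Map<String, List<String>> tabletsByServer, int numThreads) {
        int totalTablets = 0;
        for (List<String> tablets : tabletsByServer.values()) {
            totalTablets += tablets.size();
        }

        // Few threads: no limit, so each tserver gets a single RPC with all of its tablets.
        int maxTabletsPerRpc = Integer.MAX_VALUE;
        // More threads than tservers: cap tablets per RPC so a tserver can receive
        // several parallel RPCs (assumed splitting rule, for illustration only).
        if (numThreads > tabletsByServer.size()) {
            maxTabletsPerRpc = Math.max(1, totalTablets / numThreads);
        }

        Map<String, List<List<String>>> plan = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : tabletsByServer.entrySet()) {
            List<String> tablets = entry.getValue();
            List<List<String>> rpcs = new ArrayList<>();
            int i = 0;
            while (i < tablets.size()) {
                // A tablet is never split across RPCs: work within a tablet is not parallelized.
                int end = (int) Math.min((long) i + maxTabletsPerRpc, tablets.size());
                rpcs.add(new ArrayList<>(tablets.subList(i, end)));
                i = end;
            }
            plan.put(entry.getKey(), rpcs);
        }
        return plan;
    }

    // Total number of RPCs across all tservers in a plan.
    public static int countRpcs(Map<String, List<List<String>>> plan) {
        int count = 0;
        for (List<List<String>> rpcs : plan.values()) {
            count += rpcs.size();
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up layout: 3 tservers with 4 tablets each.
        Map<String, List<String>> layout = new LinkedHashMap<>();
        layout.put("tserver1", Arrays.asList("t1", "t2", "t3", "t4"));
        layout.put("tserver2", Arrays.asList("t5", "t6", "t7", "t8"));
        layout.put("tserver3", Arrays.asList("t9", "t10", "t11", "t12"));

        System.out.println("threads=10 -> " + countRpcs(planRpcs(layout, 10)) + " RPCs"); // 12
        System.out.println("threads=2 -> " + countRpcs(planRpcs(layout, 2)) + " RPCs");   // 3
    }
}
```

In this model the number of tablets is what limits parallelism: with only one tablet per tserver, extra threads would not add any RPCs, which is why the tablet count matters for the question above.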
