Do you know how much data is being brought back (e.g. 100 megabytes)? I am wondering what the data rate is in MB/s. Do you know how many files per tablet you have? Do most of the 10,000 ids you are querying for exist?
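To put the data-rate question in numbers: the rate is the bytes returned divided by wall-clock time. The following is a minimal sketch that assumes, hypothetically, that 100 MB came back during the roughly 50 seconds reported in the thread. ScanRateSketch and its methods are illustrative names, not Accumulo API.

```java
public class ScanRateSketch {

    // Megabytes (2^20 bytes) per second, given bytes returned and elapsed wall-clock seconds.
    static double mbPerSecond(long bytesReturned, double elapsedSeconds) {
        return (bytesReturned / (1024.0 * 1024.0)) / elapsedSeconds;
    }

    // Row lookups per second, given the number of ids passed to setRanges().
    static double lookupsPerSecond(int ids, double elapsedSeconds) {
        return ids / elapsedSeconds;
    }

    public static void main(String[] args) {
        long bytes = 100L * 1024 * 1024; // hypothetical: 100 MB returned
        double secs = 50.0;              // elapsed time reported in the thread
        System.out.printf("%.1f MB/s%n", mbPerSecond(bytes, secs));
        System.out.printf("%.0f lookups/s%n", lookupsPerSecond(10_000, secs));
    }
}
```

Under that assumption the scan would be running at about 2 MB/s and 200 lookups/s, which is the kind of figure that helps tell whether the bottleneck is data volume or per-lookup overhead.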
On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal <[email protected]> wrote:

> I have 194 tablets. Currently I am passing 20 threads to the
> createBatchScanner method when creating the BatchScanner.
> On 12-May-2015 11:19 pm, "Keith Turner" <[email protected]> wrote:
>
>> How many tablets do you have? The batch scanner does not parallelize
>> operations within a tablet.
>>
>> If you give the batch scanner more threads than there are tservers, it
>> will make multiple parallel RPC calls to each tserver if the tserver has
>> multiple tablets. Each RPC may include multiple tablets and ranges for
>> each tablet.
>>
>> If the batch scanner has fewer threads than tservers, it will make one RPC
>> per tserver per thread. Each RPC call will include all tablets and
>> associated ranges for that tserver.
>>
>> Keith
>>
>> On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am using a BatchScanner to scan rows from an Accumulo table. The table
>>> has around 187m entries and I am using a 3-node cluster running Accumulo
>>> 1.6.1.
>>>
>>> I have passed 10000 ids, which are stored as row ids in my table, as a
>>> list to the setRanges() method.
>>>
>>> This whole process takes around 50 secs (from adding the ids to the list
>>> to scanning the whole table using the BatchScanner).
>>>
>>> I tried switching on bloom filters but that didn't help.
>>>
>>> Also, if anyone could briefly explain how a BatchScanner works and how it
>>> does parallel scanning, it would help me better understand what I am doing.
>>>
>>> Thanks
>>> Vaibhav
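One rough way to picture the threading behavior Keith describes is with a small model. The Java below is a hypothetical model, not Accumulo's actual client code, and it groups tablets into RPCs. It rests on several assumptions:

- The 194 tablets are spread evenly across the 3 tservers.
- Every tablet holds at least one of the requested ids.
- When there are more threads than tservers, each RPC carries at most ceil(tablets / threads) tablets. This split rule is invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchScanPlanSketch {

    // Hypothetical model: tabletsByServer maps a tserver name to the tablets it hosts
    // that overlap the requested ranges. Returns one list of tablets per RPC.
    static List<List<String>> planRpcs(Map<String, List<String>> tabletsByServer, int numThreads) {
        int totalTablets = 0;
        for (List<String> tablets : tabletsByServer.values()) {
            totalTablets += tablets.size();
        }
        // Threads <= tservers: one RPC per tserver carrying all of that tserver's tablets.
        int maxTabletsPerRpc = Integer.MAX_VALUE;
        if (numThreads > tabletsByServer.size()) {
            // Threads > tservers: split each tserver's tablets into several RPCs so they
            // can run in parallel (assumed rule: ceil(totalTablets / numThreads) per RPC).
            maxTabletsPerRpc = Math.max(1, (totalTablets + numThreads - 1) / numThreads);
        }
        List<List<String>> rpcs = new ArrayList<>();
        for (List<String> tablets : tabletsByServer.values()) {
            for (int start = 0; start < tablets.size(); start += maxTabletsPerRpc) {
                int end = (int) Math.min((long) start + maxTabletsPerRpc, tablets.size());
                rpcs.add(new ArrayList<>(tablets.subList(start, end)));
            }
        }
        return rpcs;
    }

    // Assigns numTablets round-robin to numServers hypothetical tservers.
    static Map<String, List<String>> cluster(int numServers, int numTablets) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (int s = 0; s < numServers; s++) {
            byServer.put("tserver" + s, new ArrayList<>());
        }
        for (int t = 0; t < numTablets; t++) {
            byServer.get("tserver" + (t % numServers)).add("tablet" + t);
        }
        return byServer;
    }

    public static void main(String[] args) {
        // Tablet and tserver counts taken from the thread: 194 tablets on 3 tservers.
        Map<String, List<String>> c = cluster(3, 194);
        System.out.println("20 threads -> " + planRpcs(c, 20).size() + " RPCs");
        System.out.println(" 2 threads -> " + planRpcs(c, 2).size() + " RPCs");
    }
}
```

Under these assumptions, 20 threads would give 21 RPCs (about 7 per tserver), which would run mostly in parallel. With 2 threads there would be 3 RPCs, one per tserver. Since the batch scanner does not parallelize within a tablet, threads beyond the number of tablets should not buy any further parallelism.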
