The number of rf files per tablet varies between 2 and 5. The BatchScanner returned 460000 entries. The approximate average data rate is 0.5 MB/s, as seen on the Accumulo monitor page.
A simple scan on the table has an average data rate of about 7-8 MB/s. All the ids exist in the Accumulo table.

On 12 May 2015 at 23:39, Keith Turner <[email protected]> wrote:

> Do you know how much data is being brought back (i.e. 100 megabytes)? I am
> wondering what the data rate is in MB/s. Do you know how many files per
> tablet you have? Do most of the 10,000 ids you are querying for exist?
>
> On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal <
> [email protected]> wrote:
>
>> I have 194 tablets. Currently I am using 20 threads to create the
>> batchscanner inside the createBatchScanner method.
>> On 12-May-2015 11:19 pm, "Keith Turner" <[email protected]> wrote:
>>
>>> How many tablets do you have? The batch scanner does not parallelize
>>> operations within a tablet.
>>>
>>> If you give the batch scanner more threads than there are tservers, it
>>> will make multiple parallel rpc calls to each tserver if the tserver has
>>> multiple tablets. Each rpc may include multiple tablets and ranges for
>>> each tablet.
>>>
>>> If the batch scanner has less threads than tservers, it will make one
>>> rpc per tserver per thread. Each rpc call will include all tablets and
>>> associated ranges for that tserver.
>>>
>>> Keith
>>>
>>> On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am using BatchScanner to scan rows from a accumulo table. The table
>>>> has around 187m entries and I am using a 3 node cluster which has accumulo
>>>> 1.6.1.
>>>>
>>>> I have passed 10000 ids which are stored as row id in my table as a
>>>> list in the setRanges() method.
>>>>
>>>> This whole process takes around 50 secs(from adding the ids in the list
>>>> to scanning the whole table using the BatchScanner).
>>>>
>>>> I tried switching on bloom filters but that didn't work.
>>>>
>>>> Also if anyone could briefly explain how a BatchScanner works, how it
>>>> does parallel scanning it would help me understand what I am doing better.
>>>>
>>>> Thanks
>>>> Vaibhav
>>>>
>>>
>
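
As a rough cross-check of the numbers in this thread, the sketch below estimates how much data the BatchScanner brought back. It assumes the 0.5 MB/s rate held for the whole 50 seconds (which also includes building the id list, so the real scan time is somewhat shorter) and that 1 MB is 1024 * 1024 bytes. The class and method names are made up for illustration and are not part of the Accumulo API.

```java
public class ScanThroughput {
    // Assumption: 1 MB = 1024 * 1024 bytes; the monitor page may use a different unit.
    static final double BYTES_PER_MB = 1024.0 * 1024.0;

    // Total data returned, in MB, for a constant rate over a given duration.
    static double totalMegabytes(double rateMBps, double seconds) {
        return rateMBps * seconds;
    }

    // Average number of entries returned per second.
    static double entriesPerSecond(long entries, double seconds) {
        return entries / seconds;
    }

    // Average size of one returned entry, in bytes.
    static double bytesPerEntry(double totalMB, long entries) {
        return totalMB * BYTES_PER_MB / entries;
    }

    public static void main(String[] args) {
        double batchRateMBps = 0.5;  // BatchScanner rate from the monitor page
        double seconds = 50.0;       // reported wall-clock time (assumed to be all scan time)
        long entries = 460000L;      // entries returned by the BatchScanner

        double totalMB = totalMegabytes(batchRateMBps, seconds);
        System.out.printf("approx. data returned: %.1f MB%n", totalMB);
        System.out.printf("approx. entries per second: %.0f%n",
                entriesPerSecond(entries, seconds));
        System.out.printf("approx. bytes per entry: %.1f%n",
                bytesPerEntry(totalMB, entries));
    }
}
```

Under these assumptions the query returns on the order of 25 MB in total, which is small next to the 7-8 MB/s a simple scan sustains on the same table.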
