Re: MR Data Locality with AccumuloInputFormat?

Josh Elser Fri, 16 May 2014 14:19:20 -0700

Hi Russ,

I believe that the AccumuloInputFormat will use the splits on the table you're reading to generate the MR InputSplits. The InputFormat should be trying to run each Mapper on the same machine as the tserver that is serving its data.

If you're only getting a few mappers, adding more splits to your table should help. As your job runs, you can verify locality using the counters that your Job creates, which are visible in the JobTracker/ResourceManager web UI.
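
As a rough illustration of adding splits, here is a minimal sketch that generates evenly spaced split points. It assumes row keys begin with a uniformly distributed two-digit lowercase hex prefix, which may not match your table. The class name SplitPoints, the method hexSplits, and the table name "mytable" are hypothetical. The generated strings would still need to be wrapped in org.apache.hadoop.io.Text and passed to TableOperations.addSplits, shown only as a comment so the block runs without Accumulo on the classpath.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class SplitPoints {
    // Returns (tablets - 1) evenly spaced two-digit hex split points,
    // which divide the key space 00..ff into `tablets` ranges.
    // Assumes 2 <= tablets <= 256 and hex-prefixed row keys.
    static SortedSet<String> hexSplits(int tablets) {
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < tablets; i++) {
            splits.add(String.format("%02x", i * 256 / tablets));
        }
        return splits;
    }

    public static void main(String[] args) {
        // One split point per boundary between 8 tablets.
        System.out.println(hexSplits(8));

        // With an Accumulo Connector in hand, the splits could then be applied
        // roughly like this (hypothetical table name, not run here):
        //   SortedSet<Text> keys = new TreeSet<>();
        //   for (String s : hexSplits(8)) keys.add(new Text(s));
        //   connector.tableOperations().addSplits("mytable", keys);
    }
}
```

With 8 tablets this yields 7 split points, so a table spread over 8 tablet servers could end up with at least one tablet per server once the tablets have been balanced.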


On 5/16/14, 1:32 PM, Russ Weeks wrote:

Hi, folks,

When I execute an MR job with AccumuloInputFormat, are there any
guarantees about which mappers process which rows? I'm trying to
minimize crosstalk in my cluster but either I haven't split my table
properly or I'm expecting too much, because I'm only seeing 1 or 2 nodes
running MR tasks that should be reading data from tablet servers on 8
different nodes.

Thanks,
-Russ
