Unfortunately, it seems there is nothing in the OutputFormat interface that we could implement (like getSplits in InputFormat) to inform the JobTracker of the location of the regions. It kind of makes sense: when you write to HDFS in a "normal" MR job you always write to the local DataNode (if there is one), but even then the data is replicated to two other nodes (with the default replication factor of 3), so network traffic is not avoided. IMO, even if we had such a hook, the gain would be marginal.

J-D

On Fri, Apr 8, 2011 at 4:18 AM, Biedermann,S.,Fa. Post Direkt <[email protected]> wrote:
> Hi,
>
> we have a number of Reducer tasks, each writing a bunch of rows into the
> latest HBase via Puts.
>
> What is working is that each Reducer only creates Puts for one single
> Region by using HRegionPartitioner.
>
> However, we are seeing that the Region flush itself is not local, but
> going to some other node in the cluster. This puts load on the network.
>
> We'd like to see that instead the Reducer would be run on the same node
> where the region is served.
>
> Is that possible?
>
> Any ideas or suggestions?
>
> Sven
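
For readers unfamiliar with the region partitioner mentioned in the quoted message, here is a minimal sketch of the idea behind it: each row key is assigned to the region whose start key is the greatest one not exceeding it, and that region index selects the reducer. This is not HBase's actual HRegionPartitioner code. The class name, the `partitionFor` helper, and the start keys are hypothetical, and plain String comparison stands in for HBase's byte-wise row key ordering. Note that it only decides which reducer gets a row; as discussed above, it says nothing about which node that reducer runs on.

```java
import java.util.Arrays;

// Illustrative sketch only: NOT the HBase HRegionPartitioner implementation.
final class RegionPartitionSketch {

    // Hypothetical, sorted region start keys. The first region starts at the empty key.
    static final String[] START_KEYS = {"", "g", "n", "t"};

    // Returns the reducer index for a row key: find the region whose start key
    // is the greatest one <= rowKey, then fold that index into the reducer count.
    static int partitionFor(String rowKey, String[] startKeys, int numReducers) {
        int pos = Arrays.binarySearch(startKeys, rowKey);
        // Exact match: pos is the region index.
        // Otherwise pos == -(insertionPoint) - 1, and the region is insertionPoint - 1.
        int region = pos >= 0 ? pos : -pos - 2;
        return region % numReducers;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "g", "hbase", "zebra"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, START_KEYS, START_KEYS.length));
        }
    }
}
```

With as many reducers as regions, every reducer produces Puts for exactly one region, which matches the behaviour Sven describes. The scheduler still places each reducer without any knowledge of where that region is served.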
