It's not just a matter of transferring the data from the reducer to the region server; you also have to take into account that the data is replicated to other nodes.
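To make the trip counting concrete, here is a toy model (plain Java, a sketch of the accounting only, not HBase code) of the HDFS write pipeline with the default replication factor of 3. The first replica goes to the DataNode co-located with the RegionServer, then the pipeline forwards the block to the remote replicas:

```java
// Toy model counting network trips per written block. Assumes the default
// replication factor of 3 and a DataNode co-located with the RegionServer.
public class WritePipeline {

    /**
     * Network trips for one write. reducerLocal = true means the reducer
     * runs on the same node as the RegionServer, so the
     * reducer -> RegionServer hop costs nothing.
     */
    public static int networkTrips(int replication, boolean reducerLocal) {
        int reducerToRegionServer = reducerLocal ? 0 : 1;
        int regionServerToLocalDn = 0;          // same node, no network
        int pipelineForwards = replication - 1; // one hop per remote replica
        return reducerToRegionServer + regionServerToLocalDn + pipelineForwards;
    }

    public static void main(String[] args) {
        System.out.println("suboptimal: " + networkTrips(3, false)); // 3 trips
        System.out.println("optimal:    " + networkTrips(3, true));  // 2 trips
    }
}
```

With replication 3 the locality optimization removes exactly one of three network trips, which is the "marginal gain" argument below.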
So in a suboptimal setup you have:

Reducer -> Network -> RegionServer -> Local Datanode -> Network -> Remote Datanode1 -> Network -> Remote Datanode2

What you are trying to get is:

Reducer -> Local RegionServer -> Local Datanode -> Network -> Remote Datanode1 -> Network -> Remote Datanode2

Subsequent flushes of the inserted data will also follow the latter pattern. That's what I meant earlier when I said the gain would be marginal: you're only saving one network trip among many others.

Also, I took a look at the JobTracker code and modifying it doesn't look easy. Instead, since you already use the HRegionPartitioner, why don't you do an incremental bulk load?

http://hbase.apache.org/bulk-loads.html

J-D

On Wed, Apr 13, 2011 at 7:49 AM, Biedermann,S.,Fa. Post Direkt <[email protected]> wrote:
> Hi Jean-Daniel,
>
> thanks for your reply.
>
> What I assume is that the total network load during the reduce step is O(n),
> with n the number of nodes in the cluster. We saw a major performance loss in
> the reduce step when our network accidentally degraded to 100 Mbit (1 h vs.
> 13 minutes).
>
> With more nodes I see 2 options:
>
> 1) using switches with a higher switching capacity
> 2) improving hbase/hadoop's assignment of reduce tasks to the nodes that
> serve the corresponding hbase regions.
>
> What do you think?
>
> Sven
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Jean-Daniel Cryans
> Sent: Friday, April 8, 2011 18:04
> To: [email protected]
> Subject: Re: data locality for reducer writes?
>
> Unfortunately it seems that there's nothing in the OutputFormat
> interface that we could implement (like getSplits in the InputFormat)
> to inform the JobTracker of the location of the regions. It kinda makes
> sense, since when you're writing to HDFS in a "normal" MR job you
> always write to the local DataNode (well, if there is one), but even
> then it is replicated to two other nodes.
> IMO even if we had that, the gain would be marginal.
>
> J-D
>
> On Fri, Apr 8, 2011 at 4:18 AM, Biedermann,S.,Fa. Post Direkt
> <[email protected]> wrote:
>> Hi,
>>
>> we have a number of reducer tasks, each writing a bunch of rows into
>> the latest HBase via Puts.
>>
>> What is working is that each reducer only creates Puts for one single
>> region by using the HRegionPartitioner.
>>
>> However, we are seeing that the region flush itself is not local but
>> goes to some other node in the cluster. This puts load on the network.
>>
>> We'd like the reducer to instead run on the same node where the region
>> is served.
>>
>> Is that possible? Any ideas or suggestions?
>>
>> Sven
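The getSplits contrast J-D draws can be sketched in miniature. This is plain Java with hypothetical, simplified names (Hadoop's real interfaces differ): on the input side each split advertises the hosts that hold its data and the scheduler prefers them, which is exactly the kind of hint the OutputFormat side has no way to provide.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for Hadoop's split/location machinery;
// NOT the real API. InputFormat.getSplits() lets each split report the hosts
// holding its data, and the JobTracker tries to schedule the task there.
public class LocalityHint {

    /** Pick a node for a task: prefer one of the hosts holding the data. */
    public static String assign(List<String> dataHosts, List<String> liveNodes) {
        for (String node : liveNodes) {
            if (dataHosts.contains(node)) {
                return node;         // data-local assignment
            }
        }
        return liveNodes.get(0);     // no data-local node alive: fall back
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");
        // A split (or, hypothetically, a region) whose data lives on node2:
        System.out.println(assign(Arrays.asList("node2"), nodes)); // node2
    }
}
```

A region-aware OutputFormat would need an analogous hook reporting each region's RegionServer, which is what the thread concludes does not exist.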
