> call context.write() in my mapper class)? If not, are there any other > MR platforms that can do this? I've been searching around and couldn't
You can use Hama BSP[1] instead of Map/Reduce. No stable release yet but I confirmed that large graph with billions of nodes and edges can be crunched in few minutes[2]. 1. http://hama.apache.org 2. http://wiki.apache.org/hama/Benchmarks On Sat, Oct 6, 2012 at 1:31 AM, Jim Twensky <[email protected]> wrote: > Hi, > > I have a complex Hadoop job that iterates over large graph data > multiple times until some convergence condition is met. I know that > the map output goes to the local disk of each particular mapper first, > and then fetched by the reducers before the reduce tasks start. I can > see that this is an overhead, and it theory we can ship the data > directly from mappers to reducers, without serializing on the local > disk first. I understand that this step is necessary for fault > tolerance and it is an essential building block of MapReduce. > > In my application, the map process consists of identity mappers which > read the input from HDFS and ship it to reducers. Essentially, what I > am doing is applying chains of reduce jobs until the algorithm > converges. My question is, can I bypass the serialization of the local > data and ship it from mappers to reducers immediately (as soon as I > call context.write() in my mapper class)? If not, are there any other > MR platforms that can do this? I've been searching around and couldn't > see anything similar to what I need. Hadoop On Line is a prototype and > has some similar functionality but it hasn't been updated for a while. > > Note: I know about ChainMapper and ChainReducer classes but I don't > want to chain multiple mappers in the same local node. I want to chain > multiple reduce functions globally so the data flow looks like: Map -> > Reduce -> Reduce -> Reduce, which means each reduce operation is > followed by a shuffle and sort essentially bypassing the map > operation. -- Best Regards, Edward J. Yoon @eddieyoon
