Thanks for answering my question with not only the answer but also a detailed description. :-)
regards,
Lin

On Sun, Dec 23, 2012 at 12:15 AM, Harsh J <[email protected]> wrote:
> A reduce can't process the complete data set until it has fetched all
> partitions. And any map may produce a partition for any reducer.
> Hence, we generally wait before all maps have terminated, and their
> partition outputs ready and copied over to reduces, before we begin to
> group and process the keys.
>
> However, given that you began thinking about this, this paper on
> "Online" Hadoop may interest you:
> http://www.neilconway.org/docs/nsdi2010_hop.pdf
>
> On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma <[email protected]> wrote:
> > Hi guys,
> >
> > Supposing in a Hadoop job, there are both mappers and reducers. My question
> > is, reducer tasks cannot begin until all mapper tasks complete? If so, why
> > designed in this way?
> >
> > thanks in advance,
> > Lin
>
> --
> Harsh J
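
For anyone reading the archive later, here is a small sketch of the point made in the reply above: any map may emit records for any reducer, so a reducer that groups keys before every map has finished can see an incomplete group. This is a plain-Python toy, not Hadoop code. The names `partition`, `run_map`, `run_reduce`, the CRC32-based partitioner, and the sample splits are all assumptions made up for illustration.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # assumed value, chosen only for this illustration


def partition(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer for a key with a deterministic hash (hypothetical partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map(split):
    """Word-count style map: emit (word, 1) into one partition per reducer."""
    partitions = defaultdict(list)
    for word in split.split():
        partitions[partition(word)].append((word, 1))
    return partitions


def run_reduce(reducer_index, map_outputs):
    """Fetch this reducer's partition from each given map output and sum per key."""
    totals = defaultdict(int)
    for output in map_outputs:
        for word, count in output.get(reducer_index, []):
            totals[word] += count
    return dict(totals)


# Three input splits, one per map task (made-up sample data).
splits = ["hadoop map reduce", "reduce hadoop hadoop", "map hadoop"]
map_outputs = [run_map(split) for split in splits]

# The reducer responsible for the key "hadoop".
target = partition("hadoop")

# Reducing after only the first two maps have finished misses the third map's record.
partial = run_reduce(target, map_outputs[:2])
# Reducing after all maps have finished sees the full group for the key.
complete = run_reduce(target, map_outputs)

print("after 2 of 3 maps:", partial["hadoop"])
print("after all maps:   ", complete["hadoop"])
```

The count for "hadoop" differs between the two runs because the last map also produced a record for that reducer, which is why the grouping step waits for all map outputs.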
