Thanks Rohith. My question is about the reducer side; it is not related to the combiner (which runs on the mapper node).
When all of the mappers' output key/value pairs have been shuffled to the reducer nodes, three things should happen:
1. The mappers' output key/value pairs from all the mapper nodes are merged together.
2. The key/value pairs are sorted by key.
3. All the values of the same key form an iterable collection, in a format like <key, value1, value2, value3, ...>.
My question is: who takes responsibility for forming this iterable collection?
Thanks.

[email protected]

From: Rohith Sharma K S
Date: 2014-12-22 12:23
To: [email protected]
Subject: RE: Question about shuffle/merge/sort phrase

whose responsibility is it that brings each key with all its values together
>> You can set a combiner class in your job. For more information, refer to http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Thanks & Regards
Rohith Sharma K S

From: Todd [mailto:[email protected]]
Sent: 21 December 2014 19:29
To: [email protected]
Subject: Question about shuffle/merge/sort phrase

Hi Hadoopers,

I have a question about the shuffle/sort/merge phases. My understanding is that the shuffle phase transfers the mapper output (key/value pairs) from the mapper nodes to the reducer nodes, the merge phase merges the mapper output from all mapper nodes, and the sort phase sorts the key/value pairs by key. My question, then: whose responsibility is it to bring each key together with all of its values? (The reducer's input is a key and an iterable of values.)

Thanks.
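
To make the merge, sort, and group steps discussed in this thread concrete, here is a minimal conceptual sketch in plain Java. It is not Hadoop's actual code: the class name GroupByKeySketch and the method mergeSortGroup are hypothetical, and the whole data set is held in memory for simplicity. As far as I understand, in Hadoop this grouping is done by the framework inside the reduce task rather than by user code: once the input is sorted, equal keys sit next to each other, so "grouping" amounts to walking the sorted run and starting a new value list whenever the key changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Hadoop API.
public class GroupByKeySketch {

    // Takes one list of (key, value) pairs per mapper and returns key -> values.
    static Map<String, List<Integer>> mergeSortGroup(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        // Step 1: merge the output of all mappers into a single list.
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            merged.addAll(output);
        }

        // Step 2: sort by key. List.sort is stable, so values that share a key
        // keep their merge order (an assumption of this sketch only).
        merged.sort(Map.Entry.comparingByKey());

        // Step 3: group. Equal keys are now adjacent, so a new value list is
        // started each time the key differs from the previous one.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        String currentKey = null;
        List<Integer> currentValues = null;
        for (Map.Entry<String, Integer> pair : merged) {
            if (currentValues == null || !pair.getKey().equals(currentKey)) {
                currentKey = pair.getKey();
                currentValues = new ArrayList<>();
                grouped.put(currentKey, currentValues);
            }
            currentValues.add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperA =
                List.of(Map.entry("b", 2), Map.entry("a", 1));
        List<Map.Entry<String, Integer>> mapperB =
                List.of(Map.entry("a", 3), Map.entry("c", 4));
        // prints {a=[1, 3], b=[2], c=[4]}
        System.out.println(mergeSortGroup(List.of(mapperA, mapperB)));
    }
}
```

Each entry of the returned map corresponds to one reduce() call: the key, plus an iterable over its values. A real reduce task would not build these lists up front; it streams values from the merged, sorted data and decides where one key's group ends by comparing adjacent keys.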
