Thank you! This has helped me immensely.
On Tue, May 6, 2014 at 12:47 AM, Raj K Singh <[email protected]> wrote:

> Point 2 is right. The framework first calls setup(), followed by map() for
> each key/value pair in the InputSplit. Finally, cleanup() is called
> regardless of the number of records in the input split.
>
> ::::::::::::::::::::::::::::::::::::::::
> Raj K Singh
> http://in.linkedin.com/in/rajkrrsingh
> http://www.rajkrrsingh.blogspot.com
> Mobile Tel: +91 (0)9899821370
>
>
> On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev
> <[email protected]> wrote:
>
>> Hi Jeremy,
>>
>> According to the official documentation
>> <http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>,
>> setup and cleanup are called once for each InputSplit, so in this case
>> your variant 2 is more correct. But a single mapper can actually be used
>> to process multiple InputSplits. In your case, if you have 5 files with 1
>> record each, it can call setup/cleanup 5 times. But if your records are in
>> a single file, I think setup/cleanup should be called once.
>>
>> --
>> Thanks,
>> Sergey
>>
>>
>> On 06/05/14 02:49, jeremy p wrote:
>>
>> Let's say I have a TaskTracker that receives 5 records to process for a
>> single job. When the TaskTracker processes the first record, it will
>> instantiate my Mapper class and execute my setup() function. It will then
>> run the map() method on that record. My question is this: what happens
>> when the map() method has finished processing the first record? I'm
>> guessing it will do one of two things:
>>
>> 1) My cleanup() function will execute. After the cleanup() method has
>> finished, this instance of the Mapper object will be destroyed. When it is
>> time to process the next record, a new Mapper object will be instantiated.
>> Then my setup() method will execute, the map() method will execute, the
>> cleanup() method will execute, and then the Mapper instance will be
>> destroyed. When it is time to process the next record, a new Mapper object
>> will be instantiated. This process will repeat itself until all 5 records
>> have been processed. In other words, my setup() and cleanup() methods will
>> have been executed 5 times each.
>>
>> or
>>
>> 2) When the map() method has finished processing my first record, the
>> Mapper instance will NOT be destroyed. It will be reused for all 5
>> records. When the map() method has finished processing the last record, my
>> cleanup() method will execute. In other words, my setup() and cleanup()
>> methods will only execute 1 time each.
>>
>> Thanks for the help!
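The lifecycle described in the replies can be sketched in plain Java. This is a minimal simulation, not Hadoop code: `SimpleMapper`, `runJob`, and modelling each InputSplit as a `List<String>` are hypothetical, and it assumes one Mapper instance per split, which is the behaviour the replies attribute to the framework. The real `org.apache.hadoop.mapreduce.Mapper.run(Context)` drives its loop through `context.nextKeyValue()` rather than iterating a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Mapper lifecycle; no Hadoop classes are used.
public class MapperLifecycleDemo {

    // Records every lifecycle call so the order can be inspected afterwards.
    static final List<String> calls = new ArrayList<>();

    static class SimpleMapper {
        void setup() { calls.add("setup"); }
        void map(String record) { calls.add("map(" + record + ")"); }
        void cleanup() { calls.add("cleanup"); }

        // Order described in the thread: setup once, map once per record
        // in the split, then cleanup once (even if the split is empty).
        void run(List<String> split) {
            setup();
            for (String record : split) {
                map(record);
            }
            cleanup();
        }
    }

    // Assumption: a fresh Mapper instance is created for each split.
    static void runJob(List<List<String>> splits) {
        for (List<String> split : splits) {
            new SimpleMapper().run(split);
        }
    }

    // Counts how many times a given lifecycle call was recorded.
    static long count(String call) {
        return calls.stream().filter(call::equals).count();
    }

    public static void main(String[] args) {
        // Case A: all 5 records in a single split (variant 2).
        runJob(List.of(List.of("r1", "r2", "r3", "r4", "r5")));
        System.out.println("one split:   setup=" + count("setup") + ", cleanup=" + count("cleanup"));

        calls.clear();
        // Case B: 5 splits (e.g. 5 small files) with 1 record each.
        runJob(List.of(List.of("r1"), List.of("r2"), List.of("r3"), List.of("r4"), List.of("r5")));
        System.out.println("five splits: setup=" + count("setup") + ", cleanup=" + count("cleanup"));
    }
}
```

Running `main` prints `setup=1, cleanup=1` for the single split and `setup=5, cleanup=5` for five one-record splits. So under this model, setup()/cleanup() run once per split, not once per record.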
