Point 2 is right. The framework first calls setup(), then map() for each key/value pair in the InputSplit. Finally, cleanup() is called once, regardless of the number of records in the input split.
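As a rough illustration of that call order, here is a plain-Java sketch. It does not use the Hadoop classes: MapperLifecycleSketch and its string "records" are hypothetical stand-ins, and the try/finally loop only imitates the general shape of Mapper.run() in the 2.x API rather than reproducing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Mapper; no Hadoop classes are involved.
class MapperLifecycleSketch {
    // Records the order in which the lifecycle methods are invoked.
    private final List<String> calls = new ArrayList<String>();

    void setup() { calls.add("setup"); }

    void map(String record) { calls.add("map(" + record + ")"); }

    void cleanup() { calls.add("cleanup"); }

    // Imitates the shape of Mapper.run(): one instance per split,
    // setup() once, map() once per record, cleanup() once at the end.
    static List<String> run(List<String> split) {
        MapperLifecycleSketch mapper = new MapperLifecycleSketch();
        mapper.setup();
        try {
            for (String record : split) {
                mapper.map(record);
            }
        } finally {
            // Runs even when the split is empty or map() throws.
            mapper.cleanup();
        }
        return mapper.calls;
    }

    public static void main(String[] args) {
        // Five records in one split: setup and cleanup appear once each.
        System.out.println(run(Arrays.asList("r1", "r2", "r3", "r4", "r5")));
        // An empty split still gets setup and cleanup.
        System.out.println(run(new ArrayList<String>()));
    }
}
```

With five records in a single split, the log shows one setup, five map calls, and one cleanup, which matches variant 2 in the question.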
::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile Tel: +91 (0)9899821370

On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev <[email protected]> wrote:

> Hi Jeremy,
>
> According to the official documentation
> <http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>,
> setup and cleanup calls are performed for each InputSplit. In this case your
> variant 2 is more correct. But actually a single mapper can be used for
> processing multiple InputSplits. In your case, if you have 5 files with 1
> record each, it can call setup/cleanup 5 times. But if your records are in
> a single file, I think that setup/cleanup should be called once.
>
> --
> Thanks,
> Sergey
>
> On 06/05/14 02:49, jeremy p wrote:
>
> Let's say I have a TaskTracker that receives 5 records to process for a
> single job. When the TaskTracker processes the first record, it will
> instantiate my Mapper class and execute my setup() function. It will then
> run the map() method on that record. My question is this: what happens
> when the map() method has finished processing the first record? I'm
> guessing it will do one of two things:
>
> 1) My cleanup() function will execute. After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed. When it is
> time to process the next record, a new Mapper object will be instantiated.
> Then my setup() method will execute, the map() method will execute, the
> cleanup() method will execute, and then the Mapper instance will be
> destroyed. When it is time to process the next record, a new Mapper object
> will be instantiated. This process will repeat itself until all 5 records
> have been processed. In other words, my setup() and cleanup() methods will
> have been executed 5 times each.
>
> or
>
> 2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed. It will be reused for all 5
> records. When the map() method has finished processing the last record, my
> cleanup() method will execute. In other words, my setup() and cleanup()
> methods will only execute 1 time each.
>
> Thanks for the help!
