Hi Jeremy,

According to the official documentation <http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>, setup and cleanup are called once per InputSplit, with map called for each record in between. So your variant 2 is the correct one: the same Mapper instance is reused for every record in the split.

Keep in mind that each InputSplit is processed by its own map task, and each task creates its own Mapper instance. In your case, if you have 5 files with 1 record each, the default file input formats will give you 5 splits, so setup/cleanup will run 5 times in total (once per task). But if your records are in a single file that fits in one split, setup/cleanup will be called once.
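To make the call order concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API at all: LifecycleSketch, SketchMapper and runTask are hypothetical names made up for illustration, and the run() method only imitates the order described in the Mapper documentation (setup, then map for each record in the split, then cleanup).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Mapper lifecycle; this is NOT the Hadoop API.
public class LifecycleSketch {

    // Records every lifecycle call so the order can be inspected afterwards.
    static class SketchMapper {
        final List<String> events = new ArrayList<>();

        void setup() { events.add("setup"); }

        void map(String record) { events.add("map(" + record + ")"); }

        void cleanup() { events.add("cleanup"); }

        // Imitates the documented order: setup once, map per record, cleanup once.
        void run(List<String> split) {
            setup();
            for (String record : split) {
                map(record);
            }
            cleanup();
        }
    }

    // Assumption of this sketch: one task handles one split with a fresh mapper.
    public static List<String> runTask(List<String> split) {
        SketchMapper mapper = new SketchMapper();
        mapper.run(split);
        return mapper.events;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");

        // One file holding 5 records: a single split, so a single task.
        System.out.println(runTask(records));

        // Five files with 1 record each: five splits, so five tasks.
        for (String record : records) {
            System.out.println(runTask(Arrays.asList(record)));
        }
    }
}
```

In the first call the log contains one setup, five map entries and one cleanup. In the loop each task logs its own setup and cleanup around a single map call, which is where the "5 times" in the multi-file case comes from.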
--
Thanks,
Sergey

On 06/05/14 02:49, jeremy p wrote:
> Let's say I have TaskTracker that receives 5 records to process for a
> single job. When the TaskTracker processes the first record, it will
> instantiate my Mapper class and execute my setup() function. It will
> then run the map() method on that record. My question is this : what
> happens when the map() method has finished processing the first
> record? I'm guessing it will do one of two things :
>
> 1) My cleanup() function will execute. After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed. When
> it is time to process the next record, a new Mapper object will be
> instantiated. Then my setup() method will execute, the map() method
> will execute, the cleanup() method will execute, and then the Mapper
> instance will be destroyed. When it is time to process the next
> record, a new Mapper object will be instantiated. This process will
> repeat itself until all 5 records have been processed. In other
> words, my setup() and cleanup() methods will have been executed 5
> times each.
>
> or
>
> 2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed. It will be reused for all 5
> records. When the map() method has finished processing the last
> record, my cleanup() method will execute. In other words, my setup()
> and cleanup() methods will only execute 1 time each.
>
> Thanks for the help!
