Hi Jeremy,

According to the official documentation
<http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>
the setup and cleanup calls are performed once for each InputSplit, so your
variant 2 is closer to the truth. Keep in mind, though, that the same Mapper
class can be run against multiple InputSplits (one map task per split). In
your case, if you have 5 files with 1 record each, setup/cleanup can be
called 5 times. But if your records are all in a single file, I think
setup/cleanup should be called only once.
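To see why, here is a rough sketch of how the framework drives a single map
task. This is not copied from the Hadoop source; it just follows the call
order described in the Mapper javadoc, and the class name is made up for
illustration:

    import java.io.IOException;

    // Rough sketch of the call order for one map task (one InputSplit),
    // per the Mapper javadoc. MapperRunSketch is a hypothetical name.
    public class MapperRunSketch<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
            extends org.apache.hadoop.mapreduce.Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

        @Override
        public void run(Context context) throws IOException, InterruptedException {
            setup(context);                    // once, before the first record of the split
            while (context.nextKeyValue()) {   // map() runs once per record in the split
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            cleanup(context);                  // once, after the last record of the split
        }
    }

So within one map task the same Mapper instance is reused for every record,
and setup/cleanup only repeat when the input is divided into multiple splits.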

--
Thanks,
Sergey

On 06/05/14 02:49, jeremy p wrote:
> Let's say I have TaskTracker that receives 5 records to process for a
> single job.  When the TaskTracker processes the first record, it will
> instantiate my Mapper class and execute my setup() function.  It will
> then run the map() method on that record.  My question is this : what
> happens when the map() method has finished processing the first
> record?  I'm guessing it will do one of two things :
>
> 1) My cleanup() function will execute.  After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed.  When
> it is time to process the next record, a new Mapper object will be
> instantiated.  Then my setup() method will execute, the map() method
> will execute, the cleanup() method will execute, and then the Mapper
> instance will be destroyed.  When it is time to process the next
> record, a new Mapper object will be instantiated.  This process will
> repeat itself until all 5 records have been processed.  In other
> words, my setup() and cleanup() methods will have been executed 5
> times each.
>
> or
>
> 2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed.  It will be reused for all 5
> records.  When the map() method has finished processing the last
> record, my cleanup() method will execute.  In other words, my setup()
> and cleanup() methods will only execute 1 time each.
>
> Thanks for the help!
