In Hadoop, mapred-site.xml lets me set the maximum number of mappers that run
concurrently on each node. For the rest of this email I will call each of those
concurrent positions a "mapper slot".

Is it possible to figure out from within the mapper which mapper slot it is 
running in?

On this project this matters because each mapper has to fork off an executable
compiled against the Matlab runtime. At runtime the executable is given a cache
directory to work in. Setting up the cache in a new directory takes a long time,
but later calls are fast if they are given the same cache location. As it turns
out, when multiple mappers try to use the same cache they crash the executable.
So if I could identify which mapper slot a mapper is running in, I could set up
one cache per slot, avoid the cache creation time, and still guarantee that no
two mappers write to the same cache.
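One way to get the same guarantee without asking Hadoop for a slot number is to
let each mapper claim a cache directory itself. The sketch below is a minimal,
hypothetical approach, not a Hadoop API: it assumes a node-local cache root
(for example a directory on local disk, not NFS, where file locks may be
unreliable) and assumes maxSlots is set to the same value as the per-node map
slot limit. The class name SlotCache, the "slot-N" directory layout and the
".lock" file are all illustrative. Each mapper walks root/slot-0, root/slot-1,
... and keeps the first directory whose lock file it can lock exclusively with
java.nio FileChannel.tryLock. The operating system drops the lock when the
mapper's JVM exits, so a crashed task does not leave a slot permanently taken.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: claims one of maxSlots per-node cache directories
 * (root/slot-0 .. root/slot-(maxSlots-1)) by taking an exclusive OS-level lock
 * on a ".lock" file inside it. The lock is held until close() or JVM exit.
 */
final class SlotCache implements AutoCloseable {
    // Slots claimed by this JVM. A same-JVM overlap makes tryLock throw, and on
    // some platforms closing a second channel on a locked file drops the lock,
    // so slots already held here are skipped before any channel is opened.
    private static final Set<Path> HELD_IN_THIS_JVM = new HashSet<>();

    private final int slot;
    private final Path cacheDir;
    private final FileChannel channel;
    private final FileLock lock;

    private SlotCache(int slot, Path cacheDir, FileChannel channel, FileLock lock) {
        this.slot = slot;
        this.cacheDir = cacheDir;
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns the first free slot under root, or throws if all are taken. */
    static synchronized SlotCache claim(Path root, int maxSlots) throws IOException {
        for (int i = 0; i < maxSlots; i++) {
            Path dir = root.resolve("slot-" + i).toAbsolutePath().normalize();
            if (HELD_IN_THIS_JVM.contains(dir)) {
                continue;
            }
            Files.createDirectories(dir);
            FileChannel ch = FileChannel.open(dir.resolve(".lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock;
            try {
                lock = ch.tryLock(); // null if another process holds the lock
            } catch (IOException | RuntimeException e) {
                ch.close();
                throw e;
            }
            if (lock != null) {
                HELD_IN_THIS_JVM.add(dir);
                return new SlotCache(i, dir, ch, lock);
            }
            ch.close(); // slot busy in another mapper process; try the next one
        }
        throw new IOException("all " + maxSlots + " cache slots under " + root + " are in use");
    }

    int slot() { return slot; }

    Path cacheDir() { return cacheDir; }

    /** Releases the lock so a later mapper can reuse this slot's warm cache. */
    @Override
    public void close() throws IOException {
        synchronized (SlotCache.class) {
            if (!lock.isValid()) {
                return; // already closed
            }
            try {
                lock.release();
                channel.close();
            } finally {
                HELD_IN_THIS_JVM.remove(cacheDir);
            }
        }
    }
}
```

In a mapper this would be claimed once when the task starts and closed when it
finishes, with cacheDir() passed to the forked executable as its cache location.
Because the directory is reused by whichever mapper next grabs that slot, the
expensive cache setup should only happen once per slot directory per node.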

Thanks for taking the time to read this,

Sandy

