In Hadoop, I can set the maximum number of concurrent map tasks per node in mapred-site.xml. For the purposes of this email, I will call these concurrent map task positions "mapper slots".
Is it possible to determine, from within a mapper, which mapper slot it is running in?

This matters for my project because each mapper has to fork off a compiled Matlab executable (which runs on the Matlab runtime). At runtime, the executable is given a cache directory to work in. Setting up the cache in a new directory takes a long time, but if the executable is given the same cache location on later calls, it can reuse it quickly. Unfortunately, when multiple mappers try to use the same cache at the same time, the executable crashes.

Ideally, if I could identify which slot a mapper is running in, I could set up one cache per slot. That would avoid repeating the cache-creation time while still guaranteeing that no two mappers write to the same cache.

Thanks for taking the time to read this,
Sandy
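One possible workaround, sketched here under stated assumptions and not as a Hadoop feature: instead of asking Hadoop for a slot number, each mapper could claim one of N per-node cache directories by taking an exclusive OS-level file lock. N would be the configured number of mapper slots (for example, the value of mapred.tasktracker.map.tasks.maximum). The class name `SlotCache`, the `slot-<i>` directory layout and the base directory are all hypothetical. The sketch assumes the base directory is on a local disk (file locks on network filesystems may be unreliable) and that each mapper runs in its own JVM, so a lock is held until the task releases it or its process exits.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical helper: claims one of maxSlots cache directories under a base
 * directory by holding an exclusive file lock, so two concurrent mappers never
 * share a cache. It does not read any slot number from Hadoop itself.
 */
final class SlotCache implements AutoCloseable {
    private final int slot;
    private final Path cacheDir;
    private final FileChannel channel;
    private final FileLock lock;

    private SlotCache(int slot, Path cacheDir, FileChannel channel, FileLock lock) {
        this.slot = slot;
        this.cacheDir = cacheDir;
        this.channel = channel;
        this.lock = lock;
    }

    /** Tries slots 0..maxSlots-1 in order; returns the first free one, or null if all are taken. */
    static SlotCache tryClaim(Path baseDir, int maxSlots) throws IOException {
        Files.createDirectories(baseDir);
        for (int i = 0; i < maxSlots; i++) {
            // The cache itself lives in slot-<i>; the lock file sits beside it,
            // so the executable never sees it inside its cache directory.
            Path dir = baseDir.resolve("slot-" + i);
            Files.createDirectories(dir);
            FileChannel ch = FileChannel.open(baseDir.resolve("slot-" + i + ".lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock;
            try {
                // Non-blocking: returns null if another process holds the lock.
                lock = ch.tryLock();
            } catch (OverlappingFileLockException e) {
                // Another channel in this same JVM already holds this slot.
                lock = null;
            } catch (IOException e) {
                ch.close();
                throw e;
            }
            if (lock != null) {
                return new SlotCache(i, dir, ch, lock);
            }
            ch.close();
        }
        return null;
    }

    int slot() {
        return slot;
    }

    Path cacheDir() {
        return cacheDir;
    }

    /** Releases the slot so the next mapper can reuse this (already warm) cache. */
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

A mapper would call `tryClaim` once in its setup (or `configure`) method, pass `cacheDir()` to the executable, and call `close()` in its cleanup (or `close`) method. A `null` return would mean more mappers are running on the node than cache slots were provisioned for. Because the cache directories persist between tasks, each slot pays the cache-creation cost only once.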
