Hi,

When running a job with more reducers than containers available in the cluster, all the reducers get scheduled, leaving no containers free for the mappers. The result is starvation and the job never finishes. Should this be considered a bug, or is it expected behavior? My workaround is to limit the number of reducers to fewer than the number of containers available.
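For concreteness, this is roughly what the workaround looks like in my job driver (a minimal sketch; the class name and the hard-coded container count are just placeholders, not values I query from the cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CappedReducersDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "capped-reducers");

    // Keep the reducer count below the number of containers the cluster
    // can run at once, so some containers remain free for map tasks.
    // In my case: 16 machines * 2 containers each = 32 containers (hard-coded here).
    int clusterContainers = 16 * 2;
    job.setNumReduceTasks(clusterContainers / 2); // e.g. 16 reducers instead of 30

    // ... mapper/reducer classes and input/output paths set as usual ...
    // The same cap can also be passed on the command line via
    // -D mapreduce.job.reduces=16 instead of setting it in code.
  }
}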
Also, it seems that tasks are picked and scheduled at random from the combined pool of pending map and reduce tasks, which leads to suboptimal behavior. For example, I ran a job with 500 mappers and 30 reducers on a cluster of only 16 machines, with two containers per machine (dual-core machines), i.e. 32 containers in total. What I observe is that halfway through the job all the reduce tasks are scheduled, leaving only one container for 200+ map tasks. Again, is this expected behavior? If so, what is the idea behind it? And are map and reduce tasks indeed scheduled at random, or does it only look that way?

Any advice is welcome.

Regards,
Vasco
