hi all,
someone PM'ed me suggesting I take a look at the input split settings,
and indeed, the split size determines the number of map tasks.
stijn
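For reference, here is a minimal Python sketch of the split arithmetic in Hadoop 2.x FileInputFormat, where each split becomes one map task. It is based on a reading of that class and is not the real Java code. The split size is max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger (SPLIT_SLOP = 1.1). The block size is whatever the filesystem reports, which on a non-HDFS shared filesystem may be much smaller than expected. The assumptions are that each file is splittable and non-empty, and the file and block sizes in the usage lines are hypothetical.

```python
# Sketch of Hadoop 2.x FileInputFormat split arithmetic (not the real Java code).
SPLIT_SLOP = 1.1      # the final split may be up to 10% larger than split_size
LONG_MAX = 2**63 - 1  # default for mapreduce.input.fileinputformat.split.maxsize


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))


def count_splits(file_length: int, block_size: int,
                 min_size: int = 1, max_size: int = LONG_MAX) -> int:
    """Number of splits (= map tasks) for one splittable, non-empty file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits, remaining = 0, file_length
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to 1.1 x split_size) becomes the last split.
    if remaining > 0:
        splits += 1
    return splits


MiB = 1024 * 1024
# Hypothetical: 64 files of 1 GiB each, filesystem reporting a 32 MiB block size.
print(64 * count_splits(1024 * MiB, 32 * MiB))                       # 2048
# Raising the minimum split size to the file size gives one split per file.
print(64 * count_splits(1024 * MiB, 32 * MiB, min_size=1024 * MiB))  # 64
```

The point of the sketch is that the number of map tasks follows the reported block size and the min/max split settings, not the number of input files. Raising mapreduce.input.fileinputformat.split.minsize is one way to get fewer, longer-running map tasks. For terasort, this could be passed as a generic `-D` option, assuming the job is run through ToolRunner.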
On 08/27/2014 06:23 PM, Chris MacKenzie wrote:
It's my understanding that you don't get map tasks as such, but containers.
My experience is with version 2+.
And if that's true, containers are sized by the memory tuning in mapred-site.xml.
Otherwise I'd love to learn more.
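The two views fit together. In MapReduce on YARN, each map task runs in its own container, so the number of input splits fixes how many map tasks a job has, while container sizing limits how many of them run at once. Below is a rough Python sketch of that limit. It assumes memory is the only resource the scheduler counts and that requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb. It ignores the ApplicationMaster's own container. The cluster numbers are hypothetical, and the helper names are made up for illustration.

```python
import math


def normalize_mb(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the minimum allocation
    (assumed scheduler behaviour)."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def concurrent_maps(nodes: int, nm_memory_mb: int,
                    map_memory_mb: int, min_alloc_mb: int) -> int:
    """Upper bound on map containers running at once, counting memory only.

    nm_memory_mb  ~ yarn.nodemanager.resource.memory-mb (yarn-site.xml)
    map_memory_mb ~ mapreduce.map.memory.mb (mapred-site.xml)
    """
    per_node = nm_memory_mb // normalize_mb(map_memory_mb, min_alloc_mb)
    return nodes * per_node


def map_waves(total_maps: int, slots: int) -> int:
    """How many rounds of containers are needed to run every map task."""
    return math.ceil(total_maps / slots)


# Hypothetical cluster: 10 nodes with 64 GiB for containers each.
slots = concurrent_maps(nodes=10, nm_memory_mb=65536,
                        map_memory_mb=1536, min_alloc_mb=1024)
print(slots)                   # 320
print(map_waves(3000, slots))  # 10
```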
On 27 Aug 2014, at 12:14, Stijn De Weirdt wrote:
hi all,
we are tuning yarn (or trying to) on our environment (shared filesystem, no hdfs)
using terasort, and one of the main issues we are seeing is that an average map task
takes < 15 sec. some tuning guides and websites suggest that map tasks should ideally
run for between 40 sec and 1 or 2 minutes.
(however, it's also not very clear whether these recommendations are still valid for
yarn)
in particular, we see far more map tasks than expected, and we are wondering
how the number of map tasks per job run is determined.
teragen created 64 output files, so we expected only 64 map tasks, each
processing one input file. however, we see something like 3000 tasks.
hints are much appreciated
stijn