What configs control the shuffle phase?

From: Randy Fox
Date: Saturday, January 23, 2016 at 9:53 AM
To: Daniel Haviv
Cc: "[email protected]"
Subject: Re: NodeManager High CPU due to high GC

24 virtual cores, and we allocated 22 of them to Yarn.

From: Daniel Haviv
Date: Saturday, January 23, 2016 at 4:00 AM
To: Randy Fox
Cc: "[email protected]"
Subject: Re: NodeManager High CPU due to high GC

Hi Randy,
How many cores do you have on your machines, and how many did you allocate to
Yarn?

Daniel

On Saturday, 23 January 2016, Randy Fox <[email protected]> wrote:
Hi,

We just upgraded to Yarn on Hadoop 2.6.0 (CDH 5.4.5).
We are running a large job (200K mappers, 100K reducers) and we can't get
through the shuffle phase. The node managers are at 800% CPU with high GC. The
reducers get socket timeouts after 1.5 hours of running, having fetched only a
few percent of the data from the mappers. On MRv1 this job took about 30 hours
in total, 12 of them in the mappers, with no issues.
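For scale, the fan-out of this job can be worked out directly: every one of the 100K reducers needs one partition from every one of the 200K mappers. The rough sketch below also assumes a 24-byte index record per partition (three 8-byte longs) and a 10 MB index cache on the serving side (our understanding of the `mapreduce.tasktracker.indexcache.mb` default in mapred-default.xml); both are assumptions, not figures reported in this thread.

```python
# Back-of-envelope sizing of the shuffle for the job described above.
# Assumptions (not from the thread): one index record per reduce partition
# is three 8-byte longs = 24 bytes, and the shuffle-side index cache is 10 MB.

NUM_MAPS = 200_000        # mappers in the job (from the thread)
NUM_REDUCES = 100_000     # reducers in the job (from the thread)
INDEX_RECORD_BYTES = 24   # assumed size of one partition's index entry
INDEX_CACHE_MB = 10       # assumed index cache size on each NodeManager


def shuffle_fanout(num_maps: int, num_reduces: int) -> int:
    """Total map-output segments: each reducer fetches one from every mapper."""
    return num_maps * num_reduces


def index_file_bytes(num_reduces: int, record_bytes: int = INDEX_RECORD_BYTES) -> int:
    """Size of one mapper's index file: one record per reduce partition."""
    return num_reduces * record_bytes


def index_files_per_cache(cache_mb: int, per_file_bytes: int) -> int:
    """How many whole index files fit in a cache of cache_mb megabytes."""
    return (cache_mb * 1024 * 1024) // per_file_bytes


if __name__ == "__main__":
    segments = shuffle_fanout(NUM_MAPS, NUM_REDUCES)
    per_index = index_file_bytes(NUM_REDUCES)
    fit = index_files_per_cache(INDEX_CACHE_MB, per_index)
    print(f"map-output segments to fetch: {segments:,}")       # 20,000,000,000
    print(f"index file per mapper: {per_index / 1e6:.1f} MB")  # 2.4 MB
    print(f"index files that fit in a {INDEX_CACHE_MB} MB cache: {fit}")  # 4
```

Under these assumptions each NodeManager would be serving 20 billion small segments cluster-wide while able to cache only about four mappers' index files at a time, which is one plausible source of the allocation churn described above.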

I have searched for configs that might help, for filed issues, and for anyone
who has seen this, and I have come up with nothing.
Does anyone have ideas on things to try, or an explanation of why the node
managers are in GC hell and why the data is not flowing from mappers to reducers?

Thanks in advance,

Randy
