Re: YARN cluster underutilization

Shubh hadoopExp Sun, 19 Jun 2016 10:14:02 -0700

Hello,

Just wanted to know how you were measuring the CPU utilization across the
cluster. Were you using the "CPU time spent" counter?

-Shubh
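One way to turn the MapReduce "CPU time spent (ms)" job counter into a cluster-wide figure is to divide it by the total core-time available while the job ran. This is a minimal sketch. It assumes you already have the counter value, the job's wall-clock duration, and the cluster's core count. The function name and the example numbers are hypothetical. Note that this is a different quantity from the share of YARN memory allocated to containers.

```python
def avg_cpu_utilization(cpu_time_spent_ms: int, elapsed_ms: int, total_cores: int) -> float:
    """Fraction of available core-time actually consumed by the job's tasks.

    cpu_time_spent_ms: value of the job's "CPU time spent (ms)" counter
    elapsed_ms: job wall-clock duration in milliseconds
    total_cores: cores across all worker nodes (e.g. 20 nodes x 8 cores)
    """
    if elapsed_ms <= 0 or total_cores <= 0:
        raise ValueError("elapsed_ms and total_cores must be positive")
    return cpu_time_spent_ms / (elapsed_ms * total_cores)


# Hypothetical numbers: 1 hour of wall-clock time on 20 x 8 = 160 cores,
# during which the tasks consumed 80 core-hours of CPU time.
hour_ms = 60 * 60 * 1000
print(avg_cpu_utilization(80 * hour_ms, hour_ms, 160))
```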

> On May 28, 2016, at 4:20 AM, Shubh hadoopExp <[email protected]> wrote:
> 
> Hey 
> 
> That's pretty good. So by changing the file split size, the number of map
> tasks running was reduced?
> 
> -Shubh
> 
> 
>> On May 27, 2016, at 4:01 PM, Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hi, all.
>>  
>> Just wanted to provide an update, which is that I’m finally getting good 
>> YARN cluster utilization (consistently within the 90-100% range!).  I 
>> believe the biggest change was to increase the min split size.  Since our 
>> input is all in S3 and data locality is not really an issue, I bumped it up 
>> to 2G to reduce the overhead of container allocation/deallocation: each
>> container now stays up working for longer, so allocation and deallocation
>> happen less frequently.
>>  
>> <property>
>>   <name>mapreduce.input.fileinputformat.split.minsize</name>
>>   <value>2147483648</value> <!-- 2G -->
>> </property>
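In Hadoop 2.x, `FileInputFormat` computes the split size as `max(minSize, min(maxSize, blockSize))`. Raising `split.minsize` above the block size therefore produces larger splits and fewer map tasks. The sketch below reproduces that rule together with the split-counting loop. Hadoop allows the last split to run up to 10% over the split size (`SPLIT_SLOP = 1.1`). The 64 MiB block size and the 10 GiB file length are hypothetical. The count applies per splittable file, so each small file still yields at least one split.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
LONG_MAX = 2 ** 63 - 1  # used when split.maxsize is not set
SPLIT_SLOP = 1.1        # last split may be up to 10% larger than split_size


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Same rule as FileInputFormat.computeSplitSize in Hadoop 2.x."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_len: int, split_size: int) -> int:
    """Number of input splits (map tasks) for one splittable file."""
    splits = 0
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # leftover tail becomes the final split
    return splits


block = 64 * MiB     # hypothetical block size reported for the S3 files
file_len = 10 * GiB  # hypothetical input file
for min_size in (1, 512 * MiB, 2 * GiB):
    size = compute_split_size(block, min_size, LONG_MAX)
    print(min_size, size, count_splits(file_len, size))
```

For the hypothetical 10 GiB file, this gives 160, 20 and 5 splits respectively.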
>>  
>> Not sure how much impact the following changes had, since they were made at 
>> the same time.  Everything’s humming along now though, so I’m going to leave 
>> them. 
>>  
>> I also reduced the node heartbeat interval from 1000ms down to 500ms 
>> ("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms": "500" in cluster 
>> configuration JSON), since I’m told that the scheduler will only allocate
>> 1 container per node per heartbeat when dealing with non-localized data,
>> which is our case since it’s in S3.  I also more than doubled the memory
>> given to the YARN ResourceManager from the default for the m3.xlarge node
>> type I’m using (2396M), setting "YARN_RESOURCEMANAGER_HEAPSIZE": "5120" in
>> the cluster configuration JSON.
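For reference, the two settings above could be written in EMR's configurations JSON roughly as follows. This sketch assumes EMR's classification layout: `yarn-site` for the heartbeat property, and a nested `export` classification under `yarn-env` for the heap variable. Check the classification names against the EMR release you use.

```python
import json

# Sketch of an EMR "Configurations" list; the classification names
# ("yarn-site", "yarn-env", "export") are assumed from EMR's documented layout.
configurations = [
    {
        "Classification": "yarn-site",
        "Properties": {
            # RM-side node heartbeat interval, lowered from 1000 ms
            "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms": "500",
        },
    },
    {
        "Classification": "yarn-env",
        "Properties": {},
        "Configurations": [
            {
                "Classification": "export",
                # Intended to become: export YARN_RESOURCEMANAGER_HEAPSIZE=5120
                "Properties": {"YARN_RESOURCEMANAGER_HEAPSIZE": "5120"},
            }
        ],
    },
]

print(json.dumps(configurations, indent=2))
```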
>>  
>> Thanks again to Sunil and Shubh (and my colleague, York) for the helpful 
>> guidance!
>>  
>> Take care,
>> -Jeff 
>>  
>> From: Shubh hadoopExp [mailto:[email protected] 
>> <mailto:[email protected]>] 
>> Sent: Wednesday, May 25, 2016 11:08 PM
>> To: Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>>
>> Cc: Sunil Govind <[email protected] <mailto:[email protected]>>; 
>> [email protected] <mailto:[email protected]>
>> Subject: Re: YARN cluster underutilization
>>  
>> Hey,
>>  
>> OFFSWITCH allocation refers to whether data locality is maintained or not.
>> It has no relation to the heartbeat! The heartbeat is just used to work
>> through the pipeline of pending container requests.
>>  
>> -Shubh
>>  
>>  
>> On May 25, 2016, at 3:30 PM, Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>> wrote:
>>  
>> Interesting stuff!  I did not know about this handling of OFFSWITCH 
>> requests. 
>>  
>> To get around this, would you recommend reducing the heartbeat interval, 
>> perhaps to 250ms to get a 4x improvement in container allocation rate (or is 
>> it not quite as simple as that)?  Maybe doing this in combination with using 
>> a greater number of smaller nodes would help?  Would overloading the 
>> ResourceManager be a concern if doing that?  Should I bump up the 
>> “YARN_RESOURCEMANAGER_HEAPSIZE” configuration property (current default for 
>> m3.xlarge is 2396M), or would you suggest any other knobs to turn to help RM 
>> handle it?
>>  
>> Thanks again for all your help, Sunil!
>>  
>> From: Sunil Govind [mailto:[email protected] 
>> <mailto:[email protected]>] 
>> Sent: Wednesday, May 25, 2016 1:07 PM
>> To: Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>>; [email protected] 
>> <mailto:[email protected]>
>> Subject: Re: YARN cluster underutilization
>>  
>> Hi Jeff,
>>  
>> > I do see the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
>> > property set to 1000 in the job configuration
>> OK, this makes sense. The node heartbeat seems to be at its default.
>>  
>> If no locality is specified in the resource requests (using 
>> ResourceRequest.ANY), then YARN will allocate only one container per node 
>> heartbeat. So your container allocation rate is slow, considering 600k 
>> requests and only 20 nodes. And if many containers are also being
>> released quickly (I can see that some containers live only 80 to 90
>> secs), this becomes more complex and the container allocation rate will
>> be slower.
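The reasoning above comes down to simple arithmetic. With one off-switch (ANY) container per node per heartbeat, the scheduler can hand out at most `nodes x (1000 / heartbeat_ms)` containers per second. The back-of-the-envelope sketch below assumes exactly one allocation per heartbeat and ignores container release and other scheduling delays, so real rates will be lower. It uses the 20 nodes and 600k requests from the thread.

```python
def max_alloc_rate(nodes: int, heartbeat_ms: int, per_heartbeat: int = 1) -> float:
    """Upper bound on containers allocated per second cluster-wide."""
    return nodes * per_heartbeat * 1000 / heartbeat_ms


def min_alloc_seconds(requests: int, nodes: int, heartbeat_ms: int) -> float:
    """Lower bound on time spent just allocating `requests` containers."""
    return requests / max_alloc_rate(nodes, heartbeat_ms)


for heartbeat_ms in (3000, 1000, 500, 250):
    rate = max_alloc_rate(20, heartbeat_ms)
    hours = min_alloc_seconds(600_000, 20, heartbeat_ms) / 3600
    print(f"heartbeat {heartbeat_ms} ms: {rate:.1f} containers/s, "
          f">= {hours:.1f} h to allocate 600k maps")
```

The 3000 ms row roughly matches the 2-3 second heartbeat observed in the log. Halving the interval doubles this upper bound, but the bound says nothing about the extra load on the ResourceManager.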
>>  
>> YARN-4963 <https://issues.apache.org/jira/browse/YARN-4963> aims to allow
>> more allocations per heartbeat for NODE_OFFSWITCH (ANY) requests, but it's
>> not yet available in any release.
>>  
>> I suggest investigating further along these lines to confirm these points. 
>>  
>> Thanks
>> Sunil
>>  
>>  
>> On Wed, May 25, 2016 at 11:00 PM Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Thanks for digging into the log, Sunil, and making some interesting 
>> observations!
>>  
>> The heartbeat interval hasn’t been changed from its default, and I do see 
>> the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property set to 
>> 1000 in the job configuration.  I was searching in the log for heartbeat 
>> interval information, but I didn’t find anything.  Where do you look in the 
>> log for the heartbeats?
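One rough way to estimate the effective heartbeat interval from a log is to take the timestamps of a message that recurs once per heartbeat for a given node, then average the gaps between them. This minimal sketch assumes the log4j `yyyy-MM-dd HH:mm:ss,SSS` timestamp prefix seen in the ContainersMonitorImpl line quoted later in this thread. Choosing which message to filter on is left to the reader, and the sample timestamps below are made up.

```python
from datetime import datetime
from statistics import mean

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. "2016-05-25 13:59:04,615"


def parse_ts(line: str) -> datetime:
    """Parse the 23-character log4j timestamp at the start of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def mean_gap_seconds(lines: list[str]) -> float:
    """Average time between consecutive lines (e.g. one node's heartbeats)."""
    stamps = sorted(parse_ts(line) for line in lines)
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    return mean((b - a).total_seconds() for a, b in zip(stamps, stamps[1:]))


sample = [  # hypothetical lines, already filtered down to a single node
    "2016-05-25 13:59:04,615 ...",
    "2016-05-25 13:59:07,115 ...",
    "2016-05-25 13:59:09,615 ...",
]
print(mean_gap_seconds(sample))
```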
>>  
>> Also, you are correct about there being no data locality, as all the input 
>> data is in S3.  The utilization has been fluctuating, but I can’t really see 
>> a pattern or tell why.  It actually started out pretty low in the 20-30% 
>> range and then managed to get up into the 50-70% range after a while, but 
>> that was short-lived, as it went back down into the 20-30% range for quite a 
>> while.  While writing this, I saw it surprisingly hit 80%!!  First time I’ve 
>> seen it that high in the 20 hours it’s been running…  Although it looks like
>> it may be headed back down.  I’m perplexed.  Wouldn’t you generally expect 
>> fairly stable utilization over the course of the job?  (This is the only job 
>> running.)
>>  
>> Thanks,
>> -Jeff
>>  
>> From: Sunil Govind [mailto:[email protected] 
>> <mailto:[email protected]>] 
>> Sent: Wednesday, May 25, 2016 11:55 AM
>> To: Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>>; [email protected] 
>> <mailto:[email protected]>
>> Subject: Re: YARN cluster underutilization
>>  
>> Hi Jeff,
>>  
>> Thanks for sharing this information. I have some observations from these logs.
>>  
>> - I think the node heartbeat is around 2-3 seconds here. Has it been changed
>> for some other reason?
>> - And all the mappers' resource requests seem to be of type ANY (there is
>> no data locality). Please correct me if I am wrong.
>>  
>> If the resource request type is ANY, only one container will be allocated 
>> per heartbeat for a node. Here the node heartbeat delay is also longer. And
>> I can see that containers are released very quickly too. So when you started
>> your application, were you seeing better resource utilization, and then,
>> once containers started to get released/completed, underutilization?
>>  
>> Please look into this line of reasoning. It may be one reason.
>>  
>> Thanks
>> Sunil
>>  
>> On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Thanks for your thoughts thus far, Sunil.  Most grateful for any additional 
>> help you or others can offer.  To answer your questions,
>>  
>> 1.       This is a custom M/R job, which uses mappers only (no reduce phase) 
>> to process GPS probe data and filter based on inclusion within a provided 
>> polygon.  There is actually a lot of upfront work done in the driver to make 
>> that task as simple as possible (it identifies a list of tiles that are 
>> completely inside the polygon and those that fall across an edge, for which 
>> more processing would be needed), but the job would still be more 
>> compute-intensive than wordcount, for example.
>>  
>> 2.       I’m running almost 84k mappers for this job.  This is actually down 
>> from ~600k mappers, since one other thing I’ve done is increase the 
>> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the 
>> job.  Data is in S3, so loss of locality isn’t really a concern.
>>  
>> 3.       For NodeManager configuration, I’m using EMR’s default 
>> configuration for the m3.xlarge instance type, which is 
>> yarn.scheduler.minimum-allocation-mb=32, 
>> yarn.scheduler.maximum-allocation-mb=11520, and 
>> yarn.nodemanager.resource.memory-mb=11520.  YARN dashboard shows min/max 
>> allocations of <memory:32, vCores:1>/<memory:11520, vCores:8>.
>>  
>> 4.       Capacity Scheduler [MEMORY]
>>  
>> 5.       I’ve attached 2500 lines from the RM log.  Happy to grab more, but 
>> they are pretty big, and I thought that might be sufficient.
>>  
>> Any guidance is much appreciated!
>> -Jeff
>>  
>> From: Sunil Govind [mailto:[email protected] 
>> <mailto:[email protected]>] 
>> Sent: Wednesday, May 25, 2016 10:55 AM
>> To: Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>>; [email protected] 
>> <mailto:[email protected]>
>> Subject: Re: YARN cluster underutilization
>>  
>> Hi Jeff,
>>  
>> It looks like you are allocating more memory than needed for the AM
>> container. You probably don't need 6GB (going by the log). Could you please
>> provide some more information?
>>  
>> 1. What type of MapReduce application (wordcount etc.) are you running? Some
>> applications may be CPU-intensive and some may not be, so memory/CPU can be
>> tuned for better utilization based on the type of application.
>> 2. How many mappers (reducers) are you trying to run here? 
>> 3. You have mentioned that each node has 8 cores and 15GB, but how much is 
>> actually configured for the NM?
>> 4. Which scheduler are you using?
>> 5. It would help to attach the RM log if possible.
>>  
>> Thanks
>> Sunil
>>  
>> On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hi, all.
>>  
>> I have an M/R (map-only) job that I’m running on a Hadoop 2.7.1 YARN cluster 
>> that is being quite underutilized (utilization of around 25-30%).  The EMR 
>> cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores each and 
>> 15G total memory (with 11.25G of that available to YARN).  I’ve configured 
>> mapper memory with the following properties, which should allow for 8 
>> containers running map tasks per node:
>>  
>> <property><name>mapreduce.map.memory.mb</name><value>1440</value></property>      <!-- Container size -->
>> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property> <!-- JVM arguments for a Map task -->
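The "8 containers per node" figure comes from dividing the NodeManager's memory by the mapper container size. This sketch of that arithmetic assumes the Capacity Scheduler's memory-only resource calculator, so vcores are ignored. It also assumes each request is rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb`, which is 32 MB here.

```python
import math


def normalize_mb(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the minimum allocation."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def containers_per_node(node_mb: int, request_mb: int, min_alloc_mb: int) -> int:
    """How many equal-sized containers fit in one NodeManager by memory alone."""
    return node_mb // normalize_mb(request_mb, min_alloc_mb)


NODE_MB = 11520   # yarn.nodemanager.resource.memory-mb (EMR m3.xlarge default)
MAP_MB = 1440     # mapreduce.map.memory.mb
MIN_ALLOC = 32    # yarn.scheduler.minimum-allocation-mb
per_node = containers_per_node(NODE_MB, MAP_MB, MIN_ALLOC)
print(per_node, per_node * 20)  # per node, and across the 20 core nodes
```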
>>  
>> It was suggested that perhaps my AppMaster was having trouble keeping up 
>> with creating all the mapper containers and that I should bulk up its
>> resource allocation.  So I did, as shown below, providing it 6G container
>> memory (5G JVM heap), 3 cores, and 60 task listener threads.
>>  
>> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value></property>  <!-- App Master task listener threads -->
>> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value></property>  <!-- App Master container vcores -->
>> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value></property>  <!-- App Master container size -->
>> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value></property>  <!-- JVM arguments for each Application Master -->
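A side effect of the larger AM container: on the node that hosts it, the AM's 6400 MB leaves only 5120 MB for mappers. This sketch shows that arithmetic under the same memory-only assumptions. The 1536 MB alternative AM size is hypothetical and is included only for comparison.

```python
NODE_MB = 11520  # NodeManager memory on each m3.xlarge core node
MAP_MB = 1440    # mapper container size (already a multiple of 32 MB)


def mappers_beside_am(am_mb: int) -> int:
    """Mapper containers that still fit on the node running the AM."""
    return (NODE_MB - am_mb) // MAP_MB


for am_mb in (6400, 1536):  # configured AM size vs. a hypothetical smaller one
    print(am_mb, mappers_beside_am(am_mb))
```

With the configured AM size, that node runs 3 mappers instead of 8. On its own, this accounts for only 5 of the cluster's 160 mapper slots.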
>>  
>> Taking a look at the node on which the AppMaster is running, I'm seeing 
>> plenty of CPU idle time and free memory, yet there are still nodes with no 
>> utilization (0 running containers).  The log indicates that the AppMaster 
>> has far more memory (physical/virtual) than it appears to need, with repeated
>> log messages like this:
>>  
>> 2016-05-25 13:59:04,615 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11265 for container-id container_1464122327865_0002_01_000001: 1.6 GB of 6.3 GB physical memory used; 6.1 GB of 31.3 GB virtual memory used
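A small parser can reduce the ContainersMonitorImpl line above to usage ratios. The regular expression in this sketch was written against that one sample line. The exact message wording may differ across Hadoop versions, so treat the pattern as an assumption.

```python
import re

UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}
AMOUNT = r"([\d.]+) ([KMGT]?B)"
# Pattern is an assumption based on the single log line quoted above.
USAGE_RE = re.compile(
    rf"container-id (\S+): {AMOUNT} of {AMOUNT} physical memory used; "
    rf"{AMOUNT} of {AMOUNT} virtual memory used"
)


def to_bytes(value: str, unit: str) -> float:
    """Convert an amount like ("1.6", "GB") to bytes."""
    return float(value) * UNITS[unit]


def memory_ratios(line: str) -> tuple[str, float, float]:
    """Return (container id, physical used/limit, virtual used/limit)."""
    m = USAGE_RE.search(line)
    if m is None:
        raise ValueError("not a ContainersMonitorImpl memory-usage line")
    g = m.groups()
    pmem = to_bytes(g[1], g[2]) / to_bytes(g[3], g[4])
    vmem = to_bytes(g[5], g[6]) / to_bytes(g[7], g[8])
    return g[0], pmem, vmem


line = (
    "2016-05-25 13:59:04,615 INFO org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.monitor.ContainersMonitorImpl (Container Monitor): "
    "Memory usage of ProcessTree 11265 for container-id "
    "container_1464122327865_0002_01_000001: 1.6 GB of 6.3 GB physical memory "
    "used; 6.1 GB of 31.3 GB virtual memory used"
)
cid, pmem, vmem = memory_ratios(line)
print(f"{cid}: {pmem:.0%} of physical, {vmem:.0%} of virtual limit used")
```

For this sample, the AM is using roughly a quarter of its physical memory allocation.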
>>  
>> Can you please help me figure out where to go from here to troubleshoot, or 
>> any other things to try?
>>  
>> Thanks!
>> -Jeff
>
