Hi,

The input split size was increased so that each container runs longer and processes more data. That way, slow container allocation won't be a problem (since all container requests are made without data locality). It's also better to keep more memory for the AM container when it is handling 600k+ requests. And, as Jeff mentioned, each mapper applies some filters and emits data directly to disk.

Thanks,
Sunil
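As a rough back-of-envelope for why fewer, longer-lived containers sidestep slow allocation: with no locality, the scheduler grants at most one container per node per heartbeat (the behavior Sunil explains further down this thread). This is a sketch only; the 600k request count and 20-node cluster size come from the thread, while the ~21k figure is an extrapolation of Jeff's 84k mappers at 512M splits scaled to 2G splits.

    // Sketch: allocation ceiling when every resource request is off-switch (ANY).
    public class AllocationCeiling {
        public static void main(String[] args) {
            int nodes = 20;             // core nodes, from the thread
            double heartbeatSec = 1.0;  // default 1000 ms node heartbeat
            // At most one ANY-container per node per heartbeat:
            double allocPerSec = nodes / heartbeatSec;  // ~20 allocations/sec
            int mappersOriginal = 600_000;              // original request count
            int mappersAt2G = 84_000 / 4;               // ~21k, assuming split count scales with split size
            System.out.printf("600k mappers: >= %.1f hours before the last container can be allocated%n",
                    mappersOriginal / allocPerSec / 3600.0);  // ~8.3 h
            System.out.printf("~21k mappers: >= %.1f minutes%n",
                    mappersAt2G / allocPerSec / 60.0);        // ~17.5 min
        }
    }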
On Tue, Jun 21, 2016 at 12:07 PM Deepak Goel <[email protected]> wrote:

> Pretty nice. However, why would swapping to disk happen when there is
> enough physical memory available?
>
> Deepak
>
> On Sat, May 28, 2016 at 1:31 AM, Guttadauro, Jeff <[email protected]> wrote:
>
>> Hi, all.
>>
>> Just wanted to provide an update, which is that I'm finally getting good
>> YARN cluster utilization (consistently within the 90-100% range!). I
>> believe the biggest change was to increase the min split size. Since our
>> input is all in S3 and data locality is not really an issue, I bumped it
>> up to 2G so that each container stays up and working longer; allocation
>> and deallocation of container resources now happens much less frequently.
>>
>> <property><name>mapreduce.input.fileinputformat.split.minsize</name><value>2147483648</value><!-- 2G --></property>
>>
>> Not sure how much impact the following changes had, since they were made
>> at the same time. Everything's humming along now though, so I'm going to
>> leave them.
>>
>> I also reduced the node heartbeat interval from 1000ms down to 500ms
>> ("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms": "500" in the
>> cluster configuration JSON), since I'm told the scheduler will only
>> allocate 1 container per node per heartbeat when dealing with
>> non-localized data, as is the case for us, since it's in S3. I also
>> doubled the memory given to the YARN ResourceManager from the default for
>> the m3.xlarge node type I'm using ("YARN_RESOURCEMANAGER_HEAPSIZE": "5120"
>> in the cluster configuration JSON).
>>
>> Thanks again to Sunil and Shubh (and my colleague, York) for the helpful
>> guidance!
>>
>> Take care,
>> -Jeff
>>
>> From: Shubh hadoopExp [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 11:08 PM
>> To: Guttadauro, Jeff <[email protected]>
>> Cc: Sunil Govind <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hey,
>>
>> OFFSWITCH allocation indicates whether data locality is maintained or
>> not. It has no relation to the heartbeat! The heartbeat is just used to
>> clear the pipeline of container requests.
>>
>> -Shubh
>>
>> On May 25, 2016, at 3:30 PM, Guttadauro, Jeff <[email protected]> wrote:
>>
>> Interesting stuff! I did not know about this handling of OFFSWITCH
>> requests.
>>
>> To get around this, would you recommend reducing the heartbeat interval,
>> perhaps to 250ms to get a 4x improvement in container allocation rate (or
>> is it not quite as simple as that)? Maybe doing this in combination with
>> using a greater number of smaller nodes would help? Would overloading the
>> ResourceManager be a concern if doing that? Should I bump up the
>> "YARN_RESOURCEMANAGER_HEAPSIZE" configuration property (the current
>> default for m3.xlarge is 2396M), or would you suggest any other knobs to
>> turn to help the RM handle it?
>>
>> Thanks again for all your help, Sunil!
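For completeness, the split-size property Jeff quotes can equivalently be set from the job driver. A minimal sketch, assuming a standard Hadoop 2.7 MapReduce driver; the class and job name are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class DriverSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "probe-filter");  // hypothetical job name
            // Same effect as setting mapreduce.input.fileinputformat.split.minsize
            // to 2147483648: S3 input has no locality to lose, and 2G splits keep
            // each mapper busy longer, so containers churn less often.
            FileInputFormat.setMinInputSplitSize(job, 2147483648L);
        }
    }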
>> From: Sunil Govind [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 1:07 PM
>> To: Guttadauro, Jeff <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hi Jeff,
>>
>> > I do see the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
>> > property set to 1000 in the job configuration
>>
>> OK, this makes sense; the node heartbeat appears to be the default.
>>
>> If no locality is specified in the resource requests (i.e., they use
>> ResourceRequest.ANY), then YARN will allocate only one container per node
>> heartbeat. So your container allocation rate is slow considering 600k
>> requests and only 20 nodes. And if many containers are also being released
>> quickly (I can see that some containers' lifetimes are 80 to 90 seconds),
>> the situation becomes more complex and the allocation rate gets slower
>> still.
>>
>> YARN-4963 <https://issues.apache.org/jira/browse/YARN-4963> aims to allow
>> more than one allocation per heartbeat for NODE_OFFSWITCH (ANY) requests,
>> but it is not yet available in any release.
>>
>> I suggest investigating along these lines to confirm these points.
>>
>> Thanks,
>> Sunil
>>
>> On Wed, May 25, 2016 at 11:00 PM Guttadauro, Jeff <[email protected]> wrote:
>>
>> Thanks for digging into the log, Sunil, and making some interesting
>> observations!
>>
>> The heartbeat interval hasn't been changed from its default, and I do see
>> the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property set
>> to 1000 in the job configuration. I was searching the log for heartbeat
>> interval information, but I didn't find anything. Where do you look in the
>> log for the heartbeats?
>>
>> Also, you are correct about there being no data locality, as all the
>> input data is in S3. The utilization has been fluctuating, but I can't
>> really see a pattern or tell why. It actually started out pretty low, in
>> the 20-30% range, then managed to get up into the 50-70% range after a
>> while, but that was short-lived, as it went back down into the 20-30%
>> range for quite a while. While writing this, I saw it surprisingly hit
>> 80%! That's the first time I've seen it that high in the 20 hours it's
>> been running, although it looks like it may be headed back down. I'm
>> perplexed. Wouldn't you generally expect fairly stable utilization over
>> the course of the job? (This is the only job running.)
>>
>> Thanks,
>> -Jeff
>>
>> From: Sunil Govind [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 11:55 AM
>> To: Guttadauro, Jeff <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hi Jeff,
>>
>> Thanks for sharing this information. I have some observations from these
>> logs:
>>
>> - I think the node heartbeat is around 2-3 seconds here. Was it changed
>> for some other reason?
>> - All the mapper resource requests seem to be asking for type ANY (there
>> is no data locality). Please correct me if I am wrong.
>>
>> If the resource request type is ANY, only one container will be allocated
>> per node heartbeat, and here the node heartbeat delay is also high. I can
>> see that containers are being released very quickly too. So when you
>> started your application, were you seeing better resource utilization, and
>> then, once containers started getting released/completed,
>> underutilization?
>>
>> Please look into this; it may be the reason.
>>
>> Thanks,
>> Sunil
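To make the NODE_OFFSWITCH case concrete, this is roughly what such a request looks like at the YARN API level. A sketch, not what the MR ApplicationMaster literally emits; the priority and container size are placeholders:

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.api.records.ResourceRequest;

    public class AnyRequestSketch {
        public static void main(String[] args) {
            // ResourceRequest.ANY ("*") carries no host or rack preference, so the
            // scheduler hands out at most one such container per node heartbeat
            // (the limitation YARN-4963 aims to lift).
            Resource mapperSize = Resource.newInstance(1440, 1);  // MB, vcores
            ResourceRequest req = ResourceRequest.newInstance(
                    Priority.newInstance(20),  // placeholder priority
                    ResourceRequest.ANY,
                    mapperSize,
                    1);                        // number of containers
            System.out.println(req);
        }
    }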
>> On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <[email protected]> wrote:
>>
>> Thanks for your thoughts thus far, Sunil. Most grateful for any
>> additional help you or others can offer. To answer your questions:
>>
>> 1. This is a custom M/R job, which uses mappers only (no reduce phase) to
>> process GPS probe data and filter it based on inclusion within a provided
>> polygon. There is actually a lot of upfront work done in the driver to
>> make that task as simple as it can be (it identifies the tiles that are
>> completely inside the polygon and those that fall across an edge, for
>> which more processing is needed), but the job is still more
>> compute-intensive than wordcount, for example.
>>
>> 2. I'm running almost 84k mappers for this job. This is actually down
>> from ~600k mappers, since one other thing I've done is increase
>> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the
>> job. Data is in S3, so loss of locality isn't really a concern.
>>
>> 3. For NodeManager configuration, I'm using EMR's default configuration
>> for the m3.xlarge instance type, which is
>> yarn.scheduler.minimum-allocation-mb=32,
>> yarn.scheduler.maximum-allocation-mb=11520, and
>> yarn.nodemanager.resource.memory-mb=11520. The YARN dashboard shows
>> min/max allocations of <memory:32, vCores:1>/<memory:11520, vCores:8>.
>>
>> 4. Capacity Scheduler [MEMORY]
>>
>> 5. I've attached 2500 lines from the RM log. Happy to grab more, but the
>> logs are pretty big, and I thought that might be sufficient.
>>
>> Any guidance is much appreciated!
>> -Jeff
>>
>> From: Sunil Govind [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 10:55 AM
>> To: Guttadauro, Jeff <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hi Jeff,
>>
>> It looks like you are allocating a lot of memory to the AM container; you
>> most likely do not need 6GB (as per the log). Could you please provide
>> some more information:
>>
>> 1. What type of mapreduce application (wordcount, etc.) are you running?
>> Some AMs are CPU-intensive and some are not, so depending on the type of
>> application, memory/CPU can be tuned for better utilization.
>> 2. How many mappers (and reducers) are you trying to run here?
>> 3. You have mentioned that each node has 8 cores and 15GB, but how much
>> is actually configured for the NM?
>> 4. Which scheduler are you using?
>> 5. It would be better to attach the RM log if possible.
>>
>> Thanks,
>> Sunil
>>
>> On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff <[email protected]> wrote:
>>
>> Hi, all.
>>
>> I have an M/R (map-only) job that I'm running on a Hadoop 2.7.1 YARN
>> cluster that is being quite underutilized (utilization of around 25-30%).
>> The EMR cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores
>> each and 15G total memory (with 11.25G of that available to YARN). I've
>> configured mapper memory with the following properties, which should allow
>> for 8 containers running map tasks per node (see the sketch of the
>> arithmetic below):
>>
>> <property><name>mapreduce.map.memory.mb</name><value>1440</value></property> <!-- Container size -->
>> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property> <!-- JVM arguments for a Map task -->
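A quick sanity check of that 8-containers-per-node figure, using the NodeManager capacity given in answer 3 above and the AM container size shown further down. A sketch only; the cluster-wide count assumes a single AM and ignores vcore limits:

    public class ContainerMath {
        public static void main(String[] args) {
            int nmMemoryMb = 11520;  // yarn.nodemanager.resource.memory-mb
            int mapperMb = 1440;     // mapreduce.map.memory.mb
            int amMb = 6400;         // yarn.app.mapreduce.am.resource.mb
            int perNode = nmMemoryMb / mapperMb;            // 8 mappers per node
            int onAmNode = (nmMemoryMb - amMb) / mapperMb;  // 3 alongside the AM
            int clusterWide = 19 * perNode + onAmNode;      // 155 concurrent mappers
            System.out.printf("%d per node, %d on the AM node, %d cluster-wide%n",
                    perNode, onAmNode, clusterWide);
        }
    }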
>> It was suggested that perhaps my AppMaster was having trouble keeping up
>> with creating all the mapper containers, and that I should bulk up its
>> resource allocation. So I did, as shown below, providing it 6G of
>> container memory (5G task memory), 3 cores, and 60 task listener threads.
>>
>> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value></property> <!-- App Master task listener threads -->
>> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value></property> <!-- App Master container vcores -->
>> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value></property> <!-- App Master container size -->
>> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value></property> <!-- JVM arguments for each Application Master -->
>>
>> Taking a look at the node on which the AppMaster is running, I'm seeing
>> plenty of CPU idle time and free memory, yet there are still nodes with no
>> utilization (0 running containers). The log indicates that the AppMaster
>> has far more memory (physical/virtual) than it appears to need, with
>> repeated log messages like this:
>>
>> 2016-05-25 13:59:04,615 INFO
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>> (Container Monitor): Memory usage of ProcessTree 11265 for container-id
>> container_1464122327865_0002_01_000001: 1.6 GB of 6.3 GB physical memory
>> used; 6.1 GB of 31.3 GB virtual memory used
>>
>> Can you please help me figure out where to go from here to troubleshoot,
>> or suggest any other things to try?
>>
>> Thanks!
>> -Jeff
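Given that monitor line (the AM holding a 6.3 GB container while using about 1.6 GB of it), one follow-up worth trying is shrinking the AM container so the scheduler gets that memory back, which is in line with Sunil's remark above that 6GB is likely unnecessary. A hypothetical sketch; the values are illustrative, not measured optima:

    import org.apache.hadoop.conf.Configuration;

    public class AmRightSizing {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // ~1.6 GB of 6.3 GB physical memory in use suggests headroom to shrink:
            conf.setInt("yarn.app.mapreduce.am.resource.mb", 3072);       // down from 6400
            conf.set("yarn.app.mapreduce.am.command-opts", "-Xmx2560m");  // heap within it
            conf.setInt("yarn.app.mapreduce.am.resource.cpu-vcores", 2);  // down from 3
        }
    }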
