Hi Jeff. Thanks for sharing this information. I have some observations from these logs.
- The node heartbeat here looks to be around 2-3 seconds. Was it changed for
some reason?
- All of the mappers’ Resource Requests appear to be of type ANY (there is
no data locality); please correct me if I am wrong. For a request of type
ANY, only one container is allocated per node heartbeat for a node, and the
heartbeat delay here is also high. I can also see that containers are
released very quickly. So when you started your application, were you seeing
better resource utilization, with the underutilization appearing once
containers started to get released/completed? Please look into this; it may
be the reason.

Thanks
Sunil

On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <[email protected]> wrote:

> Thanks for your thoughts thus far, Sunil. Most grateful for any additional
> help you or others can offer. To answer your questions,
>
> 1. This is a custom M/R job, which uses mappers only (no reduce phase) to
> process GPS probe data and filter it based on inclusion within a provided
> polygon. There is actually a lot of upfront work done in the driver to make
> that task as simple as can be (it identifies a list of tiles that are
> completely inside the polygon and those that fall across an edge, for which
> more processing would be needed), but the job would still be more
> compute-intensive than wordcount, for example.
>
> 2. I’m running almost 84k mappers for this job. This is actually down from
> ~600k mappers, since one other thing I’ve done is increase
> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the
> job. Data is in S3, so loss of locality isn’t really a concern.
>
> 3. For NodeManager configuration, I’m using EMR’s default configuration
> for the m3.xlarge instance type, which is
> yarn.scheduler.minimum-allocation-mb=32,
> yarn.scheduler.maximum-allocation-mb=11520, and
> yarn.nodemanager.resource.memory-mb=11520.
> The YARN dashboard shows min/max allocations of <memory:32, vCores:1> /
> <memory:11520, vCores:8>.
>
> 4. Capacity Scheduler [MEMORY]
>
> 5. I’ve attached 2500 lines from the RM log. Happy to grab more, but they
> are pretty big, and I thought that might be sufficient.
>
> Any guidance is much appreciated!
> -Jeff
>
> *From:* Sunil Govind [mailto:[email protected]]
> *Sent:* Wednesday, May 25, 2016 10:55 AM
> *To:* Guttadauro, Jeff <[email protected]>; [email protected]
> *Subject:* Re: YARN cluster underutilization
>
> Hi Jeff,
>
> It looks like you are allocating more memory than needed for the AM
> container; most likely you do not need 6 GB (as per the log). Could you
> please help to provide some more information:
>
> 1. What type of mapreduce application (wordcount etc.) are you running?
> Some AMs may be CPU intensive and some may not be, so based on the type of
> application, memory/cpu can be tuned for better utilization.
> 2. How many mappers (and reducers) are you trying to run here?
> 3. You have mentioned that each node has 8 cores and 15GB, but how much is
> actually configured for the NM?
> 4. Which scheduler are you using?
> 5. It would be better to attach the RM log if possible.
>
> Thanks
> Sunil
>
> On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff <[email protected]>
> wrote:
>
> Hi, all.
>
> I have an M/R (map-only) job that I’m running on a Hadoop 2.7.1 YARN
> cluster that is being quite underutilized (utilization of around 25-30%).
> The EMR cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores
> each and 15G total memory (with 11.25G of that available to YARN).
> I’ve configured mapper memory with the following properties, which should
> allow for 8 containers running map tasks per node:
>
> <property><name>mapreduce.map.memory.mb</name><value>1440</value></property> <!-- Container size -->
> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property> <!-- JVM arguments for a Map task -->
>
> It was suggested that perhaps my AppMaster was having trouble keeping up
> with creating all the mapper containers and that I should bulk up its
> resource allocation. So I did, as shown below, providing it 6G container
> memory (5G task memory), 3 cores, and 60 task listener threads.
>
> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value></property> <!-- App Master task listener threads -->
> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value></property> <!-- App Master container vcores -->
> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value></property> <!-- App Master container size -->
> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value></property> <!-- JVM arguments for each Application Master -->
>
> Taking a look at the node on which the AppMaster is running, I’m seeing
> plenty of CPU idle time and free memory, yet there are still nodes with no
> utilization (0 running containers). The log indicates that the AppMaster
> has way more memory (physical/virtual) than it appears to need, with
> repeated log messages like this:
>
> 2016-05-25 13:59:04,615 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> (Container Monitor): Memory usage of ProcessTree 11265 for container-id
> container_1464122327865_0002_01_000001: 1.6 GB of 6.3 GB physical memory
> used; 6.1 GB of 31.3 GB virtual memory used
>
> Can you please help me figure out where to go from here to troubleshoot,
> or any other things to try?
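Given that the ContainersMonitorImpl line above shows the AM peaking around 1.6 GB physical against a 6.3 GB limit, one option would be to scale the AM container back down. The sizes below are illustrative guesses, not values from this thread; tune them against the AM's observed usage:

```xml
<!-- Hypothetical, scaled-down AM sizing (example values, not prescriptive) -->
<property><name>yarn.app.mapreduce.am.resource.mb</name><value>2048</value></property> <!-- App Master container size -->
<property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx1536m</value></property> <!-- JVM heap for the Application Master -->
```

Shrinking the AM from 6400 MB to 2048 MB would also free roughly 4.25 GB on its node, which at 1440 MB per map container is about three more concurrent mappers.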
> Thanks!
> -Jeff
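One more data point on the heartbeat observation at the top of the thread: in stock Hadoop 2.x the RM-side node heartbeat interval defaults to 1 second and is set in yarn-site.xml. If the logs really show 2-3 seconds between heartbeats, this is the property to check (shown here at its yarn-default.xml default; EMR may override it):

```xml
<!-- Default is 1000 ms; a larger value slows container assignment, since an ANY request receives at most one container per node heartbeat -->
<property><name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name><value>1000</value></property>
```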
