Hi,

The input split size was increased so that each container runs longer and processes more data. That way, slow container allocation won't be a problem (since all container requests are made without data locality). It's also better to keep more memory for the AM container when it is handling 600k+ requests. And, as Jeff mentioned, each mapper applies some filters and emits data directly to disk.

Thanks,
Sunil
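As a rough back-of-envelope for why fewer, longer-lived containers sidestep slow allocation: with no locality, the scheduler grants at most one container per node per heartbeat (the behavior Sunil explains further down this thread). This is a sketch only; the 600k request count and 20-node cluster size come from the thread, while the ~21k figure is an extrapolation of Jeff's 84k mappers at 512M splits scaled to 2G splits.

    // Sketch: allocation ceiling when every resource request is off-switch (ANY).
    public class AllocationCeiling {
        public static void main(String[] args) {
            int nodes = 20;             // core nodes, from the thread
            double heartbeatSec = 1.0;  // default 1000 ms node heartbeat
            // At most one ANY-container per node per heartbeat:
            double allocPerSec = nodes / heartbeatSec;  // ~20 allocations/sec
            int mappersOriginal = 600_000;              // original request count
            int mappersAt2G = 84_000 / 4;               // ~21k, assuming split count scales with split size
            System.out.printf("600k mappers: >= %.1f hours before the last container can be allocated%n",
                    mappersOriginal / allocPerSec / 3600.0);  // ~8.3 h
            System.out.printf("~21k mappers: >= %.1f minutes%n",
                    mappersAt2G / allocPerSec / 60.0);        // ~17.5 min
        }
    }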
On Tue, Jun 21, 2016 at 12:07 PM Deepak Goel <[email protected]> wrote:

> Pretty nice. However, why would swapping to disk happen when there is
> enough physical memory available?
>
> Deepak
>
> On Sat, May 28, 2016 at 1:31 AM, Guttadauro, Jeff <[email protected]> wrote:
>
>> Hi, all.
>>
>> Just wanted to provide an update, which is that I'm finally getting good
>> YARN cluster utilization (consistently within the 90-100% range!). I
>> believe the biggest change was to increase the min split size. Since our
>> input is all in S3 and data locality is not really an issue, I bumped it
>> up to 2G so that each container stays up and working longer; allocation
>> and deallocation of container resources now happens much less frequently.
>>
>> <property><name>mapreduce.input.fileinputformat.split.minsize</name><value>2147483648</value><!-- 2G --></property>
>>
>> Not sure how much impact the following changes had, since they were made
>> at the same time. Everything's humming along now though, so I'm going to
>> leave them.
>>
>> I also reduced the node heartbeat interval from 1000ms down to 500ms
>> ("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms": "500" in the
>> cluster configuration JSON), since I'm told the scheduler will only
>> allocate 1 container per node per heartbeat when dealing with
>> non-localized data, as is the case for us, since it's in S3. I also
>> doubled the memory given to the YARN ResourceManager from the default for
>> the m3.xlarge node type I'm using ("YARN_RESOURCEMANAGER_HEAPSIZE": "5120"
>> in the cluster configuration JSON).
>>
>> Thanks again to Sunil and Shubh (and my colleague, York) for the helpful
>> guidance!
>>
>> Take care,
>> -Jeff
>>
>> From: Shubh hadoopExp [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 11:08 PM
>> To: Guttadauro, Jeff <[email protected]>
>> Cc: Sunil Govind <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hey,
>>
>> OFFSWITCH allocation indicates whether data locality is maintained or
>> not. It has no relation to the heartbeat! The heartbeat is just used to
>> clear the pipeline of container requests.
>>
>> -Shubh
>>
>> On May 25, 2016, at 3:30 PM, Guttadauro, Jeff <[email protected]> wrote:
>>
>> Interesting stuff! I did not know about this handling of OFFSWITCH
>> requests.
>>
>> To get around this, would you recommend reducing the heartbeat interval,
>> perhaps to 250ms to get a 4x improvement in container allocation rate (or
>> is it not quite as simple as that)? Maybe doing this in combination with
>> using a greater number of smaller nodes would help? Would overloading the
>> ResourceManager be a concern if doing that? Should I bump up the
>> "YARN_RESOURCEMANAGER_HEAPSIZE" configuration property (the current
>> default for m3.xlarge is 2396M), or would you suggest any other knobs to
>> turn to help the RM handle it?
>>
>> Thanks again for all your help, Sunil!
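For completeness, the split-size property Jeff quotes can equivalently be set from the job driver. A minimal sketch, assuming a standard Hadoop 2.7 MapReduce driver; the class and job name are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class DriverSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "probe-filter");  // hypothetical job name
            // Same effect as setting mapreduce.input.fileinputformat.split.minsize
            // to 2147483648: S3 input has no locality to lose, and 2G splits keep
            // each mapper busy longer, so containers churn less often.
            FileInputFormat.setMinInputSplitSize(job, 2147483648L);
        }
    }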
>> From: Sunil Govind [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 1:07 PM
>> To: Guttadauro, Jeff <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hi Jeff,
>>
>> > I do see the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
>> > property set to 1000 in the job configuration
>>
>> OK, this makes sense; the node heartbeat appears to be the default.
>>
>> If no locality is specified in the resource requests (i.e., they use
>> ResourceRequest.ANY), then YARN will allocate only one container per node
>> heartbeat. So your container allocation rate is slow considering 600k
>> requests and only 20 nodes. And if many containers are also being released
>> quickly (I can see that some containers' lifetimes are 80 to 90 seconds),
>> the situation becomes more complex and the allocation rate gets slower
>> still.
>>
>> YARN-4963 <https://issues.apache.org/jira/browse/YARN-4963> aims to allow
>> more than one allocation per heartbeat for NODE_OFFSWITCH (ANY) requests,
>> but it is not yet available in any release.
>>
>> I suggest investigating along these lines to confirm these points.
>>
>> Thanks,
>> Sunil
>>
>> On Wed, May 25, 2016 at 11:00 PM Guttadauro, Jeff <[email protected]> wrote:
>>
>> Thanks for digging into the log, Sunil, and making some interesting
>> observations!
>>
>> The heartbeat interval hasn't been changed from its default, and I do see
>> the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property set
>> to 1000 in the job configuration. I was searching the log for heartbeat
>> interval information, but I didn't find anything. Where do you look in the
>> log for the heartbeats?
>>
>> Also, you are correct about there being no data locality, as all the
>> input data is in S3. The utilization has been fluctuating, but I can't
>> really see a pattern or tell why. It actually started out pretty low, in
>> the 20-30% range, then managed to get up into the 50-70% range after a
>> while, but that was short-lived, as it went back down into the 20-30%
>> range for quite a while. While writing this, I saw it surprisingly hit
>> 80%! That's the first time I've seen it that high in the 20 hours it's
>> been running, although it looks like it may be headed back down. I'm
>> perplexed. Wouldn't you generally expect fairly stable utilization over
>> the course of the job? (This is the only job running.)
>>
>> Thanks,
>> -Jeff
>>
>> From: Sunil Govind [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 11:55 AM
>> To: Guttadauro, Jeff <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hi Jeff,
>>
>> Thanks for sharing this information. I have some observations from these
>> logs:
>>
>> - I think the node heartbeat is around 2-3 seconds here. Was it changed
>> for some other reason?
>> - All the mapper resource requests seem to be asking for type ANY (there
>> is no data locality). Please correct me if I am wrong.
>>
>> If the resource request type is ANY, only one container will be allocated
>> per node heartbeat, and here the node heartbeat delay is also high. I can
>> see that containers are being released very quickly too. So when you
>> started your application, were you seeing better resource utilization, and
>> then, once containers started getting released/completed,
>> underutilization?
>>
>> Please look into this; it may be the reason.
>>
>> Thanks,
>> Sunil
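To make the NODE_OFFSWITCH case concrete, this is roughly what such a request looks like at the YARN API level. A sketch, not what the MR ApplicationMaster literally emits; the priority and container size are placeholders:

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.api.records.ResourceRequest;

    public class AnyRequestSketch {
        public static void main(String[] args) {
            // ResourceRequest.ANY ("*") carries no host or rack preference, so the
            // scheduler hands out at most one such container per node heartbeat
            // (the limitation YARN-4963 aims to lift).
            Resource mapperSize = Resource.newInstance(1440, 1);  // MB, vcores
            ResourceRequest req = ResourceRequest.newInstance(
                    Priority.newInstance(20),  // placeholder priority
                    ResourceRequest.ANY,
                    mapperSize,
                    1);                        // number of containers
            System.out.println(req);
        }
    }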
>> On Wed, May 25, 2016 at 9:59 PM Guttadauro, Jeff <[email protected]> wrote:
>>
>> Thanks for your thoughts thus far, Sunil. Most grateful for any
>> additional help you or others can offer. To answer your questions:
>>
>> 1. This is a custom M/R job, which uses mappers only (no reduce phase) to
>> process GPS probe data and filter it based on inclusion within a provided
>> polygon. There is actually a lot of upfront work done in the driver to
>> make that task as simple as it can be (it identifies the tiles that are
>> completely inside the polygon and those that fall across an edge, for
>> which more processing is needed), but the job is still more
>> compute-intensive than wordcount, for example.
>>
>> 2. I'm running almost 84k mappers for this job. This is actually down
>> from ~600k mappers, since one other thing I've done is increase
>> mapreduce.input.fileinputformat.split.minsize to 536870912 (512M) for the
>> job. Data is in S3, so loss of locality isn't really a concern.
>>
>> 3. For NodeManager configuration, I'm using EMR's default configuration
>> for the m3.xlarge instance type, which is
>> yarn.scheduler.minimum-allocation-mb=32,
>> yarn.scheduler.maximum-allocation-mb=11520, and
>> yarn.nodemanager.resource.memory-mb=11520. The YARN dashboard shows
>> min/max allocations of <memory:32, vCores:1>/<memory:11520, vCores:8>.
>>
>> 4. Capacity Scheduler [MEMORY]
>>
>> 5. I've attached 2500 lines from the RM log. Happy to grab more, but the
>> logs are pretty big, and I thought that might be sufficient.
>>
>> Any guidance is much appreciated!
>> -Jeff
>>
>> From: Sunil Govind [mailto:[email protected]]
>> Sent: Wednesday, May 25, 2016 10:55 AM
>> To: Guttadauro, Jeff <[email protected]>; [email protected]
>> Subject: Re: YARN cluster underutilization
>>
>> Hi Jeff,
>>
>> It looks like you are allocating a lot of memory to the AM container; you
>> most likely do not need 6GB (as per the log). Could you please provide
>> some more information:
>>
>> 1. What type of mapreduce application (wordcount, etc.) are you running?
>> Some AMs are CPU-intensive and some are not, so depending on the type of
>> application, memory/CPU can be tuned for better utilization.
>> 2. How many mappers (and reducers) are you trying to run here?
>> 3. You have mentioned that each node has 8 cores and 15GB, but how much
>> is actually configured for the NM?
>> 4. Which scheduler are you using?
>> 5. It would be better to attach the RM log if possible.
>>
>> Thanks,
>> Sunil
>>
>> On Wed, May 25, 2016 at 8:58 PM Guttadauro, Jeff <[email protected]> wrote:
>>
>> Hi, all.
>>
>> I have an M/R (map-only) job that I'm running on a Hadoop 2.7.1 YARN
>> cluster that is being quite underutilized (utilization of around 25-30%).
>> The EMR cluster is 1 master + 20 core m3.xlarge nodes, which have 8 cores
>> each and 15G total memory (with 11.25G of that available to YARN). I've
>> configured mapper memory with the following properties, which should allow
>> for 8 containers running map tasks per node (see the sketch of the
>> arithmetic below):
>>
>> <property><name>mapreduce.map.memory.mb</name><value>1440</value></property> <!-- Container size -->
>> <property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property> <!-- JVM arguments for a Map task -->
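A quick sanity check of that 8-containers-per-node figure, using the NodeManager capacity given in answer 3 above and the AM container size shown further down. A sketch only; the cluster-wide count assumes a single AM and ignores vcore limits:

    public class ContainerMath {
        public static void main(String[] args) {
            int nmMemoryMb = 11520;  // yarn.nodemanager.resource.memory-mb
            int mapperMb = 1440;     // mapreduce.map.memory.mb
            int amMb = 6400;         // yarn.app.mapreduce.am.resource.mb
            int perNode = nmMemoryMb / mapperMb;            // 8 mappers per node
            int onAmNode = (nmMemoryMb - amMb) / mapperMb;  // 3 alongside the AM
            int clusterWide = 19 * perNode + onAmNode;      // 155 concurrent mappers
            System.out.printf("%d per node, %d on the AM node, %d cluster-wide%n",
                    perNode, onAmNode, clusterWide);
        }
    }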
>> It was suggested that perhaps my AppMaster was having trouble keeping up
>> with creating all the mapper containers, and that I should bulk up its
>> resource allocation. So I did, as shown below, providing it 6G of
>> container memory (5G task memory), 3 cores, and 60 task listener threads.
>>
>> <property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>60</value></property> <!-- App Master task listener threads -->
>> <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>3</value></property> <!-- App Master container vcores -->
>> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>6400</value></property> <!-- App Master container size -->
>> <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx5120m</value></property> <!-- JVM arguments for each Application Master -->
>>
>> Taking a look at the node on which the AppMaster is running, I'm seeing
>> plenty of CPU idle time and free memory, yet there are still nodes with no
>> utilization (0 running containers). The log indicates that the AppMaster
>> has far more memory (physical/virtual) than it appears to need, with
>> repeated log messages like this:
>>
>> 2016-05-25 13:59:04,615 INFO
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>> (Container Monitor): Memory usage of ProcessTree 11265 for container-id
>> container_1464122327865_0002_01_000001: 1.6 GB of 6.3 GB physical memory
>> used; 6.1 GB of 31.3 GB virtual memory used
>>
>> Can you please help me figure out where to go from here to troubleshoot,
>> or suggest any other things to try?
>>
>> Thanks!
>> -Jeff
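Given that monitor line (the AM holding a 6.3 GB container while using about 1.6 GB of it), one follow-up worth trying is shrinking the AM container so the scheduler gets that memory back, which is in line with Sunil's remark above that 6GB is likely unnecessary. A hypothetical sketch; the values are illustrative, not measured optima:

    import org.apache.hadoop.conf.Configuration;

    public class AmRightSizing {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // ~1.6 GB of 6.3 GB physical memory in use suggests headroom to shrink:
            conf.setInt("yarn.app.mapreduce.am.resource.mb", 3072);       // down from 6400
            conf.set("yarn.app.mapreduce.am.command-opts", "-Xmx2560m");  // heap within it
            conf.setInt("yarn.app.mapreduce.am.resource.cpu-vcores", 2);  // down from 3
        }
    }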
