[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035604#comment-14035604 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
I changed some of the memory settings and now the job completes successfully. This was with mapreduce.job.reduce.slowstart.completedmaps at its default value of 0.05 (5%). Apparently without the blacklisted node it works fine and the fractional memory shenanigans don't impact the job.

Deadlock allocating containers for mappers and reducers
-------------------------------------------------------
Key: MAPREDUCE-5928
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5928
Project: Hadoop Map/Reduce
Issue Type: Bug
Environment: Hadoop 2.4.0 (as packaged by HortonWorks in HDP 2.1.2)
Reporter: Niels Basjes
Attachments: AM-MR-syslog - Cleaned.txt.gz, Cluster fully loaded.png.jpg, MR job stuck in deadlock.png.jpg

I have a small cluster consisting of 8 desktop-class systems (1 master + 7 workers). Due to the small memory of these systems I configured YARN as follows:
{quote}
yarn.nodemanager.resource.memory-mb = 2200
yarn.scheduler.minimum-allocation-mb = 250
{quote}
On my client I set
{quote}
mapreduce.map.memory.mb = 512
mapreduce.reduce.memory.mb = 512
{quote}
Now I run a job with 27 mappers and 32 reducers. After a while I saw this deadlock occur:
- All nodes had been filled to their maximum capacity with reducers.
- 1 mapper was waiting for a container slot to start in.

I tried killing reducer attempts but that didn't help (new reducer attempts simply took the existing container).

*Workaround*: I set this value from my job. The default value is 0.05 (= 5%).
{quote}
mapreduce.job.reduce.slowstart.completedmaps = 0.99f
{quote}
--
This message was sent by Atlassian JIRA (v6.2#6252)
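For jobs whose main class goes through ToolRunner, the workaround property can also be passed at submission time without touching the job code. A minimal sketch; the jar and class names are placeholders, not part of this issue:

```shell
# Hypothetical invocation: 'myjob.jar' and 'MyJob' stand in for your job.
# Requires the main class to use ToolRunner so -D properties are honored.
hadoop jar myjob.jar MyJob \
  -D mapreduce.job.reduce.slowstart.completedmaps=0.99 \
  input/ output/
```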
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035694#comment-14035694 ] Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------
Shall we mark this as a duplicate of YARN-1680 then? It sounds like if the memory of the blacklisted node had been removed from the reported headroom, the AM would have acted appropriately.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035819#comment-14035819 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
Reading through the description of YARN-1680, that sure seems like the root cause of my problem. So yes, go ahead and mark this one as a duplicate.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033614#comment-14033614 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
I'm having trouble finding the spot where this 500MB per container is defined. Can you point me to where this is specified?
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033798#comment-14033798 ] Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------
The container sizes for maps, reduces, and the MR ApplicationMaster are specified in mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, and yarn.app.mapreduce.am.resource.mb, respectively.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032473#comment-14032473 ] Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------
This sounds like a bug either in the headroom calculation or in RMContainerAllocator, where the AM decides whether to preempt reducers. Could you look in the AM log and see what it saw for the headroom and whether it made any attempt at all to ramp down reducers?
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032476#comment-14032476 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
I'm not the only one who has run into this: http://hortonworks.com/community/forums/topic/mapreduce-race-condition-big-job/
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032496#comment-14032496 ] Jonathan Eagles commented on MAPREDUCE-5928:
--------------------------------------------
I think this is a case of task preemption not working because the headroom calculation is not correct. Can you verify that you are using the capacity scheduler? See YARN-1198.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032503#comment-14032503 ] Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------
I'm wondering if the fact that the NodeManager memory has a fractional remainder when it's full triggers the issue. With tasks all being 512MB, each node will have 152MB remaining. I'm guessing that with enough nodes those remainders add up to appear to be enough space to run another task, but in reality that task cannot be scheduled since the memory being reported is fragmented.
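The arithmetic behind that guess can be sketched directly from the numbers in this issue (2200MB per NodeManager, 512MB tasks, 7 workers):

```shell
node_mb=2200   # yarn.nodemanager.resource.memory-mb
task_mb=512    # mapreduce.map.memory.mb / mapreduce.reduce.memory.mb
workers=7

containers_per_node=$(( node_mb / task_mb ))              # full 512MB containers per node
leftover=$(( node_mb - containers_per_node * task_mb ))   # MB stranded per node
aggregate_free=$(( workers * leftover ))                  # MB reported cluster-wide
echo "$containers_per_node $leftover $aggregate_free"     # → 4 152 1064
```

The aggregate 1064MB exceeds one 512MB task, so the headroom looks sufficient to the AM, yet no single node has 512MB of contiguous free memory.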
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032516#comment-14032516 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
I have not actively configured any scheduling, so I guess it is running the default setting?
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032545#comment-14032545 ] Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------
I'm pretty sure you're using the CapacityScheduler, since that has been the default in Apache Hadoop for some time now. I'm not positive about the HDP release, but I suspect it, too, is configured to use the CapacityScheduler by default.

After a quick examination of the AM log, it looks like a couple of things are going on. The AM is blacklisting one of the nodes, and we can see that node not being used in the cluster picture. There's a known issue with the headroom calculation not taking blacklisted nodes into account; see YARN-1680.

The node ends up being blacklisted because the NM shot a number of the tasks for being over their container limits. It looks like the containers are being allocated as 500MB but the JVM heap sizes are set to 512MB. Note that the container size covers the entire process tree for the task. That's not just the heap, so it also needs to include thread stacks, JVM data, JVM code, any subprocesses launched (e.g. Hadoop Streaming), etc. If you really need a 512MB heap then I'd allocate 768MB or maybe even 1024MB containers, depending on what the tasks are doing.

It does look like some fractional memory shenanigans could be involved here, as the picture shows most of the nodes having only 200MB free. It'd be interesting to know if you still hit the deadlock after fixing the cause of the blacklisting.
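A hedged sketch of what that sizing advice might look like as job-submission properties. The 768MB figure is the suggestion from the comment above; the jar/class names and the -Xmx values pairing heap to container size are assumptions, not settings verified on this cluster:

```shell
# Hypothetical invocation: 'myjob.jar' and 'MyJob' are placeholders.
# 768MB containers leave ~256MB of non-heap headroom above a 512MB heap.
hadoop jar myjob.jar MyJob \
  -D mapreduce.map.memory.mb=768 \
  -D mapreduce.reduce.memory.mb=768 \
  -D mapreduce.map.java.opts=-Xmx512m \
  -D mapreduce.reduce.java.opts=-Xmx512m \
  input/ output/
```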
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032564#comment-14032564 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
I took the 'dead' node (node2) offline (completely stopped all Hadoop/YARN-related daemons) and ran the same job again after it had disappeared from all overviews. Now it does complete all mappers.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032766#comment-14032766 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
Where/how can I determine for sure whether the capacity scheduler is being used?
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032802#comment-14032802 ] Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------
You can click on the Tools > Configuration link in the UI and verify that yarn.resourcemanager.scheduler.class is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, or look for CapacityScheduler in the RM logs.
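The same check can be scripted against the /conf servlet that Hadoop daemon web UIs expose. A sketch assuming the default RM web port of 8088; the host name is a placeholder:

```shell
# <rm-host> stands in for your ResourceManager host.
curl -s "http://<rm-host>:8088/conf" \
  | grep -A 1 'yarn.resourcemanager.scheduler.class'
```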
[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032831#comment-14032831 ] Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------
Confirmed, it is using the CapacityScheduler:
{code}
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <source>yarn-default.xml</source>
</property>
{code}
I'm going to fiddle with the memory settings tomorrow.