[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-18 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035604#comment-14035604
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

I changed some of the memory settings and now the job completes successfully.
This was with mapreduce.job.reduce.slowstart.completedmaps at its default value of 0.05 (5%).
Apparently, without the blacklisted node it works fine and the fractional memory 
shenanigans don't impact the job.

 Deadlock allocating containers for mappers and reducers
 -------------------------------------------------------

 Key: MAPREDUCE-5928
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5928
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: Hadoop 2.4.0 (as packaged by HortonWorks in HDP 2.1.2)
Reporter: Niels Basjes
 Attachments: AM-MR-syslog - Cleaned.txt.gz, Cluster fully 
 loaded.png.jpg, MR job stuck in deadlock.png.jpg


 I have a small cluster consisting of 8 desktop-class systems (1 master + 7 workers).
 Due to the small memory of these systems, I configured YARN as follows:
 {quote}
 yarn.nodemanager.resource.memory-mb = 2200
 yarn.scheduler.minimum-allocation-mb = 250
 {quote}
 On my client I set:
 {quote}
 mapreduce.map.memory.mb = 512
 mapreduce.reduce.memory.mb = 512
 {quote}
 Now I ran a job with 27 mappers and 32 reducers.
 After a while I saw this deadlock occur:
 - All nodes had been filled to their maximum capacity with reducers.
 - 1 mapper was waiting for a container slot to start in.
 I tried killing reducer attempts, but that didn't help (new reducer attempts simply took the existing container).
 *Workaround*:
 I set this value from my job (the default value is 0.05 = 5%):
 {quote}
 mapreduce.job.reduce.slowstart.completedmaps = 0.99f
 {quote}
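
For illustration, here is a minimal driver-side sketch of applying this workaround programmatically rather than via -D on the command line; the class name and the rest of the job setup are placeholders, not taken from this issue:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartWorkaroundDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Workaround from this issue: hold back reducers until 99% of the maps
    // have completed, so reducers cannot occupy every container slot while
    // maps are still waiting to be scheduled.
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.99f);

    Job job = Job.getInstance(conf, "slowstart-workaround-example");
    // ... set mapper, reducer, input and output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
{code}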





[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035694#comment-14035694
 ] 

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

Shall we mark this as a duplicate of YARN-1680 then?  Sounds like if the memory of 
the blacklisted node had been removed from the reported headroom, the AM would 
have acted appropriately.



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-18 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035819#comment-14035819
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

Reading through the description of YARN-1680, it sure seems like the root cause of 
my problem.
So yes, go ahead and mark this one as a duplicate.



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-17 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033614#comment-14033614
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

I'm having trouble finding the spot where this 500MB per container is defined.
Can you point me to where this is specified?




[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033798#comment-14033798
 ] 

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

The container sizes for maps, reduces, and the MR ApplicationMaster are 
specified by mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, and 
yarn.app.mapreduce.am.resource.mb, respectively.
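
As a quick sanity check, a sketch of printing the effective container sizes from a job's configuration. The fallback values passed to getInt are the usual Hadoop 2.x defaults (1024/1024/1536 MB), stated here from memory rather than taken from this issue:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PrintContainerSizes {
  public static void main(String[] args) throws Exception {
    // Reads *-site.xml from the classpath; fallbacks are the usual 2.x defaults.
    Configuration conf = Job.getInstance().getConfiguration();
    System.out.println("map:    " + conf.getInt("mapreduce.map.memory.mb", 1024) + " MB");
    System.out.println("reduce: " + conf.getInt("mapreduce.reduce.memory.mb", 1024) + " MB");
    System.out.println("MR AM:  " + conf.getInt("yarn.app.mapreduce.am.resource.mb", 1536) + " MB");
  }
}
{code}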



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032473#comment-14032473
 ] 

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

This sounds like a bug in either the headroom calculation or RMContainerAllocator, 
where the AM decides whether to preempt reducers.  Could you look in the AM log 
and see what it saw for the headroom and whether it made any attempt at all to 
ramp down reducers?



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032476#comment-14032476
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

I'm not the only one who ran into this: 
http://hortonworks.com/community/forums/topic/mapreduce-race-condition-big-job/



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032496#comment-14032496
 ] 

Jonathan Eagles commented on MAPREDUCE-5928:
--------------------------------------------

I think this is a case of task preemption not working because the headroom 
calculation is not correct. Can you verify that you are using the CapacityScheduler?

See YARN-1198.



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032503#comment-14032503
 ] 

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

I'm wondering if the fact that the nodemanager memory has a fractional remainder 
when it's full triggers the issue.  With tasks all being 512MB, that means 
each node will have 152MB remaining.  I'm guessing that with enough nodes those 
remainders add up to what appears to be enough space to run another task, but in 
reality that task cannot be scheduled since the reported memory is fragmented.
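
To make that reasoning concrete, a back-of-the-envelope sketch using the numbers from this issue (ignoring the AM container and the blacklisted node; this is just arithmetic, not scheduler code):

{code}
public class HeadroomSketch {
  public static void main(String[] args) {
    int nodeMb = 2200;        // yarn.nodemanager.resource.memory-mb
    int containerMb = 512;    // map/reduce container size
    int workers = 7;

    int perNodeLeftover = nodeMb % containerMb;       // 2200 - 4 * 512 = 152 MB
    int clusterLeftover = workers * perNodeLeftover;  // 7 * 152 = 1064 MB

    // The summed leftover looks like room for two more 512 MB containers,
    // yet no single node has 512 MB free, so no further task can start.
    System.out.println("per-node leftover:     " + perNodeLeftover + " MB");
    System.out.println("cluster-wide leftover: " + clusterLeftover + " MB");
  }
}
{code}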



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032516#comment-14032516
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

I have not actively configured any scheduler, so I guess it is running with the 
default setting?




[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032545#comment-14032545
 ] 

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

I'm pretty sure you're using the CapacityScheduler since that's been the 
default in Apache Hadoop for some time now.  I'm not positive about the HDP 
release, but I suspect it, too, is configured to use the CapacityScheduler by 
default.

After a quick examination of the AM log, it looks like a couple of things are 
going on.  The AM is blacklisting one of the nodes, and we can see that node 
not being used in the cluster picture.  There's a known issue with headroom 
calculation not taking into account blacklisted nodes.  See YARN-1680. 

The node ends up being blacklisted because the NM shot a number of the tasks 
for being over container limits.  It looks like the containers are being 
allocated at 500MB but the JVM heap sizes are set to 512MB.  Note that 
the container size includes the size of the entire process tree for the task.  
That's not just the heap, so it also needs to include thread stacks, JVM data, 
JVM code, any subprocesses launched (e.g., Hadoop Streaming), etc.  If you 
really need a 512MB heap, then I'd allocate 768MB or maybe even 1024MB 
containers, depending on what the tasks are doing.
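
A hedged sketch of that sizing advice (values are illustrative, not a recommendation from this issue; the java.opts properties are the standard companions of the memory.mb settings):

{code}
import org.apache.hadoop.conf.Configuration;

public class ContainerSizingSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Request containers larger than the JVM heap so thread stacks, JVM
    // overhead and any subprocesses fit inside the container limit.
    conf.setInt("mapreduce.map.memory.mb", 768);
    conf.set("mapreduce.map.java.opts", "-Xmx512m");
    conf.setInt("mapreduce.reduce.memory.mb", 768);
    conf.set("mapreduce.reduce.java.opts", "-Xmx512m");
    // ... hand conf to Job.getInstance(conf, ...) as usual ...
  }
}
{code}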

It does look like some fractional memory shenanigans could be involved here, as 
the picture shows most of the nodes having only 200MB free.  It'd be 
interesting to know if you still hit the deadlock after fixing the cause of the 
blacklisting.



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032564#comment-14032564
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

I took the 'dead' node (node2) offline (completely stopped all Hadoop/YARN-related 
daemons) and ran the same job again after it had disappeared from all overviews.
Now it does complete all mappers.



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032766#comment-14032766
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

Where/how can I determine for sure if the capacity scheduler is used?



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032802#comment-14032802
 ] 

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

You can click on the Tools > Configuration link in the UI and verify that 
yarn.resourcemanager.scheduler.class is 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, 
or look for CapacityScheduler in the RM logs.
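
Another way to check, as a sketch: read the scheduler class through YarnConfiguration, assuming yarn-site.xml is on the client classpath (YarnConfiguration also pulls in yarn-default.xml, so the default value shows up even if nothing was overridden):

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WhichScheduler {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Prints the configured ResourceManager scheduler class.
    System.out.println(conf.get("yarn.resourcemanager.scheduler.class"));
  }
}
{code}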



[jira] [Commented] (MAPREDUCE-5928) Deadlock allocating containers for mappers and reducers

2014-06-16 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032831#comment-14032831
 ] 

Niels Basjes commented on MAPREDUCE-5928:
-----------------------------------------

Confirmed: it is using the CapacityScheduler:
{code}
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <source>yarn-default.xml</source>
</property>
{code}

I'm going to fiddle with the memory settings tomorrow.
