[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-04-01 Thread mai shurong (JIRA)


mai shurong updated YARN-3416:
--
Attachment: queue_with_max333cores.png
queue_with_max263cores.png
queue_with_max163cores.png

queue_with_max163cores.png: a job submitted to a queue with a maximum of 163 cores
queue_with_max263cores.png: a job submitted to a queue with a maximum of 263 cores
queue_with_max333cores.png: a job submitted to a queue with a maximum of 333 cores

 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong
Priority: Critical
 Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, 
 queue_with_max163cores.png, queue_with_max263cores.png, 
 queue_with_max333cores.png


 I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
 whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
 running reduces occupy all 300 cores of the queue. Then a map fails and is 
 retried, waiting for a core, while the 300 reduces wait for the failed map to 
 finish. This is a deadlock: the job is blocked, and later jobs in the queue 
 cannot run because no cores are available in the queue.
 I think there is a similar issue with the memory limit of a queue.
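
A possible client-side mitigation, sketched below on the assumption of stock MRv2 on Hadoop 2.6 (the class name and property values are illustrative, and whether reducer preemption actually fires depends on the headroom the FairScheduler reports to the AM): delay reduce ramp-up until the maps are essentially done, and let the AM preempt reducers when a map retry cannot get a container.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReduceRampUpSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Start reduces only after 95% of the maps have completed (default is
        // 0.05), so the reduces are less likely to hold every core of the
        // queue while maps can still fail and need to rerun.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.95f);

        // Seconds an unsatisfied map request waits before the MR AM starts
        // preempting reducers to free resources; 60 is an illustrative value.
        conf.setInt("mapreduce.job.reducer.preempt.delay.sec", 60);

        Job job = Job.getInstance(conf, "big-job");
        // ... set mapper/reducer classes and input/output paths, then submit.
      }
    }

With slowstart near 1.0 the reduces only start after nearly all maps have finished, which shrinks the window in which a rerun map has to compete with reducers for the queue's last cores; it does not remove the underlying scheduling problem described above.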





[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-04-01 Thread mai shurong (JIRA)


mai shurong updated YARN-3416:
--
Attachment: AM_log_head10.txt.gz
AM_log_tail10.txt.gz

The head 10 lines and the tail 10 lines of the AM log of a deadlocked job.

 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong
Priority: Critical
 Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz


 I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
 whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
 running reduces occupy all 300 cores of the queue. Then a map fails and is 
 retried, waiting for a core, while the 300 reduces wait for the failed map to 
 finish. This is a deadlock: the job is blocked, and later jobs in the queue 
 cannot run because no cores are available in the queue.
 I think there is a similar issue with the memory limit of a queue.





[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-03-31 Thread Karthik Kambatla (JIRA)


Karthik Kambatla updated YARN-3416:
---
Priority: Critical  (was: Major)

 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong
Priority: Critical

 I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
 whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
 running reduces occupy all 300 cores of the queue. Then a map fails and is 
 retried, waiting for a core, while the 300 reduces wait for the failed map to 
 finish. This is a deadlock: the job is blocked, and later jobs in the queue 
 cannot run because no cores are available in the queue.
 I think there is a similar issue with the memory limit of a queue.





[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-03-29 Thread mai shurong (JIRA)


mai shurong updated YARN-3416:
--
Description: 
I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
running reduces occupy all 300 cores of the queue. Then a map fails and is 
retried, waiting for a core, while the 300 reduces wait for the failed map to 
finish. This is a deadlock: the job is blocked, and later jobs in the queue 
cannot run because no cores are available in the queue.
I think there is a similar issue with the memory limit of a queue.


  was:
I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
running reduces occupy all 300 cores of the queue. Then a map fails and is 
retried, waiting for a core, while the 300 reduces wait for the failed map to 
finish. This is a deadlock: the job is blocked, and later jobs in the queue 
cannot run because no cores are available in the queue.



 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong

 I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
 whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
 running reduces occupy all 300 cores of the queue. Then a map fails and is 
 retried, waiting for a core, while the 300 reduces wait for the failed map to 
 finish. This is a deadlock: the job is blocked, and later jobs in the queue 
 cannot run because no cores are available in the queue.
 I think there is a similar issue with the memory limit of a queue.





[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-03-29 Thread mai shurong (JIRA)


mai shurong updated YARN-3416:
--
Description: 
I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
running reduces occupy all 300 cores of the queue. Then a map fails and is 
retried, waiting for a core, while the 300 reduces wait for the failed map to 
finish. This is a deadlock: the job is blocked, and later jobs in the queue 
cannot run because no cores are available in the queue.


 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong

 I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
 whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
 running reduces occupy all 300 cores of the queue. Then a map fails and is 
 retried, waiting for a core, while the 300 reduces wait for the failed map to 
 finish. This is a deadlock: the job is blocked, and later jobs in the queue 
 cannot run because no cores are available in the queue.





[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation

2015-03-29 Thread mai shurong (JIRA)


mai shurong updated YARN-3416:
--
Description: 
I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
running reduces occupy all 300 cores of the queue. Then a map fails and is 
retried, waiting for a core, while the 300 reduces wait for the failed map to 
finish. This is a deadlock: the job is blocked, and later jobs in the queue 
cannot run because no cores are available in the queue.


  was:
I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
running reduces occupy all 300 cores of the queue. Then a map fails and is 
retried, waiting for a core, while the 300 reduces wait for the failed map to 
finish. This is a deadlock: the job is blocked, and later jobs in the queue 
cannot run because no cores are available in the queue.



 deadlock in a job between map and reduce cores allocation 
 --

 Key: YARN-3416
 URL: https://issues.apache.org/jira/browse/YARN-3416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: mai shurong

 I submit a big job with 500 maps and 350 reduces to a queue (FairScheduler) 
 whose maximum is 300 cores. When the job's map phase reaches 100%, the 300 
 running reduces occupy all 300 cores of the queue. Then a map fails and is 
 retried, waiting for a core, while the 300 reduces wait for the failed map to 
 finish. This is a deadlock: the job is blocked, and later jobs in the queue 
 cannot run because no cores are available in the queue.


