[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation
     [ https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mai shurong updated YARN-3416:
------------------------------
    Attachment: queue_with_max333cores.png
                queue_with_max263cores.png
                queue_with_max163cores.png

queue_with_max163cores.png: a job submitted to a queue with a 163-core maximum
queue_with_max263cores.png: a job submitted to a queue with a 263-core maximum
queue_with_max333cores.png: a job submitted to a queue with a 333-core maximum

> deadlock in a job between map and reduce cores allocation
> ---------------------------------------------------------
>
>                 Key: YARN-3416
>                 URL: https://issues.apache.org/jira/browse/YARN-3416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Priority: Critical
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, with 500 maps and 350 reduces, to a queue (fairscheduler) with a 300-core maximum. When the job reaches 100% map progress, the 300 launched reduces have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reduces wait for the failed map to finish. So a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because no cores are available in the queue. I think there is a similar issue for the memory of a queue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
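[Editor's note] The circular wait described above can be modeled as a simple core-accounting exercise. This is a sketch, not FairScheduler code: the 300-core queue cap and the task counts are taken from the report, and `retried_map_can_start` is a hypothetical helper introduced purely for illustration.

```python
# Minimal model of the core-accounting deadlock from the report.
# Assumption: every task (map or reduce) needs one core from the queue.

QUEUE_MAX_CORES = 300  # queue cap from the report

def retried_map_can_start(running_reduces, cores_per_task=1):
    """A retried map attempt can start only if the queue has a spare core."""
    free_cores = QUEUE_MAX_CORES - running_reduces * cores_per_task
    return free_cores >= cores_per_task

# All maps have finished, so 300 of the 350 reduces hold every core...
print(retried_map_can_start(300))  # False: no core left for the map retry
# ...and those reduces will not release cores until the retried map's
# output exists -- a circular wait, i.e. the reported deadlock.

# Freeing even one reduce's core would unblock the retried map:
print(retried_map_can_start(299))  # True
```

The second call shows why preempting (or delaying) a single reduce is enough to break the cycle, which is why the mitigation knobs discussed later in this thread target reduce scheduling.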
[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation
     [ https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mai shurong updated YARN-3416:
------------------------------
    Attachment: AM_log_head10.txt.gz
                AM_log_tail10.txt.gz

First 10 lines and last 10 lines of the AM log of a deadlocked job.

> deadlock in a job between map and reduce cores allocation
> ---------------------------------------------------------
>
>                 Key: YARN-3416
>                 URL: https://issues.apache.org/jira/browse/YARN-3416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Priority: Critical
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz
[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation
     [ https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-3416:
-----------------------------------
    Priority: Critical  (was: Major)

> deadlock in a job between map and reduce cores allocation
> ---------------------------------------------------------
>
>                 Key: YARN-3416
>                 URL: https://issues.apache.org/jira/browse/YARN-3416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Priority: Critical
[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation
     [ https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mai shurong updated YARN-3416:
------------------------------
    Description:
        I submit a big job, with 500 maps and 350 reduces, to a queue (fairscheduler) with a 300-core maximum. When the job reaches 100% map progress, the 300 launched reduces have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reduces wait for the failed map to finish. So a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because no cores are available in the queue. I think there is a similar issue for the memory of a queue.

    was:
        I submit a big job, with 500 maps and 350 reduces, to a queue (fairscheduler) with a 300-core maximum. When the job reaches 100% map progress, the 300 launched reduces have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reduces wait for the failed map to finish. So a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because no cores are available in the queue.

> deadlock in a job between map and reduce cores allocation
> ---------------------------------------------------------
>
>                 Key: YARN-3416
>                 URL: https://issues.apache.org/jira/browse/YARN-3416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation
     [ https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mai shurong updated YARN-3416:
------------------------------
    Description:
        I submit a big job, which has 500 maps and 350 reduce, to a queue (fairscheduler) with 300 max cores. When the big mapreduce job is running 100% maps, the 300 reduces have occupied 300 max cores in the queue. And then, a map fails and retry, waiting for a core, while the 300 reduces are waiting for failed map to finish. So a deadlock accur, the job is blocked, and the later job in the queue cannot run because no available cores in the queue.

> deadlock in a job between map and reduce cores allocation
> ---------------------------------------------------------
>
>                 Key: YARN-3416
>                 URL: https://issues.apache.org/jira/browse/YARN-3416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
[jira] [Updated] (YARN-3416) deadlock in a job between map and reduce cores allocation
     [ https://issues.apache.org/jira/browse/YARN-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mai shurong updated YARN-3416:
------------------------------
    Description:
        I submit a big job, with 500 maps and 350 reduces, to a queue (fairscheduler) with a 300-core maximum. When the job reaches 100% map progress, the 300 launched reduces have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reduces wait for the failed map to finish. So a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because no cores are available in the queue.

    was:
        I submit a big job, which has 500 maps and 350 reduce, to a queue (fairscheduler) with 300 max cores. When the big mapreduce job is running 100% maps, the 300 reduces have occupied 300 max cores in the queue. And then, a map fails and retry, waiting for a core, while the 300 reduces are waiting for failed map to finish. So a deadlock accur, the job is blocked, and the later job in the queue cannot run because no available cores in the queue.

> deadlock in a job between map and reduce cores allocation
> ---------------------------------------------------------
>
>                 Key: YARN-3416
>                 URL: https://issues.apache.org/jira/browse/YARN-3416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
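[Editor's note] For operators hitting this lockup, two MapReduce AM settings are commonly tuned to shrink the window in which reduces can pin an entire queue. The property names below exist in Hadoop 2.x, but the values shown are illustrative assumptions, not a verified fix for this particular bug:

```xml
<!-- mapred-site.xml: sketch of mitigation settings (illustrative values) -->
<property>
  <!-- Launch reduces only after 95% of maps have completed, so fewer
       cores are held by reduces while maps may still fail and retry. -->
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.95</value>
</property>
<property>
  <!-- Fraction of reduces the MR AM is allowed to preempt to free
       resources for map attempts that cannot otherwise be scheduled. -->
  <name>yarn.app.mapreduce.am.job.reduce.preemption.limit</name>
  <value>1.0</value>
</property>
```

Note that the AM's reduce-preemption path depends on the headroom the scheduler reports in heartbeats; if the FairScheduler overstates headroom for the capped queue, preemption may never trigger, in which case tuning these values alone would not resolve the hang.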