[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081777#comment-14081777 ] Fengdong Yu commented on YARN-2372: --- I cannot find any more occurrences of this issue for now. Thanks. There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
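The cleanup above amounts to scanning the documentation source for characters outside the ASCII range. A minimal sketch of such a check (the class and method names here are hypothetical illustrations, not part of the attached YARN-2372 patch):

```java
// Hypothetical helper: report the first non-ASCII character in a line, the
// kind of scan one could run over the FairScheduler documentation source to
// locate stray Chinese characters. Not part of the actual patch.
public class NonAsciiFinder {
    /** Returns the index of the first non-ASCII character, or -1 if none. */
    public static int firstNonAscii(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 127) { // anything above 0x7F is non-ASCII
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstNonAscii("pure ASCII docs"));    // -1
        System.out.println(firstNonAscii("fair 调度 scheduler")); // 5
    }
}
```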
[jira] [Created] (YARN-2372) There is Chinese Characters in the FairScheduler's document
Fengdong Yu created YARN-2372: - Summary: There is Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Summary: There are Chinese Characters in the FairScheduler's document (was: There is Chinese Characters in the FairScheduler's document) There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu reassigned YARN-2372: - Assignee: Fengdong Yu There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch There are Chinese Characters in the FairScheduler's document Key: YARN-2372 URL: https://issues.apache.org/jira/browse/YARN-2372 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.1 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2276) Branch-2 cannot build
Fengdong Yu created YARN-2276: - Summary: Branch-2 cannot build Key: YARN-2276 URL: https://issues.apache.org/jira/browse/YARN-2276 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Fengdong Yu [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/yufengdong/svn/letv-hadoop/hadoop-2.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18] error: cannot find symbol -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2276) Branch-2 cannot build
[ https://issues.apache.org/jira/browse/YARN-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2276: -- Description: [ERROR] COMPILATION ERROR : [INFO] - [ERROR] hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18] error: cannot find symbol was: [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/yufengdong/svn/letv-hadoop/hadoop-2.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18] error: cannot find symbol Branch-2 cannot build - Key: YARN-2276 URL: https://issues.apache.org/jira/browse/YARN-2276 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Fengdong Yu [ERROR] COMPILATION ERROR : [INFO] - [ERROR] hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18] error: cannot find symbol -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1998) Change the time zone on the RM web UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985506#comment-13985506 ] Fengdong Yu commented on YARN-1998: --- Oh, thanks Tsuyoshi. I closed it as a duplicate. Change the time zone on the RM web UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows the GMT time zone for 'startTime' and 'finishTime' on the RM web UI; we should show the local time zone instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1998) Change the time zone on the Yarn UI to the local time zone
Fengdong Yu created YARN-1998: - Summary: Change the time zone on the Yarn UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor It shows the GMT time zone for 'startTime' and 'finishTime' on the RM web UI; we should show the local time zone instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1998) Change the time zone on the Yarn UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1998: -- Attachment: YARN-1998.patch Change the time zone on the Yarn UI to the local time zone -- Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows the GMT time zone for 'startTime' and 'finishTime' on the RM web UI; we should show the local time zone instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1998) Change the time zone on the RM web UI to the local time zone
[ https://issues.apache.org/jira/browse/YARN-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1998: -- Summary: Change the time zone on the RM web UI to the local time zone (was: Change the time zone on the Yarn UI to the local time zone) Change the time zone on the RM web UI to the local time zone Key: YARN-1998 URL: https://issues.apache.org/jira/browse/YARN-1998 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1998.patch It shows the GMT time zone for 'startTime' and 'finishTime' on the RM web UI; we should show the local time zone instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
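The issue is that the RM web UI rendered 'startTime' and 'finishTime' in GMT rather than the server's local zone. The distinction can be sketched as below; this is illustrative formatting code under assumed names, not the actual web-UI rendering code or patch:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch: the same epoch-millis timestamp rendered in GMT (what the RM web
// UI showed) versus the JVM's default (local) time zone (what the issue
// asks for). TimeZoneDemo is a hypothetical class name.
public class TimeZoneDemo {
    public static String format(long epochMillis, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
        fmt.setTimeZone(tz); // the key step: choose the zone explicitly
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long startTime = 1398902400000L; // an arbitrary app start time
        System.out.println(format(startTime, TimeZone.getTimeZone("GMT")));
        System.out.println(format(startTime, TimeZone.getDefault())); // local zone
    }
}
```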
[jira] [Created] (YARN-1991) Application state should be removed after being deleted from StateStore
Fengdong Yu created YARN-1991: - Summary: Application state should be removed after being deleted from StateStore Key: YARN-1991 URL: https://issues.apache.org/jira/browse/YARN-1991 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor FileSystemRMStateStore and ZKRMStateStore do not remove the in-memory application state after the app state has been deleted from the store. MemoryRMStateStore already does this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1991) Application state should be removed after being deleted from StateStore
[ https://issues.apache.org/jira/browse/YARN-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1991: -- Attachment: YARN-1991.patch Application state should be removed after being deleted from StateStore - Key: YARN-1991 URL: https://issues.apache.org/jira/browse/YARN-1991 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1991.patch FileSystemRMStateStore and ZKRMStateStore do not remove the in-memory application state after the app state has been deleted from the store. MemoryRMStateStore already does this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1991) Application state should be removed after being deleted from StateStore
[ https://issues.apache.org/jira/browse/YARN-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1991: -- Attachment: (was: YARN-1991.patch) Application state should be removed after being deleted from StateStore - Key: YARN-1991 URL: https://issues.apache.org/jira/browse/YARN-1991 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor FileSystemRMStateStore and ZKRMStateStore do not remove the in-memory application state after the app state has been deleted from the store. MemoryRMStateStore already does this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1991) Application state should be removed after being deleted from StateStore
[ https://issues.apache.org/jira/browse/YARN-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1991: -- Attachment: YARN-1991.patch Application state should be removed after being deleted from StateStore - Key: YARN-1991 URL: https://issues.apache.org/jira/browse/YARN-1991 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1991.patch FileSystemRMStateStore and ZKRMStateStore do not remove the in-memory application state after the app state has been deleted from the store. MemoryRMStateStore already does this. -- This message was sent by Atlassian JIRA (v6.2#6252)
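The invariant requested here is that deleting an application's state from the durable store also evicts the cached in-memory copy, as MemoryRMStateStore already did. A simplified sketch of that invariant under assumed names (SimpleStateStore and its methods are hypothetical, not the actual RMStateStore API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified state store illustrating the YARN-1991 invariant:
// a delete must hit both the durable store and the in-memory cache.
public class SimpleStateStore {
    private final Map<String, String> durable = new HashMap<>();  // stands in for FS/ZK storage
    private final Map<String, String> inMemory = new HashMap<>(); // cached application state

    public void storeApplication(String appId, String state) {
        durable.put(appId, state);
        inMemory.put(appId, state);
    }

    public void removeApplication(String appId) {
        durable.remove(appId);
        inMemory.remove(appId); // the step the issue says FS/ZK stores were missing
    }

    public boolean isCached(String appId) {
        return inMemory.containsKey(appId);
    }

    public static void main(String[] args) {
        SimpleStateStore store = new SimpleStateStore();
        store.storeApplication("application_0001", "RUNNING");
        store.removeApplication("application_0001");
        System.out.println(store.isCached("application_0001")); // false
    }
}
```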
[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968030#comment-13968030 ] Fengdong Yu commented on YARN-1870: --- Who can commit this? Thanks. FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Fengdong Yu Priority: Minor Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1870: -- Component/s: resourcemanager FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Fengdong Yu Priority: Minor Fix For: 2.4.1 Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1870: -- Affects Version/s: 2.4.0 FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Fengdong Yu Priority: Minor Fix For: 2.4.1 Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1870: -- Fix Version/s: 2.4.1 FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Fengdong Yu Priority: Minor Fix For: 2.4.1 Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
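The leak quoted in the issue comes from passing an unclosed FileInputStream to IOUtils.readLines. The standard remedy is try-with-resources; the sketch below uses JDK-only APIs rather than the commons-io call, and its class and method names are illustrative, not the actual ProcfsBasedProcessTree patch:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Reads all lines of a file, guaranteeing the underlying stream is closed
// even when an exception is thrown. JDK-only stand-in for the leaking
// IOUtils.readLines(new FileInputStream(file)) call quoted in the issue.
public class ReadSmaps {
    public static List<String> readLines(Path file) {
        // try-with-resources closes the reader (and its stream) on every path
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo against a temporary file.
    public static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("smaps", ".txt");
            Files.write(tmp, "Rss: 4 kB\nPss: 2 kB\n".getBytes(StandardCharsets.UTF_8));
            List<String> lines = readLines(tmp);
            Files.delete(tmp);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [Rss: 4 kB, Pss: 2 kB]
    }
}
```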
[jira] [Commented] (YARN-1901) All tasks restart during RM failover on Hive
[ https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959815#comment-13959815 ] Fengdong Yu commented on YARN-1901: --- Hi [~oazwa], can you search the yarn-dev mailing list? I sent a mail about this issue. This issue only affects Hive jobs; it works well for general MR jobs (only unfinished tasks restart; finished tasks are not re-run). All tasks restart during RM failover on Hive Key: YARN-1901 URL: https://issues.apache.org/jira/browse/YARN-1901 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu I built from trunk and configured RM HA, then submitted a Hive job. There are 11 maps in total; I stopped the active RM when 6 maps had finished, but Hive shows all map tasks restarting again. This conflicts with the design description. Job progress:
{code}
2014-03-31 18:44:14,088 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 713.84 sec
2014-03-31 18:44:15,128 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 722.83 sec
2014-03-31 18:44:16,160 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 731.95 sec
2014-03-31 18:44:17,191 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 744.17 sec
2014-03-31 18:44:18,220 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 756.22 sec
2014-03-31 18:44:19,250 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 762.4 sec
2014-03-31 18:44:20,281 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 774.64 sec
2014-03-31 18:44:21,306 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 786.49 sec
2014-03-31 18:44:22,334 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 792.59 sec
2014-03-31 18:44:23,363 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 807.58 sec
2014-03-31 18:44:24,392 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 815.96 sec
2014-03-31 18:44:25,416 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 823.83 sec
2014-03-31 18:44:26,443 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 826.84 sec
2014-03-31 18:44:27,472 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 832.16 sec
2014-03-31 18:44:28,501 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 839.73 sec
2014-03-31 18:44:29,531 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 844.45 sec
2014-03-31 18:44:30,564 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 760.34 sec
2014-03-31 18:44:31,728 Stage-1 map = 0%, reduce = 0%
2014-03-31 18:45:06,918 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 213.81 sec
2014-03-31 18:45:07,952 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 216.83 sec
2014-03-31 18:45:08,979 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 229.15 sec
2014-03-31 18:45:10,007 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 244.42 sec
2014-03-31 18:45:11,040 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 247.31 sec
2014-03-31 18:45:12,072 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 259.5 sec
2014-03-31 18:45:13,105 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 274.72 sec
2014-03-31 18:45:14,135 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 280.76 sec
2014-03-31 18:45:15,170 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 292.9 sec
2014-03-31 18:45:16,202 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 305.16 sec
2014-03-31 18:45:17,233 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 314.21 sec
2014-03-31 18:45:18,264 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 323.34 sec
2014-03-31 18:45:19,294 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 335.6 sec
2014-03-31 18:45:20,325 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 344.71 sec
2014-03-31 18:45:21,355 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 353.8 sec
2014-03-31 18:45:22,385 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 366.06 sec
2014-03-31 18:45:23,415 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 375.2 sec
2014-03-31 18:45:24,449 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 384.28 sec
{code}
I am using hive-0.12.0, and ZKRMStateStore as the RM store class. Hive uses a simple external table (only one column). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1901) All tasks restart during RM failover on Hive
[ https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960515#comment-13960515 ] Fengdong Yu commented on YARN-1901: --- Yes, an exact duplicate; thanks, I've closed it. All tasks restart during RM failover on Hive Key: YARN-1901 URL: https://issues.apache.org/jira/browse/YARN-1901 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1901) All tasks restart during RM failover on Hive
[ https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu resolved YARN-1901. --- Resolution: Duplicate All tasks restart during RM failover on Hive Key: YARN-1901 URL: https://issues.apache.org/jira/browse/YARN-1901 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1901) All tasks restart during RM failover on Hive
Fengdong Yu created YARN-1901: - Summary: All tasks restart during RM failover on Hive Key: YARN-1901 URL: https://issues.apache.org/jira/browse/YARN-1901 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1889: -- Labels: reviewed (was: ) avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Labels: reviewed Attachments: YARN-1889.patch, YARN-1889.patch In the fair scheduler, each scheduling attempt performs a full sort of the List of AppSchedulables, which invokes the Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and likewise for getPriority. This puts a lot of pressure on the GC because these methods are called very frequently. The test case below shows the improvement in performance and GC behaviour. The results show that this patch roughly halves the GC pressure during NodeUpdate processing.
The code to show the improvement (add it to TestFairScheduler.java):
{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public void printGCStats() {
  long totalGarbageCollections = 0;
  long garbageCollectionTime = 0;
  for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
    long count = gc.getCollectionCount();
    if (count >= 0) {
      totalGarbageCollections += count;
    }
    long time = gc.getCollectionTime();
    if (time >= 0) {
      garbageCollectionTime += time;
    }
  }
  System.out.println("Total Garbage Collections: " + totalGarbageCollections);
  System.out.println("Total Garbage Collection Time (ms): " + garbageCollectionTime);
}

@Test
public void testImpactOnGC() throws Exception {
  scheduler.reinitialize(conf, resourceManager.getRMContext());

  // Add nodes
  int numNode = 1;
  for (int i = 0; i < numNode; ++i) {
    String host = String.format("192.1.%d.%d", i / 256, i % 256);
    RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, host);
    NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
    scheduler.handle(nodeEvent);
    assertEquals(1024 * 64 * (i + 1), scheduler.getClusterCapacity().getMemory());
  }
  assertEquals(numNode, scheduler.getNumClusterNodes());
  assertEquals(1024 * 64 * numNode, scheduler.getClusterCapacity().getMemory());

  // add apps, each app has 100 containers.
  int minReqSize = FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
  int numApp = 8000;
  int priority = 1;
  for (int i = 1; i < numApp + 1; ++i) {
    ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
    AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
        attemptId.getApplicationId(), "queue1", "user1");
    scheduler.handle(appAddedEvent);
    AppAttemptAddedSchedulerEvent attemptAddedEvent =
        new AppAttemptAddedSchedulerEvent(attemptId, false);
    scheduler.handle(attemptAddedEvent);
    createSchedulingRequestExistingApplication(minReqSize * 2, 1, priority, attemptId);
  }
  scheduler.update();
  assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", true)
      .getRunnableAppSchedulables().size());

  System.out.println("GC stats before NodeUpdate processing:");
  printGCStats();
  int hb_num = 5000;
  long start = System.nanoTime();
  for (int i = 0; i < hb_num; ++i) {
    String host = String.format("192.1.%d.%d", i / 256, i % 256);
    RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, host);
    NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
    scheduler.handle(nodeEvent);
  }
  long end = System.nanoTime();
  System.out.printf("processing time for a NodeUpdate in average: %d us\n",
      (end - start) / (hb_num * 1000));
  System.out.println("GC stats after NodeUpdate processing:");
  printGCStats();
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
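To make the pattern concrete outside of YARN, here is a small standalone sketch (all names hypothetical; `Weights` stands in for ResourceWeights and `Schedulable` for AppSchedulable) showing how a comparator that allocates inside its key extractor generates garbage on every sort, and how caching one instance per schedulable, as the patch does for ResourceWeights and Priority, removes those allocations:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ComparatorAllocDemo {
    static long constructed = 0;  // counts Weights allocations

    // Stand-in for ResourceWeights (hypothetical, not YARN code).
    static class Weights {
        final double weight;
        Weights(double w) { weight = w; constructed++; }
    }

    // Stand-in for AppSchedulable.
    static class Schedulable {
        final double w;
        final Weights cached;  // reused instance: the patched pattern
        Schedulable(double w) { this.w = w; this.cached = new Weights(w); }
        Weights weightsAllocating() { return new Weights(w); }  // old: new object per call
        Weights weightsCached()     { return cached; }          // new: no allocation
    }

    // Sorts n schedulables and returns how many Weights objects were
    // constructed during the sort itself (list construction excluded).
    static long sortAllocations(int n, boolean useCached) {
        List<Schedulable> apps = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            apps.add(new Schedulable((i * 31) % 97));
        }
        long before = constructed;
        Comparator<Schedulable> cmp = useCached
            ? Comparator.comparingDouble((Schedulable s) -> s.weightsCached().weight)
            : Comparator.comparingDouble((Schedulable s) -> s.weightsAllocating().weight);
        apps.sort(cmp);
        return constructed - before;
    }

    public static void main(String[] args) {
        System.out.println("Weights allocated during sort, old pattern: "
            + sortAllocations(1000, false));
        System.out.println("Weights allocated during sort, new pattern: "
            + sortAllocations(1000, true));
    }
}
```

With thousands of apps sorted on every node heartbeat, the per-compare allocations of the old pattern are what show up as GC pressure in the test above; the cached pattern performs zero allocations during the sort.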
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954945#comment-13954945 ] Fengdong Yu commented on YARN-1889: --- The new patch looks good to me.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13951778#comment-13951778 ] Fengdong Yu commented on YARN-1889: --- Hi Zhiguo, which of my comments did you address in your new patch? I cannot see any changes. 1. There are still tabs in the patch. 2. Move the following initialization into the constructor: {code} + private Priority priority = recordFactory.newRecordInstance(Priority.class); + private ResourceWeights resourceWeights = new ResourceWeights(); {code} 3. As Sandy said, don't use recordFactory.newRecordInstance(Priority.class); use Priority.newInstance(1) instead. 4. Accordingly, remove priority.setPriority(1): {code} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. -Priority p = recordFactory.newRecordInstance(Priority.class); -p.setPriority(1); -return p; +priority.setPriority(1); +return priority; } {code} 5. Please rename it to getResourceWeights(): {code} + public ResourceWeights getResourceWeightsObject() { + return resourceWeights; + } {code} avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Attachments: YARN-1889.patch In the fair scheduler, each scheduling attempt performs a full sort of the List of AppSchedulables, which invokes the Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and likewise for getPriority. This puts a lot of pressure on the GC because these methods are called very frequently.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13950749#comment-13950749 ] Fengdong Yu commented on YARN-1889: --- Good catch, Zhiguo. Can you add some test cases to your patch? Please replace tabs in your code with spaces. {code} + private Priority priority = recordFactory.newRecordInstance(Priority.class); + private ResourceWeights resourceWeights = new ResourceWeights(); {code} Can you move these into the constructor? {code} + public ResourceWeights getResourceWeightsObject() { + return resourceWeights; + } {code} getResourceWeights() would be a better name. avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Attachments: YARN-1889.patch In the fair scheduler, each scheduling attempt performs a full sort of the List of AppSchedulables, which invokes the Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and likewise for getPriority. This puts a lot of pressure on the GC because these methods are called very frequently.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13948727#comment-13948727 ] Fengdong Yu commented on YARN-1696: --- The document is really good. Two minor comments: {code} +another RM is automatically elected to be the Active and takes over. Note +that, there is no need to run a separate ZKFC daemon as is the case for +HDFS. {code} This is a little unclear, since we cannot assume HDFS HA is enabled. It could read: "Note that RM automatic failover shares ZKFC with HDFS if HDFS HA is enabled, so there is no need to run a separate ZKFC daemon here." {code} +** Web Services + + The web services automatically redirect to the Active. {code} "Web services" is too general and could confuse a new YARN user, so change it to "RM web UI services" or something else more meaningful. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-1696.2.patch, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4, as this is required to call RM HA stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
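For context, the RM HA setup the document under review describes is driven by a handful of yarn-site.xml properties; a minimal sketch follows (the rm1/rm2 hostnames, cluster id, and ZooKeeper quorum are placeholders, and RM automatic failover is on by default once HA is enabled):

```xml
<!-- Minimal RM HA sketch for yarn-site.xml; hostnames, cluster id,
     and the ZK quorum below are placeholders. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>example-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

Unlike HDFS NameNode HA, the elector is embedded in the RM itself, which is why the document can say no separate ZKFC daemon is needed.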
[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947515#comment-13947515 ] Fengdong Yu commented on YARN-1870: --- [~vinodkv], can you add me as a YARN contributor? I am already an HDFS contributor, so I don't think we also need to send mail to the secretary. FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} The FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
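As an illustration of the kind of fix such a leak usually gets (this is a sketch, not the attached patch): open the stream in a try-with-resources block so it is closed even if reading throws. `ReadLinesClosed` and `readAll` are hypothetical names for the demo.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesClosed {
    // Reads all lines and always closes the stream, even on error.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } // the stream is closed here regardless of exceptions
        return lines;
    }

    // Convenience wrapper for demo/testing: reads lines from a String.
    static List<String> readAll(String text) {
        try {
            return readLines(new ByteArrayInputStream(
                    text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll("a\nb\n")); // prints [a, b]
    }
}
```

The same effect can be had with a finally block that calls close(), which is what fixes of this shape looked like before Java 7.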
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947575#comment-13947575 ] Fengdong Yu commented on YARN-1878: --- +1 for the patch. HDFS failover is also 5s by default. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945968#comment-13945968 ] Fengdong Yu commented on YARN-1870: --- Good catch [~yuzhih...@gmail.com], I've uploaded a simple patch to cover it. FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} The FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1870: -- Attachment: YARN-1870.patch FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} The FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946012#comment-13946012 ] Fengdong Yu commented on YARN-1870: --- [~yuzhih...@gmail.com], can you add me as a YARN contributor? FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} The FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1851) Unable to parse launch time from job history file
Fengdong Yu created YARN-1851: - Summary: Unable to parse launch time from job history file Key: YARN-1851 URL: https://issues.apache.org/jira/browse/YARN-1851 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Fengdong Yu Priority: Minor Fix For: 2.4.0 When a job completes, there are WARN complaints in the log: {code} 2014-03-19 13:31:10,036 WARN org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils: Unable to parse launch time from job history file job_1395204058904_0003-1395206473646-root-test_one_word-1395206966214-4-2-SUCCEEDED-root.test-queue-1395206480070.jhist : java.lang.NumberFormatException: For input string: queue {code} This happens because there is a '-' in the queue name 'test-queue', while we split the job history file name by '-' and take the ninth item as the job start time. FileNameIndexUtils.java: {code} private static final int JOB_START_TIME_INDEX = 9; {code} There is another potential issue: if I also include '-' in the job name (test_one_word in this case), the fields are all misparsed in the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
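The off-by-one described above can be reproduced with plain string splitting; a minimal sketch (the class and method names are hypothetical, and only the split-on-'-' logic mirrors FileNameIndexUtils):

```java
public class JhistSplitDemo {
    // The flattened .jhist file name from the report: the queue
    // "root.test-queue" contributes an extra '-' to the name.
    static final String NAME =
        "job_1395204058904_0003-1395206473646-root-test_one_word-"
      + "1395206966214-4-2-SUCCEEDED-root.test-queue-1395206480070";

    // Mirrors the split-on-'-' indexing used when parsing the file name.
    static String fieldAt(String jhistName, int index) {
        return jhistName.split("-")[index];
    }

    public static void main(String[] args) {
        // JOB_START_TIME_INDEX = 9: with the extra '-' from the queue name,
        // index 9 lands on "queue", which then fails Long.parseLong with
        // exactly the NumberFormatException shown in the log.
        System.out.println(fieldAt(NAME, 9)); // prints queue

        // With a hyphen-free queue name, index 9 is a parseable timestamp.
        String ok = NAME.replace("root.test-queue", "root.testqueue");
        System.out.println(fieldAt(ok, 9)); // prints 1395206480070
    }
}
```

This also illustrates the second point in the report: any '-' in a user-controlled field (queue name or job name) shifts every index after it, so positional splitting on '-' cannot be trusted for these names.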
[jira] [Updated] (YARN-1851) Unable to parse launch time from job history file
[ https://issues.apache.org/jira/browse/YARN-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1851: -- Description: When a job completes, there are WARN complaints in the log: {code} 2014-03-19 13:31:10,036 WARN org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils: Unable to parse launch time from job history file job_1395204058904_0003-1395206473646-root-test_one_word-1395206966214-4-2-SUCCEEDED-root.test-queue-1395206480070.jhist : java.lang.NumberFormatException: For input string: queue {code} This happens because there is a (-) in the queue name 'test-queue', while we split the job history file name by (-) and take the ninth item as the job start time. FileNameIndexUtils.java: {code} private static final int JOB_START_TIME_INDEX = 9; {code} There is another potential issue: if I also include '-' in the job name (test_one_word in this case), the fields are all misparsed in the same way. was: when job complete, there are WARN complains in the log: {code} 2014-03-19 13:31:10,036 WARN org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils: Unable to parse launch time from job history file job_1395204058904_0003-1395206473646-root-test_one_word-1395206966214-4-2-SUCCEEDED-root.test-queue-1395206480070.jhist : java.lang.NumberFormatException: For input string: queue {code} because there is - in the queue name 'test-queue', we split the job history file name by -, and get the ninth item as job start time. FileNameIndexUtils.java {code} private static final int JOB_START_TIME_INDEX = 9; {code} but there is another potential issue: if I also include '-' in the job name(test_one_world in this case), there are all misunderstand.
-- This message was sent by Atlassian JIRA (v6.2#6252)