[jira] [Updated] (YARN-400) RM can return null application resource usage report leading to NPE in client
[ https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-400:

Attachment: YARN-400-branch-0.23.patch

Thanks for the review, Tom. Here's the patch for branch-0.23.

RM can return null application resource usage report leading to NPE in client

Key: YARN-400
URL: https://issues.apache.org/jira/browse/YARN-400
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
Attachments: YARN-400-branch-0.23.patch, YARN-400.patch

RMAppImpl.createAndGetApplicationReport can return a report with a null resource usage report if full access to the app is allowed but the application has no current attempt. This leads to NPEs in client code that assumes an app report will always have at least an empty resource usage report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client
[ https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582247#comment-13582247 ]

Thomas Graves commented on YARN-400:

+1 Thanks Jason!

RM can return null application resource usage report leading to NPE in client

Key: YARN-400
URL: https://issues.apache.org/jira/browse/YARN-400
[jira] [Updated] (YARN-100) container-executor should deal with stdout, stderr better
[ https://issues.apache.org/jira/browse/YARN-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated YARN-100:

Labels: usability (was: )

container-executor should deal with stdout, stderr better

Key: YARN-100
URL: https://issues.apache.org/jira/browse/YARN-100
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.0.1-alpha
Reporter: Colin Patrick McCabe
Priority: Minor
Labels: usability

container-executor.c contains the following code:

{code}
fclose(stdin);
fflush(LOGFILE);
if (LOGFILE != stdout) {
  fclose(stdout);
}
if (ERRORFILE != stderr) {
  fclose(stderr);
}
if (chdir(primary_app_dir) != 0) {
  fprintf(LOGFILE, "Failed to chdir to app dir - %s\n", strerror(errno));
  return -1;
}
execvp(args[0], args);
{code}

Whenever you open a new file descriptor, its number is the lowest available number. So if {{stdout}} (fd number 1) has been closed and you then call open("/my/important/file"), you'll be assigned file descriptor 1. This means that any printf statements in the program will now be printing to /my/important/file. Oops!

The correct way to get rid of stdin, stdout, or stderr is not to close them, but to make them point to /dev/null. {{dup2}} can be used for this purpose.

It looks like LOGFILE and ERRORFILE are always set to stdout and stderr at the moment. However, this is a latent bug that should be fixed in case these are ever made configurable (which seems to have been the intent).
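The {{dup2}} approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch for this issue: redirect_to_devnull is a hypothetical helper name that does not exist in container-executor.c, and the sketch assumes /dev/null can be opened read-write.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of container-executor.c).
 * Points descriptor `fd` at /dev/null instead of closing it, so the
 * descriptor number stays occupied and a later open() cannot reuse it.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
static int redirect_to_devnull(int fd) {
    int devnull = open("/dev/null", O_RDWR);
    if (devnull < 0) {
        return -1;
    }
    if (devnull == fd) {
        /* fd was already closed and open() reused its number: done. */
        return 0;
    }
    if (dup2(devnull, fd) < 0) {
        int saved_errno = errno;   /* close() may overwrite errno */
        close(devnull);
        errno = saved_errno;
        return -1;
    }
    /* fd now refers to /dev/null; the temporary descriptor is not needed. */
    close(devnull);
    return 0;
}
```

In the fragment quoted above, a call such as fclose(stdout) could plausibly be replaced by fflush(stdout) followed by redirect_to_devnull(STDOUT_FILENO), and likewise for stdin and stderr, so that descriptors 0, 1 and 2 remain occupied across the execvp call.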
[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start
[ https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582482#comment-13582482 ]

Jonathan Eagles commented on YARN-236:

+1. Verified that this condition still exists and that the patch does redirect to the RM app webpage. Thanks, Jason.

RM should point tracking URL to RM web page when app fails to start

Key: YARN-236
URL: https://issues.apache.org/jira/browse/YARN-236
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: usability
Attachments: YARN-236.patch

Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to a bad AM config or a bad job config such as an invalid queue name, then the user gets the unhelpful message "The requested application exited before setting a tracking URL." Usually the diagnostic string on the RM app page has something useful, so we might as well point there.
[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start
[ https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582515#comment-13582515 ]

Hudson commented on YARN-236:

Integrated in Hadoop-trunk-Commit #3368 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3368/])

YARN-236. RM should point tracking URL to RM web page when app fails to start (Jason Lowe via jeagles) (Revision 1448406)

Result = SUCCESS

jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448406

Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java

RM should point tracking URL to RM web page when app fails to start

Key: YARN-236
URL: https://issues.apache.org/jira/browse/YARN-236
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: usability
Fix For: 3.0.0, 0.23.7, 2.0.4-beta
Attachments: YARN-236.patch
[jira] [Updated] (YARN-399) Add an out of band heartbeat damper
[ https://issues.apache.org/jira/browse/YARN-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-399:

Attachment: YARN-399.PATCH

Here is a patch that I started. It hasn't been tested; I'm attaching it for now and will get back to it later.

Add an out of band heartbeat damper

Key: YARN-399
URL: https://issues.apache.org/jira/browse/YARN-399
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.6
Reporter: Thomas Graves
Assignee: Thomas Graves
Attachments: YARN-399.PATCH

We are seeing issues with the scheduler queue backing up on the RM. We have the nodemanager heartbeats set at 5 seconds, which should be more than long enough for the number of apps we are running. We believe this is due to the out of band heartbeats of the nodemanager coming too soon when we have jobs with lots of containers that finish quickly. To help with that, we could add an out of band heartbeat damper to the nodemanager, similar to what the 1.x TaskTrackers have. MAPREDUCE-2355 added it in 1.x.
[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-365:

Attachment: YARN-365.7.patch

Each NM heartbeat should not generate an event for the Scheduler

Key: YARN-365
URL: https://issues.apache.org/jira/browse/YARN-365
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, YARN-365.6.patch, YARN-365.7.patch

Follow up from YARN-275
https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt