[jira] [Updated] (YARN-400) RM can return null application resource usage report leading to NPE in client

2013-02-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-400:


Attachment: YARN-400-branch-0.23.patch

Thanks for the review, Tom.  Here's the patch for branch-0.23.

 RM can return null application resource usage report leading to NPE in client
 -

 Key: YARN-400
 URL: https://issues.apache.org/jira/browse/YARN-400
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-400-branch-0.23.patch, YARN-400.patch


 RMAppImpl.createAndGetApplicationReport can return a report with a null 
 resource usage report if full access to the app is allowed but the 
 application has no current attempt.  This leads to NPEs in client code that 
 assumes an app report will always have at least an empty resource usage 
 report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client

2013-02-20 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582247#comment-13582247
 ] 

Thomas Graves commented on YARN-400:


+1 Thanks Jason!



[jira] [Updated] (YARN-100) container-executor should deal with stdout, stderr better

2013-02-20 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-100:
-

Labels: usability  (was: )

 container-executor should deal with stdout, stderr better
 -

 Key: YARN-100
 URL: https://issues.apache.org/jira/browse/YARN-100
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.1-alpha
Reporter: Colin Patrick McCabe
Priority: Minor
  Labels: usability

 container-executor.c contains the following code:
 {code}
   fclose(stdin);
   fflush(LOGFILE);
   if (LOGFILE != stdout) {
     fclose(stdout);
   }
   if (ERRORFILE != stderr) {
     fclose(stderr);
   }
   if (chdir(primary_app_dir) != 0) {
     fprintf(LOGFILE, "Failed to chdir to app dir - %s\n", strerror(errno));
     return -1;
   }
   execvp(args[0], args);
 {code}
 Whenever you open a new file descriptor, its number is the lowest available 
 number.  So if {{stdout}} (fd number 1) has been closed, and you do 
 open("/my/important/file"), you'll get assigned file descriptor 1.  This 
 means that any printf statements in the program will now be printing to 
 /my/important/file.  Oops!
 The correct way to get rid of stdin, stdout, or stderr is not to close them, 
 but to make them point to /dev/null.  {{dup2}} can be used for this purpose.
 It looks like LOGFILE and ERRORFILE are always set to stdout and stderr at 
 the moment.  However, this is a latent bug that should be fixed in case these 
 are ever made configurable (which seems to have been the intent).
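
The dup2 approach described above can be sketched roughly as follows. This is a minimal illustration and not the actual YARN-100 patch: the helper names redirect_to_devnull and silence_standard_streams are hypothetical and do not exist in container-executor.c. The point is that each standard descriptor number stays occupied by /dev/null, so a later open() cannot be handed fd 0, 1, or 2.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of container-executor.c): make `fd` refer
 * to /dev/null instead of closing it, so the descriptor number stays taken.
 * Returns 0 on success, -1 on failure. */
int redirect_to_devnull(int fd, int flags) {
  int null_fd = open("/dev/null", flags);
  if (null_fd < 0) {
    return -1;
  }
  if (null_fd == fd) {
    /* fd was already closed and open() reused its number; nothing to do. */
    return 0;
  }
  if (dup2(null_fd, fd) < 0) {
    close(null_fd);
    return -1;
  }
  /* fd now points at /dev/null; the temporary descriptor is not needed. */
  close(null_fd);
  return 0;
}

/* Hypothetical wrapper: silence all three standard streams, for example
 * just before an execvp() call. Flushes buffered output first so nothing
 * already written is lost. */
int silence_standard_streams(void) {
  fflush(stdout);
  fflush(stderr);
  if (redirect_to_devnull(STDIN_FILENO, O_RDONLY) != 0) return -1;
  if (redirect_to_devnull(STDOUT_FILENO, O_WRONLY) != 0) return -1;
  if (redirect_to_devnull(STDERR_FILENO, O_WRONLY) != 0) return -1;
  return 0;
}
```

With this in place, a stray printf after the redirect writes to /dev/null rather than to whatever file happened to be opened next.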



[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start

2013-02-20 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582482#comment-13582482
 ] 

Jonathan Eagles commented on YARN-236:
--

+1. Verified this condition still exists and patch does redirect to the RM app 
webpage. Thanks, Jason.

 RM should point tracking URL to RM web page when app fails to start
 ---

 Key: YARN-236
 URL: https://issues.apache.org/jira/browse/YARN-236
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: usability
 Attachments: YARN-236.patch


 Similar to YARN-165, the RM should redirect the tracking URL to the specific 
 app page on the RM web UI when the application fails to start.  For example, 
 if the AM completely fails to start due to bad AM config or bad job config 
 like invalid queuename, then the user gets the unhelpful "The requested 
 application exited before setting a tracking URL."
 Usually the diagnostic string on the RM app page has something useful, so we 
 might as well point there.



[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582515#comment-13582515
 ] 

Hudson commented on YARN-236:
-

Integrated in Hadoop-trunk-Commit #3368 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3368/])
YARN-236. RM should point tracking URL to RM web page when app fails to 
start (Jason Lowe via jeagles) (Revision 1448406)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448406
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java


 RM should point tracking URL to RM web page when app fails to start
 ---

 Key: YARN-236
 URL: https://issues.apache.org/jira/browse/YARN-236
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: usability
 Fix For: 3.0.0, 0.23.7, 2.0.4-beta

 Attachments: YARN-236.patch




[jira] [Updated] (YARN-399) Add an out of band heartbeat damper

2013-02-20 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-399:
---

Attachment: YARN-399.PATCH

Patch that I started. It hasn't been tested; attaching it for now and will get 
back to it later.

 Add an out of band heartbeat damper
 ---

 Key: YARN-399
 URL: https://issues.apache.org/jira/browse/YARN-399
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.6
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-399.PATCH


 We are seeing issues with the scheduler queue backing up on the RM. We have 
 the nodemanager heartbeats set at 5 seconds which should be more than long 
 enough for the number of apps we are running.  We believe this is due to the 
 out of band heartbeats of the nodemanager coming too soon when we have jobs 
 with lots of containers that finish quickly.
 To help with that we could add an out of band heartbeat damper to the 
 nodemanager similar to what 1.X Tasktrackers have.  MAPREDUCE-2355 added it 
 in 1.x.



[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-365:
---

Attachment: YARN-365.7.patch

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt
