[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client
[ https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583095#comment-13583095 ] Hudson commented on YARN-400: - Integrated in Hadoop-Yarn-trunk #134 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/134/]) YARN-400. RM can return null application resource usage report leading to NPE in client (Jason Lowe via tgraves) (Revision 1448241) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448241 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java RM can return null application resource usage report leading to NPE in client - Key: YARN-400 URL: https://issues.apache.org/jira/browse/YARN-400 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: YARN-400-branch-0.23.patch, YARN-400.patch RMAppImpl.createAndGetApplicationReport can return a report with a null resource usage report if full access to the app is allowed but the application has no current attempt. This leads to NPEs in client code that assumes an app report will always have at least an empty resource usage report. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
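The defensive pattern the issue asks for is to never leave the usage field null. A minimal sketch of that guard, assuming the YARN record factory (Records.newRecord) is available; this is an illustration of the idea, not the committed RMAppImpl patch, and the helper class/method names here are made up for the example:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;
import org.apache.hadoop.yarn.util.Records;

public class UsageReportGuard {

  /**
   * If the application has no current attempt (so no real usage numbers
   * exist), fall back to an empty record instead of null, so clients that
   * dereference the usage report inside an ApplicationReport never hit an NPE.
   */
  public static ApplicationResourceUsageReport orEmpty(
      ApplicationResourceUsageReport fromCurrentAttempt) {
    if (fromCurrentAttempt != null) {
      return fromCurrentAttempt;
    }
    // Counters in a freshly created record default to zero/unset.
    return Records.newRecord(ApplicationResourceUsageReport.class);
  }
}
{code}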
[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start
[ https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583097#comment-13583097 ] Hudson commented on YARN-236: - Integrated in Hadoop-Yarn-trunk #134 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/134/]) YARN-236. RM should point tracking URL to RM web page when app fails to start (Jason Lowe via jeagles) (Revision 1448406) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448406 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java RM should point tracking URL to RM web page when app fails to start --- Key: YARN-236 URL: https://issues.apache.org/jira/browse/YARN-236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.4 Reporter: Jason Lowe Assignee: Jason Lowe Labels: usability Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: YARN-236.patch Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL." Usually the diagnostic string on the RM app page has something useful, so we might as well point there. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
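The change touches WebAppProxyServlet; the general shape of the behaviour described above can be sketched as below. This is a hedged illustration, not the actual patch: the "/cluster/app/<appId>" path is the usual RM web UI location for an application and is an assumption of this sketch, as are the class and method names.

{code:java}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public class TrackingUrlFallbackSketch {

  /**
   * If the app never registered a tracking URL (e.g. the AM failed to start),
   * send the browser to the RM's per-application page, where the diagnostics
   * string is shown, instead of the generic "exited before setting a tracking
   * URL" message.
   */
  static void redirectToRmAppPage(HttpServletResponse resp,
      String rmWebAddress, String appId) throws IOException {
    resp.sendRedirect("http://" + rmWebAddress + "/cluster/app/" + appId);
  }
}
{code}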
[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client
[ https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583154#comment-13583154 ] Hudson commented on YARN-400: - Integrated in Hadoop-Hdfs-0.23-Build #532 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/532/]) YARN-400. RM can return null application resource usage report leading to NPE in client (Jason Lowe via tgraves) (Revision 1448244) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448244 Files : * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java RM can return null application resource usage report leading to NPE in client - Key: YARN-400 URL: https://issues.apache.org/jira/browse/YARN-400 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: YARN-400-branch-0.23.patch, YARN-400.patch RMAppImpl.createAndGetApplicationReport can return a report with a null resource usage report if full access to the app is allowed but the application has no current attempt. This leads to NPEs in client code that assumes an app report will always have at least an empty resource usage report. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start
[ https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583155#comment-13583155 ] Hudson commented on YARN-236: - Integrated in Hadoop-Hdfs-0.23-Build #532 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/532/]) YARN-236. RM should point tracking URL to RM web page when app fails to start (Jason Lowe via jeagles) (Revision 1448411) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448411 Files : * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java RM should point tracking URL to RM web page when app fails to start --- Key: YARN-236 URL: https://issues.apache.org/jira/browse/YARN-236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.4 Reporter: Jason Lowe Assignee: Jason Lowe Labels: usability Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: YARN-236.patch Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL." Usually the diagnostic string on the RM app page has something useful, so we might as well point there. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client
[ https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583174#comment-13583174 ] Hudson commented on YARN-400: - Integrated in Hadoop-Mapreduce-trunk #1351 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/]) YARN-400. RM can return null application resource usage report leading to NPE in client (Jason Lowe via tgraves) (Revision 1448241) Result = FAILURE tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448241 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java RM can return null application resource usage report leading to NPE in client - Key: YARN-400 URL: https://issues.apache.org/jira/browse/YARN-400 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: YARN-400-branch-0.23.patch, YARN-400.patch RMAppImpl.createAndGetApplicationReport can return a report with a null resource usage report if full access to the app is allowed but the application has no current attempt. This leads to NPEs in client code that assumes an app report will always have at least an empty resource usage report. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-380) yarn node -status prints Last-Last-Node-Status
[ https://issues.apache.org/jira/browse/YARN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-380: - Labels: usability (was: ) yarn node -status prints Last-Last-Node-Status -- Key: YARN-380 URL: https://issues.apache.org/jira/browse/YARN-380 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Labels: usability I assume the Last-Last-NodeStatus is a typo and it should just be Last-Node-Status. $ yarn node -status foo.com:8041 Node Report : Node-Id : foo.com:8041 Rack : /10.10.10.0 Node-State : RUNNING Node-Http-Address : foo.com:8042 Health-Status(isNodeHealthy) : true Last-Last-Health-Update : 1360118400219 Health-Report : Containers : 0 Memory-Used : 0M Memory-Capacity : 24576 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-379: - Labels: usability (was: ) yarn [node,application] command print logger info messages -- Key: YARN-379 URL: https://issues.apache.org/jira/browse/YARN-379 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Labels: usability Running the yarn node and yarn applications command results in annoying log info messages being printed: $ yarn node -list 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Total Nodes:1 Node-IdNode-State Node-Http-Address Health-Status(isNodeHealthy)Running-Containers foo:8041RUNNING foo:8042 true 0 13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. $ yarn application 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Invalid Command Usage : usage: application -kill arg Kills the application. -list Lists all the Applications from RM. -status arg Prints the status of the application. 13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
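Until the client itself is cleaned up, the INFO lines shown above can be silenced at the logging layer. A minimal workaround sketch, assuming the CLI uses log4j and that the logger name matches the class printed in the output; this is not the eventual fix for the issue, just a way to suppress the noise:

{code:java}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietYarnCliSketch {

  /**
   * Raise the level of the YARN service-framework logger so the
   * "Service:...YarnClientImpl is inited/started/stopped" INFO lines are
   * suppressed before the CLI command runs.
   */
  public static void silenceServiceLogs() {
    Logger.getLogger("org.apache.hadoop.yarn.service.AbstractService")
        .setLevel(Level.WARN);
  }
}
{code}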
[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager
[ https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-198: - Labels: usability (was: ) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager --- Key: YARN-198 URL: https://issues.apache.org/jira/browse/YARN-198 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ramgopal N Assignee: Senthil V Kumar Priority: Minor Labels: usability If we are navigating to Nodemanager by clicking on the node link in RM,there is no link provided on the NM to navigate back to RM. If there is a link to navigate back to RM it would be good -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-410) Miscellaneous web UI issues
[ https://issues.apache.org/jira/browse/YARN-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-410: - Labels: usability (was: ) Miscellaneous web UI issues --- Key: YARN-410 URL: https://issues.apache.org/jira/browse/YARN-410 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Labels: usability We need to fix the following issues on YARN web-UI: - Remove the Note column from the application list. When a failure happens, this Note spoils the table layout. - When the Application is still not running, the Tracking UI should be titled UNASSIGNED; for some reason it is titled ApplicationMaster but (correctly) links to #. - The per-application page has all the RM related information like version, start-time etc. Must be some accidental change by one of the patches. - The diagnostics for a failed app on the per-application page don't retain new lines and wrap'em around - looks hard to read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-410) Miscellaneous web UI issues
[ https://issues.apache.org/jira/browse/YARN-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli moved MAPREDUCE-3152 to YARN-410: - Component/s: (was: mrv2) Fix Version/s: (was: 0.24.0) Assignee: (was: Subroto Sanyal) Affects Version/s: (was: 0.23.0) Key: YARN-410 (was: MAPREDUCE-3152) Project: Hadoop YARN (was: Hadoop Map/Reduce) Miscellaneous web UI issues --- Key: YARN-410 URL: https://issues.apache.org/jira/browse/YARN-410 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli We need to fix the following issues on YARN web-UI: - Remove the Note column from the application list. When a failure happens, this Note spoils the table layout. - When the Application is still not running, the Tracking UI should be titled UNASSIGNED; for some reason it is titled ApplicationMaster but (correctly) links to #. - The per-application page has all the RM related information like version, start-time etc. Must be some accidental change by one of the patches. - The diagnostics for a failed app on the per-application page don't retain new lines and wrap'em around - looks hard to read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-69) RM should throw different exceptions for while querying app/node/queue
[ https://issues.apache.org/jira/browse/YARN-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-69: Issue Type: Sub-task (was: Bug) Parent: YARN-386 RM should throw different exceptions for while querying app/node/queue -- Key: YARN-69 URL: https://issues.apache.org/jira/browse/YARN-69 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli We should distinguish the exceptions for absent app/node/queue, illegally accessed app/node/queue etc. Today everything is a {{YarnRemoteException}}. We should extend {{YarnRemoteException}} to add {{NotFoundException}}, {{AccessControlException}} etc. Today, {{AccessControlException}} exists but not as part of the protocol descriptions (i.e. only available to Java). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
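A rough sketch of the exception hierarchy the description asks for. The names NotFoundException and AccessControlException come from the issue text; here they extend a placeholder base class so the sketch compiles standalone, whereas in YARN they would extend (or be carried by) {{YarnRemoteException}} and would also need protocol-level definitions so non-Java clients can distinguish them:

{code:java}
/** Placeholder standing in for YarnRemoteException in this sketch. */
class YarnRemoteExceptionBase extends Exception {
  YarnRemoteExceptionBase(String message) { super(message); }
}

/** Thrown when a queried app/node/queue does not exist. */
class NotFoundException extends YarnRemoteExceptionBase {
  NotFoundException(String message) { super(message); }
}

/** Thrown when the caller is not allowed to access the app/node/queue. */
class AccessControlException extends YarnRemoteExceptionBase {
  AccessControlException(String message) { super(message); }
}
{code}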
[jira] [Moved] (YARN-411) Per-state RM app-pages should have search ala JHS pages
[ https://issues.apache.org/jira/browse/YARN-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli moved MAPREDUCE-3778 to YARN-411: - Component/s: (was: webapps) (was: mrv2) Affects Version/s: (was: 0.23.0) Key: YARN-411 (was: MAPREDUCE-3778) Project: Hadoop YARN (was: Hadoop Map/Reduce) Per-state RM app-pages should have search ala JHS pages --- Key: YARN-411 URL: https://issues.apache.org/jira/browse/YARN-411 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-412) FifoScheduler incorrectly checking for node locality
Roger Hoover created YARN-412: - Summary: FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Priority: Minor In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
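The mismatch the report describes can be seen in isolation with a toy example: outstanding requests are keyed by plain hostname, so a lookup by the host:port node address never matches. The map below simply stands in for the app's per-priority request table; it is an illustration of the bug, not the scheduler code itself.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class LocalityLookupSketch {

  public static void main(String[] args) {
    Map<String, Integer> outstandingRequests = new HashMap<>();
    outstandingRequests.put("host1.foo.com", 3);  // requests keyed by hostname

    String nodeAddress = "host1.foo.com:1234";    // hostname + command port
    String hostName = "host1.foo.com";

    // Buggy lookup (what the issue says FifoScheduler does): always misses.
    System.out.println("by node address: " + outstandingRequests.get(nodeAddress));

    // Fixed lookup (what CapacityScheduler does): finds the node-local request.
    System.out.println("by hostname:     " + outstandingRequests.get(hostName));
  }
}
{code}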
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: YARN-412.patch Please review this patch for the fix plus a unit test case FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: YARN-412.patch Added a timeout on the unit test FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583522#comment-13583522 ] Hadoop QA commented on YARN-412: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570348/YARN-412.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 one of tests included doesn't have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/416//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/416//console This message is automatically generated. FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS
Sandy Ryza created YARN-413: --- Summary: With log aggregation on, nodemanager dies on startup if it can't connect to HDFS Key: YARN-413 URL: https://issues.apache.org/jira/browse/YARN-413 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza If log aggregation is on, when the nodemanager starts up, it tries to create the remote log directory. If this fails, it kills itself. It doesn't seem like turning log aggregation on should ever cause the nodemanager to die. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-408) Capacity Scheduler delay scheduling should not be disabled by default
[ https://issues.apache.org/jira/browse/YARN-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583625#comment-13583625 ] Mayank Bansal commented on YARN-408: Yeah sure. There are two intents behind changing this value: 1. Make it enabled by default. 2. The algorithm we use to give applications a scheduling opportunity is driven by heartbeats from the NMs, so if we just use the number of racks it will not add much value toward actually achieving node locality. The intent behind the delay count is to wait for at least one heartbeat from each node in the cluster before moving a task to the next locality level, so I generally defaulted it to the number of nodes in one rack. Thanks, Mayank Capacity Scheduler delay scheduling should not be disabled by default - Key: YARN-408 URL: https://issues.apache.org/jira/browse/YARN-408 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN-408-trunk.patch Capacity Scheduler delay scheduling should not be disabled by default. Enable it and set it to the number of nodes in one rack. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
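For context, the knob being discussed is the CapacityScheduler's node-locality delay. A minimal sketch of setting it to the number of nodes in one rack, assuming the property name "yarn.scheduler.capacity.node-locality-delay" (normally configured in capacity-scheduler.xml rather than in code) and a nominal rack size of 40 nodes; both are assumptions of this sketch, not values taken from the attached patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class NodeLocalityDelaySketch {

  /**
   * Returns a Configuration with the delay-scheduling count set to the number
   * of nodes in one rack, so the scheduler waits roughly one heartbeat from
   * each node in the rack before relaxing locality.
   */
  public static Configuration withNodeLocalityDelay(int nodesPerRack) {
    Configuration conf = new Configuration();
    conf.setInt("yarn.scheduler.capacity.node-locality-delay", nodesPerRack);
    return conf;
  }

  public static void main(String[] args) {
    Configuration conf = withNodeLocalityDelay(40);  // assumed rack size
    System.out.println(conf.get("yarn.scheduler.capacity.node-locality-delay"));
  }
}
{code}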
[jira] [Commented] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS
[ https://issues.apache.org/jira/browse/YARN-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583660#comment-13583660 ] Sandy Ryza commented on YARN-413: - 2013-02-21 13:27:24,307 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359) Caused by: org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:248) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 3 more Caused by: org.apache.hadoop.yarn.YarnException: Failed to create remoteLogDir [/tmp/logs] at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:207) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.start(LogAggregationService.java:132) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 5 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/logs. Name node is in safe mode. The reported blocks 7 has reached the threshold 0.9990 of total blocks 7. Safe mode will be turned off automatically in 25 seconds. 
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3067) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3045) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3024) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:667) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:468) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40995) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:482) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1018) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1778) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1774) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1488) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1772) at org.apache.hadoop.ipc.Client.call(Client.java:1237) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy9.mkdirs(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:163) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:82) at $Proxy9.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:450) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2115) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2086) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:540) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:204) ... 7 more 2013-02-21 13:27:24,308 INFO org.apache.hadoop.ipc.Server: Stopping server on 47223 2013-02-21 13:27:24,308 INFO
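One possible mitigation consistent with the report (not necessarily the eventual fix, since the issue was closed as a duplicate below) is to retry remote log-directory creation instead of letting a transient HDFS failure such as safe mode kill the NodeManager at startup. A hedged sketch, with the interface standing in for LogAggregationService.verifyAndCreateRemoteLogDir and the retry count and sleep chosen purely for illustration:

{code:java}
import java.io.IOException;

public class RemoteLogDirRetrySketch {

  /** Hook standing in for the remote log directory creation step. */
  interface RemoteDirCreator {
    void createRemoteLogDir() throws IOException;
  }

  /**
   * Retries directory creation with a fixed backoff and only surfaces the
   * error after all attempts, rather than failing service start on the first
   * IOException.
   */
  static void createWithRetries(RemoteDirCreator creator, int attempts,
      long sleepMillis) throws IOException, InterruptedException {
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        creator.createRemoteLogDir();
        return;
      } catch (IOException e) {
        last = e;                 // remember the failure and try again
        Thread.sleep(sleepMillis);
      }
    }
    throw last;                   // all attempts failed
  }
}
{code}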
[jira] [Resolved] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS
[ https://issues.apache.org/jira/browse/YARN-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-413. - Resolution: Duplicate With log aggregation on, nodemanager dies on startup if it can't connect to HDFS Key: YARN-413 URL: https://issues.apache.org/jira/browse/YARN-413 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza If log aggregation is on, when the nodemanager starts up, it tries to create the remote log directory. If this fails, it kills itself. It doesn't seem like turning log aggregation on should ever cause the nodemanager to die. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583713#comment-13583713 ] Bikas Saha commented on YARN-392: - From what I understand this seems to be tangentially going down the path of the discussion that happened in YARN-371. The crucial point is that the YARN resource scheduler is *not* a task scheduler. So introducing concepts that directly or indirectly make it do task scheduling would be inconsistent with the design. It's a coarse-grained resource allocator that gives the app containers that represent chunks of resources using which the app can schedule its tasks. Different versions of the scheduler change the way the resource sharing is being done, Fair/Capacity or otherwise. Ideally we should have only 1 scheduler that has hooks to change the sharing policy. The code kind of reflects that because there is so much common code/logic between both implementations. Unfortunately, in both the Fair and Capacity Scheduler the implementations have mixed up 1) the decision to allocate at and below a given topology level [say the * level] with 2) whether there are resource requests at that level. E.g. when the allocation cycle is started for an app, the logic starts at * and checks if the resource request count is greater than 0. If yes then it goes into racks and then nodes. Which means that if an application wants resources only at a node then it has to create requests at the rack and * level too. This is because locality relaxation has gotten mixed up with being schedulable, if you catch my drift. My strong belief is that if we can fix this overload then we won't need to fix this jira. However I can see that fixing the overload will be a very complicated knot to untie and perhaps impossible to do now because it may be inextricably linked with the API. Which is why I created this jira. Now, if the problem is the * overload that I describe above, then the problem is the entanglement of delay scheduling (for locality). Here is an alternative proposal that addresses this problem. Let's make the delay of delay scheduling specifiable by the application. So an application can specify how long to wait before relaxing its node requests to rack and *. When an app wants containers on specific nodes it basically means that it does not want the RM to automatically relax its locality - thus specifying a large value for the delay. The end result being allocation on specific nodes if resources become available on those nodes. This also serves as a useful extension of delay scheduling. Short apps can be aggressive in relaxing locality while long+large jobs can be more conservative in trading off scheduling speed against network IO. The catch in the proposal is that such requests have to be made at a different priority level. Resource requests at the same priority level get aggregated and we don't want to aggregate relaxable resource requests with non-relaxable resource requests. I think this is a good thing to do anyway because it makes the application think and decide which kind of tasks it needs to get running first. An extension of this approach also ties in nicely with the API enhancement suggested by YARN-394. The RM could actually inform the app that it has not been able to allocate a resource request on a node and the time limit has elapsed. At which point, the app could cancel that request and ask for an alternative set of nodes. I agree I am hand-waving in this paragraph. Thoughts? 
Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
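To make the proposal in the preceding comment concrete, here is a toy model of a request that carries its own relaxation policy. The class, field names, and the two-step relaxation timing are purely illustrative and are not the YARN API of the time; they only mirror what the comment proposes (an app-specified delay, and a strict "do not relax" mode for node-only requests).

{code:java}
public class LocalityAwareRequestSketch {

  enum Locality { NODE, RACK, ANY }

  static class Request {
    final String resourceName;      // a hostname, a rack, or "*"
    final int priority;
    final boolean relaxLocality;    // false = "only on the named nodes"
    final long relaxDelayMillis;    // app-chosen delay before relaxing

    Request(String resourceName, int priority, boolean relaxLocality,
        long relaxDelayMillis) {
      this.resourceName = resourceName;
      this.priority = priority;
      this.relaxLocality = relaxLocality;
      this.relaxDelayMillis = relaxDelayMillis;
    }
  }

  /** Widest locality the scheduler may use for this request right now. */
  static Locality allowedLocality(Request r, long waitedMillis) {
    if (!r.relaxLocality) {
      return Locality.NODE;                 // strict: never widen
    }
    if (waitedMillis < r.relaxDelayMillis) {
      return Locality.NODE;                 // still inside the app's delay
    }
    if (waitedMillis < 2 * r.relaxDelayMillis) {
      return Locality.RACK;                 // relax one level
    }
    return Locality.ANY;                    // fully relaxed
  }
}
{code}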
[jira] [Created] (YARN-414) [Umbrella] Usability issues in YARN
Hitesh Shah created YARN-414: Summary: [Umbrella] Usability issues in YARN Key: YARN-414 URL: https://issues.apache.org/jira/browse/YARN-414 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Priority: Blocker Umbrella jira to track all forms of usability issues in YARN that need to be addressed before YARN can be considered stable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-404) Node Manager leaks Data Node connections
[ https://issues.apache.org/jira/browse/YARN-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-404: - Priority: Blocker (was: Critical) Node Manager leaks Data Node connections Key: YARN-404 URL: https://issues.apache.org/jira/browse/YARN-404 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha, 0.23.6 Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker The RM sometimes fails to hand some applications to the NM for cleanup; because of this, log aggregation does not happen for those applications and data node connections are leaked on the NM side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-386) [Umbrella] YARN API cleanup
[ https://issues.apache.org/jira/browse/YARN-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-386: - Priority: Blocker (was: Major) [Umbrella] YARN API cleanup --- Key: YARN-386 URL: https://issues.apache.org/jira/browse/YARN-386 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Priority: Blocker This is the umbrella ticket to capture any and every API cleanup that we wish to do before YARN can be deemed beta/stable. Doing this API cleanup now and ASAP will help us escape the pain of supporting bad APIs in beta/stable releases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-397) RM Scheduler api enhancements
[ https://issues.apache.org/jira/browse/YARN-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-397: - Priority: Blocker (was: Major) RM Scheduler api enhancements - Key: YARN-397 URL: https://issues.apache.org/jira/browse/YARN-397 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Priority: Blocker Umbrella jira tracking enhancements to RM apis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-41: Priority: Blocker (was: Major) The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.0-alpha Reporter: Ravi Teja Ch N V Assignee: Devaraj K Priority: Blocker Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-142) Change YARN APIs to throw IOException
[ https://issues.apache.org/jira/browse/YARN-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-142: - Priority: Blocker (was: Critical) Change YARN APIs to throw IOException - Key: YARN-142 URL: https://issues.apache.org/jira/browse/YARN-142 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-142.1.patch, YARN-142.2.patch, YARN-142.3.patch, YARN-142.4.patch Ref: MAPREDUCE-4067 All YARN APIs currently throw YarnRemoteException. 1) This cannot be extended in its current form. 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-69) RM should throw different exceptions for while querying app/node/queue
[ https://issues.apache.org/jira/browse/YARN-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-69: Priority: Blocker (was: Major) RM should throw different exceptions for while querying app/node/queue -- Key: YARN-69 URL: https://issues.apache.org/jira/browse/YARN-69 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker We should distinguish the exceptions for absent app/node/queue, illegally accessed app/node/queue etc. Today everything is a {{YarnRemoteException}}. We should extend {{YarnRemoteException}} to add {{NotFoundException}}, {{AccessControlException}} etc. Today, {{AccessControlException}} exists but not as part of the protocol descriptions (i.e. only available to Java). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-85) Allow per job log aggregation configuration
[ https://issues.apache.org/jira/browse/YARN-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-85: Priority: Critical (was: Major) Allow per job log aggregation configuration --- Key: YARN-85 URL: https://issues.apache.org/jira/browse/YARN-85 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Currently, if log aggregation is enabled for a cluster - logs for all jobs will be aggregated - leading to a whole bunch of files on hdfs which users may not want. Users should be able to control this along with the aggregation policy - failed only, all, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-117: - Priority: Blocker (was: Major) Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Priority: Blocker Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). h2. AbstractService state change doesn't defend against race conditions. There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph lifecycle operations Helper methods to move things through lifecycles. init-start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make a no-op on the existing implementations of the interface. h2. Service listener failures not handled Is this an error or not? Log and ignore may not be what is desired. *Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes. h2. Support static listeners for all AbstractServices Add support to {{AbstractService}} that allows callers to register listeners for all instances. The existing listener interface could be used. This allows
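The "final lifecycle method delegating to a protected inner method" pattern described above can be sketched as follows. This is a simplified standalone illustration (the state set, locking, and method names other than innerInit/innerStart/innerStop are reduced for brevity), not the actual AbstractService code:

{code:java}
public abstract class SketchAbstractService {

  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  /** Public transitions are final: they validate, delegate, then record state. */
  public final synchronized void init() {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("Cannot init from " + state);
    }
    innerInit();
    state = State.INITED;
  }

  public final synchronized void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("Cannot start from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  /** stop() is valid from every state and must tolerate null fields. */
  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;                       // ignore duplicate stop requests
    }
    try {
      innerStop();
    } finally {
      state = State.STOPPED;        // always end in STOPPED
    }
  }

  /** Subclasses override only the inner methods. */
  protected void innerInit() {}
  protected void innerStart() {}
  protected void innerStop() {}
}
{code}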
[jira] [Updated] (YARN-99) Jobs fail during resource localization when directories in file cache reaches to unix directory limit
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-99: Priority: Blocker (was: Major) Jobs fail during resource localization when directories in file cache reaches to unix directory limit - Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K Priority: Blocker If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache. The jobs start failing with the below exception. {code:xml} java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} We should have a mechanism to clean the cache files if it crosses specified number of directories like cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-34) Split/Cleanup YARN and MAPREDUCE documentation
[ https://issues.apache.org/jira/browse/YARN-34?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-34: Priority: Blocker (was: Major) Split/Cleanup YARN and MAPREDUCE documentation -- Key: YARN-34 URL: https://issues.apache.org/jira/browse/YARN-34 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Post YARN-1, we need to have clear separation between YARN and mapreduce. We need to have separate sections on site and docs - we already have separate documents. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-387) Fix inconsistent protocol naming
[ https://issues.apache.org/jira/browse/YARN-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-387: - Priority: Blocker (was: Major) Fix inconsistent protocol naming Key: YARN-387 URL: https://issues.apache.org/jira/browse/YARN-387 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: incompatible We now have different and inconsistent naming schemes for various protocols. It was hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings, with such naming. We should fix these before we go beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-378: - Priority: Major (was: Minor) ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Labels: usability We should support different ApplicationMaster retry times for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-71: Issue Type: Bug (was: Test) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-71.1.patch, YARN-71.2.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-71: Priority: Critical (was: Major) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Attachments: YARN-71.1.patch, YARN-71.2.patch We have to make sure that NodeManagers clean up their local files on restart. It may already be working like that, in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
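If the cleanup is already happening, a test along these lines could lock it in; this is only a rough sketch with illustrative paths, and a real test would drive a NodeManager instance through the existing NM test utilities rather than inspect directories directly.
{code}
import java.io.File;
import org.junit.Assert;
import org.junit.Test;

public class TestLocalDirCleanupSketch {
  @Test
  public void localDirsAreEmptyAfterRestart() {
    // Illustrative path only; a real test would use the configured
    // yarn.nodemanager.local-dirs of the restarted NodeManager.
    File usercacheDir = new File("target/nm-local-dir/usercache");
    // ... stop and restart the NodeManager here ...
    String[] leftovers = usercacheDir.list();
    Assert.assertTrue("stale files survived NM restart",
        leftovers == null || leftovers.length == 0);
  }
}
{code}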
[jira] [Updated] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1
[ https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-226: - Priority: Blocker (was: Major) Log aggregation should not assume an AppMaster will have containerId 1 -- Key: YARN-226 URL: https://issues.apache.org/jira/browse/YARN-226 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth Priority: Blocker In case of reservations, etc - AppMasters may not get container id 1. We likely need additional info in the CLC / tokens indicating whether a container is an AM or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
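For context, the fragile convention being called out looks roughly like the check below; the explicit AM flag shown next to it is purely hypothetical and is only meant to illustrate the kind of information a CLC/token change would carry.
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

public class AmDetectionSketch {
  // Fragile: assumes the AM always gets the first container of the attempt,
  // which reservations etc. can break.
  static boolean looksLikeAmByConvention(ContainerId containerId) {
    return containerId.getId() == 1;
  }

  // Hypothetical alternative: an explicit flag carried in the container
  // launch context or container token instead of being inferred from the id.
  static boolean isAm(boolean amFlagFromClcOrToken) {
    return amFlagFromClcOrToken;
  }
}
{code}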
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583792#comment-13583792 ] Sandy Ryza commented on YARN-392: - The proposal of per-app delay-scheduling parameters is one I hadn't thought of, and I think it is a good one for many use cases. Do you mean that the delay threshold would be configurable per-app or per-priority? The cases that I don't think it supports are: * If the delay threshold is only configurable per app, an app needs some containers strictly on specific nodes, and for other containers only has loose preferences. * An application wants two containers, the first on only node1 or node2 and the second on only node3 or node4. What tells the scheduler not to assign both of the containers on node1 and node2? These containers could be requested at different priorities, but that would essentially be using priorities to do task-centric scheduling. Are these use cases non-goals for YARN? Correct me if I'm wrong, but my understanding was that the primary reason that the resource scheduler is not a task scheduler is performance. If we can allow it to be task-centric when necessary, but avoid the performance impact of making it task-centric all the time, it will support location-specific scheduling in the most flexible and intuitive way. I hope this isn't rehashing the debate from YARN-371. For anybody who will be at the YARN meetup tomorrow, it would be great to chat about this for a couple of minutes. Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
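To make the hard-locality case concrete, a node-specific request today looks roughly like the sketch below; the commented-out relaxLocality call is the assumed/proposed knob this JIRA is about, not an existing API, and without it the RM silently relaxes the request to rack and *.
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class HardLocalityRequestSketch {
  // Sketch of a request that should only ever be satisfied on the given host.
  static ResourceRequest onNodeOnly(String host, int memoryMb, int containers) {
    ResourceRequest req = Records.newRecord(ResourceRequest.class);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(1);
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memoryMb);
    req.setPriority(priority);
    req.setHostName(host);
    req.setCapability(capability);
    req.setNumContainers(containers);
    // Hypothetical knob proposed here: do not fall back to rack or ANY.
    // req.setRelaxLocality(false);
    return req;
  }
}
{code}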
[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM
[ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583823#comment-13583823 ] Hitesh Shah commented on YARN-196: -- Patch still has trailing whitespace issues. In YarnConfiguration.java: + /** Max time to wait for ResourceManger start + */ Please rephrase to "max time to wait to establish a connection to the ResourceManager when the NodeManager starts". + /** Time interval for each NM attempt to connect RM + */ Rephrase to "Time interval between each NM attempt to connect to the ResourceManager". You can use the same descriptions in yarn-default.xml. Not sure if "After that period of time, NM will throw out exceptions" is valid in yarn-default.xml. A better description could mention that the NM will shut down if it cannot connect to the RM within the specified max time period. The description should also mention how to use -1 to retry forever. An earlier comment made the point of switching to SECONDS instead of MS so that users can understand the values more easily. + // this.hostName = InetAddress.getLocalHost().getCanonicalHostName(); - Please remove commented-out code if it is not being used. The unit test does not really seem to be testing the flow of RESOURCEMANAGER_CONNECT_WAIT_MS being set to -1. waitForEver is being explicitly set to true/false based on the updater's ctor and not really based on the config value. If that flow cannot be tested, it might be better to remove the additional complexity from the test. Also, the patch will need to be updated due to https://issues.apache.org/jira/browse/HADOOP-9112. Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM. --- Key: YARN-196 URL: https://issues.apache.org/jira/browse/YARN-196 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Ramgopal N Assignee: Xuan Gong Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch If the NM is started before the RM, the NM shuts down with the following error {code} ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242) Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145) ...
3 more Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131) at $Proxy23.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 5 more Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857) at org.apache.hadoop.ipc.Client.call(Client.java:1141) at org.apache.hadoop.ipc.Client.call(Client.java:1100) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128) ... 7 more Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at
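To make the suggested wording concrete, the yarn-default.xml entries could read something like the following; the property names, units and defaults are illustrative only and should match whatever the patch actually defines for the connect wait and retry interval settings.
{code:xml}
<!-- Illustrative names and defaults only; use the names the patch introduces. -->
<property>
  <name>yarn.nodemanager.resourcemanager.connect.wait.secs</name>
  <value>900</value>
  <description>Max time, in seconds, to wait to establish a connection to the
  ResourceManager when the NodeManager starts. The NodeManager shuts down if it
  cannot connect to the ResourceManager within this period. Set to -1 to retry
  forever.</description>
</property>
<property>
  <name>yarn.nodemanager.resourcemanager.connect.retry-interval.secs</name>
  <value>30</value>
  <description>Time interval, in seconds, between each NodeManager attempt to
  connect to the ResourceManager.</description>
</property>
{code}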
[jira] [Updated] (YARN-47) Security issues in YARN
[ https://issues.apache.org/jira/browse/YARN-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-47: Priority: Major (was: Blocker) Security issues in YARN Key: YARN-47 URL: https://issues.apache.org/jira/browse/YARN-47 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli JIRA tracking YARN-related security issues. Moving over the YARN-only items from MAPREDUCE-3101. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-396) Rationalize AllocateResponse in RM scheduler API
[ https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-396: - Priority: Major (was: Blocker) Rationalize AllocateResponse in RM scheduler API Key: YARN-396 URL: https://issues.apache.org/jira/browse/YARN-396 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Arun C Murthy AllocateResponse contains an AMResponse and the cluster node count; the AMResponse holds the rest of the data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
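One possible shape of the flattened record, sketched purely for discussion: the AMResponse fields are folded straight into AllocateResponse so callers only ever deal with a single object. The method names are illustrative, not a committed API.
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative only: AMResponse fields merged into the allocate response.
public interface FlattenedAllocateResponseSketch {
  int getResponseId();
  boolean getReboot();
  List<Container> getAllocatedContainers();
  List<ContainerStatus> getCompletedContainersStatuses();
  Resource getAvailableResources();
  int getNumClusterNodes();
}
{code}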
[jira] [Updated] (YARN-99) Jobs fail during resource localization when directories in file cache reaches to unix directory limit
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-99: Priority: Major (was: Blocker) Jobs fail during resource localization when directories in file cache reaches to unix directory limit - Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K If we have multiple jobs which use the distributed cache with small files, the directory limit is reached before the cache size limit is, and the NM fails to create any more directories in the file cache. The jobs start failing with the below exception. {code:xml} java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} We should have a mechanism to clean the cache files when the number of directories crosses a specified limit, just as we do for the cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
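One common way around a flat-directory fanout limit, shown here only to illustrate the idea and not as what the NodeManager does today, is to hash each cache entry into a small fixed-depth subdirectory tree so that no single directory ever accumulates more than a bounded number of children.
{code}
import java.io.File;

public class HierarchicalCachePathSketch {
  // Illustrative: spread cache ids over two levels of 36-way buckets so any
  // single directory stays far below the filesystem's subdirectory limit.
  static File cacheDirFor(File fileCacheRoot, long cacheId) {
    long id = Math.abs(cacheId);
    String bucket1 = Long.toString(id % 36, 36);
    String bucket2 = Long.toString((id / 36) % 36, 36);
    return new File(new File(new File(fileCacheRoot, bucket1), bucket2),
        Long.toString(cacheId));
  }
}
{code}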
[jira] [Updated] (YARN-34) Split/Cleanup YARN and MAPREDUCE documentation
[ https://issues.apache.org/jira/browse/YARN-34?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-34: Priority: Major (was: Blocker) Split/Cleanup YARN and MAPREDUCE documentation -- Key: YARN-34 URL: https://issues.apache.org/jira/browse/YARN-34 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Post YARN-1, we need to have a clear separation between YARN and MapReduce. We need to have separate sections on the site and in the docs - we already have separate documents. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-41: Priority: Major (was: Blocker) The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.0-alpha Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch Instead of waiting for the NM expiry, the RM should remove and handle an NM that is shut down gracefully. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
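Purely as a sketch of the direction, the ResourceTracker protocol could gain an explicit unregister call that the NM issues from its stop path, letting the RM deactivate the node immediately instead of waiting for liveness expiry; the interface and method names below are hypothetical.
{code}
import org.apache.hadoop.yarn.api.records.NodeId;

// Hypothetical addition, named for illustration only: a new RPC on the
// ResourceTracker protocol that the NodeManager calls during graceful stop.
public interface GracefulNodeShutdownSketch {
  void unRegisterNodeManager(NodeId nodeId);
}
{code}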
[jira] [Updated] (YARN-414) [Umbrella] Usability issues in YARN
[ https://issues.apache.org/jira/browse/YARN-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-414: - Priority: Major (was: Blocker) [Umbrella] Usability issues in YARN --- Key: YARN-414 URL: https://issues.apache.org/jira/browse/YARN-414 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Umbrella jira to track all forms of usability issues in YARN that need to be addressed before YARN can be considered stable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-69) RM should throw different exceptions for while querying app/node/queue
[ https://issues.apache.org/jira/browse/YARN-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-69: Priority: Major (was: Blocker) RM should throw different exceptions for while querying app/node/queue -- Key: YARN-69 URL: https://issues.apache.org/jira/browse/YARN-69 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli We should distinguish the exceptions for absent app/node/queue, illegally accessed app/node/queue etc. Today everything is a {{YarnRemoteException}}. We should extend {{YarnRemoteException}} to add {{NotFoundException}}, {{AccessControlException}} etc. Today, {{AccessControlException}} exists but not as part of the protocol descriptions (i.e. only available to Java). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
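A minimal sketch of the proposed hierarchy; the base class below is only a stand-in for the real YarnRemoteException so the sketch is self-contained, and the real change would also have to surface these types in the protocol descriptions rather than just to Java callers.
{code}
// "YarnRemoteException" here is a local stand-in for
// org.apache.hadoop.yarn.exceptions.YarnRemoteException; only the shape of
// the hierarchy is the point, not the exact constructors.
class YarnRemoteException extends Exception {
  YarnRemoteException(String message) { super(message); }
}

class NotFoundException extends YarnRemoteException {
  NotFoundException(String message) { super(message); }
}

class AccessControlException extends YarnRemoteException {
  AccessControlException(String message) { super(message); }
}
{code}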
[jira] [Updated] (YARN-397) RM Scheduler api enhancements
[ https://issues.apache.org/jira/browse/YARN-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-397: - Priority: Major (was: Blocker) RM Scheduler api enhancements - Key: YARN-397 URL: https://issues.apache.org/jira/browse/YARN-397 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Umbrella jira tracking enhancements to RM apis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira