[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583095#comment-13583095
 ] 

Hudson commented on YARN-400:
-

Integrated in Hadoop-Yarn-trunk #134 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/134/])
YARN-400. RM can return null application resource usage report leading to 
NPE in client (Jason Lowe via tgraves) (Revision 1448241)

 Result = SUCCESS
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448241
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


 RM can return null application resource usage report leading to NPE in client
 -

 Key: YARN-400
 URL: https://issues.apache.org/jira/browse/YARN-400
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 3.0.0, 0.23.7, 2.0.4-beta

 Attachments: YARN-400-branch-0.23.patch, YARN-400.patch


 RMAppImpl.createAndGetApplicationReport can return a report with a null 
 resource usage report if full access to the app is allowed but the 
 application has no current attempt.  This leads to NPEs in client code that 
 assumes an app report will always have at least an empty resource usage 
 report.
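
A minimal sketch of the client-side pattern that trips over this, with a guard that works around it until the RM-side fix is deployed. The helper class and method are illustrative assumptions; ApplicationReport and ApplicationResourceUsageReport are real YARN record types.

{code:java}
// Hedged sketch: tolerate a null usage report (YARN-400) instead of hitting an NPE.
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;

public final class UsageReportGuard {
  static int usedMemoryMB(ApplicationReport report) {
    ApplicationResourceUsageReport usage = report.getApplicationResourceUsageReport();
    if (usage == null) {
      // Apps with no current attempt may come back without a usage report.
      return 0;
    }
    return usage.getUsedResources().getMemory();
  }
}
{code}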

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583097#comment-13583097
 ] 

Hudson commented on YARN-236:
-

Integrated in Hadoop-Yarn-trunk #134 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/134/])
YARN-236. RM should point tracking URL to RM web page when app fails to 
start (Jason Lowe via jeagles) (Revision 1448406)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448406
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java


 RM should point tracking URL to RM web page when app fails to start
 ---

 Key: YARN-236
 URL: https://issues.apache.org/jira/browse/YARN-236
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: usability
 Fix For: 3.0.0, 0.23.7, 2.0.4-beta

 Attachments: YARN-236.patch


 Similar to YARN-165, the RM should redirect the tracking URL to the specific 
 app page on the RM web UI when the application fails to start.  For example, 
 if the AM completely fails to start due to bad AM config or bad job config 
 like an invalid queue name, then the user gets the unhelpful "The requested 
 application exited before setting a tracking URL." message.
 Usually the diagnostic string on the RM app page has something useful, so we 
 might as well point there.
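
A hedged sketch of the fallback behaviour being asked for, in generic servlet terms; this is not the actual WebAppProxyServlet change, and the helper class, method, and the /cluster/app path construction are assumptions for illustration.

{code:java}
// Sketch: if the app never set a tracking URL, send the browser to the RM's
// per-application page (where the diagnostics live) instead of an error message.
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

final class TrackingUrlFallback {
  static void redirect(HttpServletResponse resp, String trackingUrl,
                       String rmWebAddress, String appId) throws IOException {
    if (trackingUrl == null || trackingUrl.isEmpty()) {
      // e.g. http://<rm-host>:8088/cluster/app/application_1361470000000_0001
      resp.sendRedirect("http://" + rmWebAddress + "/cluster/app/" + appId);
    } else {
      resp.sendRedirect(trackingUrl);
    }
  }
}
{code}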

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583154#comment-13583154
 ] 

Hudson commented on YARN-400:
-

Integrated in Hadoop-Hdfs-0.23-Build #532 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/532/])
YARN-400. RM can return null application resource usage report leading to 
NPE in client (Jason Lowe via tgraves) (Revision 1448244)

 Result = SUCCESS
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448244
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


 RM can return null application resource usage report leading to NPE in client
 -

 Key: YARN-400
 URL: https://issues.apache.org/jira/browse/YARN-400
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 3.0.0, 0.23.7, 2.0.4-beta

 Attachments: YARN-400-branch-0.23.patch, YARN-400.patch


 RMAppImpl.createAndGetApplicationReport can return a report with a null 
 resource usage report if full access to the app is allowed but the 
 application has no current attempt.  This leads to NPEs in client code that 
 assumes an app report will always have at least an empty resource usage 
 report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583155#comment-13583155
 ] 

Hudson commented on YARN-236:
-

Integrated in Hadoop-Hdfs-0.23-Build #532 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/532/])
YARN-236. RM should point tracking URL to RM web page when app fails to 
start (Jason Lowe via jeagles) (Revision 1448411)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448411
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java


 RM should point tracking URL to RM web page when app fails to start
 ---

 Key: YARN-236
 URL: https://issues.apache.org/jira/browse/YARN-236
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.4
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: usability
 Fix For: 3.0.0, 0.23.7, 2.0.4-beta

 Attachments: YARN-236.patch


 Similar to YARN-165, the RM should redirect the tracking URL to the specific 
 app page on the RM web UI when the application fails to start.  For example, 
 if the AM completely fails to start due to bad AM config or bad job config 
 like an invalid queue name, then the user gets the unhelpful "The requested 
 application exited before setting a tracking URL." message.
 Usually the diagnostic string on the RM app page has something useful, so we 
 might as well point there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-400) RM can return null application resource usage report leading to NPE in client

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583174#comment-13583174
 ] 

Hudson commented on YARN-400:
-

Integrated in Hadoop-Mapreduce-trunk #1351 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/])
YARN-400. RM can return null application resource usage report leading to 
NPE in client (Jason Lowe via tgraves) (Revision 1448241)

 Result = FAILURE
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448241
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


 RM can return null application resource usage report leading to NPE in client
 -

 Key: YARN-400
 URL: https://issues.apache.org/jira/browse/YARN-400
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 3.0.0, 0.23.7, 2.0.4-beta

 Attachments: YARN-400-branch-0.23.patch, YARN-400.patch


 RMAppImpl.createAndGetApplicationReport can return a report with a null 
 resource usage report if full access to the app is allowed but the 
 application has no current attempt.  This leads to NPEs in client code that 
 assumes an app report will always have at least an empty resource usage 
 report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-380) yarn node -status prints Last-Last-Node-Status

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-380:
-

Labels: usability  (was: )

 yarn node -status prints Last-Last-Node-Status
 --

 Key: YARN-380
 URL: https://issues.apache.org/jira/browse/YARN-380
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
  Labels: usability

 I assume the Last-Last-NodeStatus is a typo and it should just be 
 Last-Node-Status.
 $ yarn node -status foo.com:8041
 Node Report : 
 Node-Id : foo.com:8041
 Rack : /10.10.10.0
 Node-State : RUNNING
 Node-Http-Address : foo.com:8042
 Health-Status(isNodeHealthy) : true
 Last-Last-Health-Update : 1360118400219
 Health-Report : 
 Containers : 0
 Memory-Used : 0M
 Memory-Capacity : 24576

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-379) yarn [node,application] commands print logger info messages

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-379:
-

Labels: usability  (was: )

 yarn [node,application] commands print logger info messages
 --

 Key: YARN-379
 URL: https://issues.apache.org/jira/browse/YARN-379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
  Labels: usability

 Running the yarn node and yarn application commands results in annoying log 
 info messages being printed:
 $ yarn node -list
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Total Nodes:1
  Node-IdNode-State  Node-Http-Address   
 Health-Status(isNodeHealthy)Running-Containers
 foo:8041RUNNING  foo:8042   true  
  0
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
 $ yarn application
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Invalid Command Usage : 
 usage: application
  -kill arg Kills the application.
  -list   Lists all the Applications from RM.
  -status arg   Prints the status of the application.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
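
A hedged workaround sketch, not the committed fix: assuming the log4j 1.x backend Hadoop shipped at the time, the CLI could raise the level of the logger that emits these lines before doing its work.

{code:java}
// Quiet the "Service:...YarnClientImpl is inited/started/stopped" INFO lines.
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

final class QuietClientLogging {
  static void quietServiceLogs() {
    // AbstractService is the class printing the messages shown above.
    Logger.getLogger("org.apache.hadoop.yarn.service.AbstractService")
          .setLevel(Level.WARN);
  }
}
{code}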

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-198:
-

Labels: usability  (was: )

 If we are navigating to Nodemanager UI from Resourcemanager, then there is no 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: Senthil V Kumar
Priority: Minor
  Labels: usability

 If we are navigating to the Nodemanager by clicking on the node link in the RM, 
 there is no link provided on the NM to navigate back to the RM.
  It would be good to have a link on the NM to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-410) Miscellaneous web UI issues

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-410:
-

Labels: usability  (was: )

 Miscellaneous web UI issues
 ---

 Key: YARN-410
 URL: https://issues.apache.org/jira/browse/YARN-410
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
  Labels: usability

 We need to fix the following issues on the YARN web UI:
  - Remove the "Note" column from the application list. When a failure 
 happens, this "Note" spoils the table layout.
  - When the application is still not running, the Tracking UI link should be 
 titled "UNASSIGNED"; for some reason it is titled "ApplicationMaster" but 
 (correctly) links to "#".
  - The per-application page has all the RM-related information like version, 
 start-time etc. This must be an accidental change by one of the patches.
  - The diagnostics for a failed app on the per-application page don't retain 
 new lines and instead wrap them around, which makes them hard to read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (YARN-410) Miscellaneous web UI issues

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli moved MAPREDUCE-3152 to YARN-410:
-

  Component/s: (was: mrv2)
Fix Version/s: (was: 0.24.0)
 Assignee: (was: Subroto Sanyal)
Affects Version/s: (was: 0.23.0)
  Key: YARN-410  (was: MAPREDUCE-3152)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 Miscellaneous web UI issues
 ---

 Key: YARN-410
 URL: https://issues.apache.org/jira/browse/YARN-410
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli

 We need to fix the following issues on the YARN web UI:
  - Remove the "Note" column from the application list. When a failure 
 happens, this "Note" spoils the table layout.
  - When the application is still not running, the Tracking UI link should be 
 titled "UNASSIGNED"; for some reason it is titled "ApplicationMaster" but 
 (correctly) links to "#".
  - The per-application page has all the RM-related information like version, 
 start-time etc. This must be an accidental change by one of the patches.
  - The diagnostics for a failed app on the per-application page don't retain 
 new lines and instead wrap them around, which makes them hard to read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-69) RM should throw different exceptions while querying app/node/queue

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-69:


Issue Type: Sub-task  (was: Bug)
Parent: YARN-386

 RM should throw different exceptions while querying app/node/queue
 --

 Key: YARN-69
 URL: https://issues.apache.org/jira/browse/YARN-69
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 We should distinguish the exceptions for absent app/node/queue, illegally 
 accessed app/node/queue etc. Today everything is a {{YarnRemoteException}}. 
 We should extend {{YarnRemoteException}} to add {{NotFoundException}}, 
 {{AccessControlException}} etc. Today, {{AccessControlException}} exists but 
 not as part of the protocol descriptions (i.e. only available to Java).
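
A hedged sketch of the hierarchy the description asks for. The class names come from the description; the constructors and the use of plain Exception as the base (to keep the sketch self-contained) are illustrative, since the real base would be YarnRemoteException and the types would also need protocol-level representations.

{code:java}
// Illustrative only: distinct exception types for "absent" vs. "illegally accessed".
// e.g. querying an unknown app would throw NotFoundException rather than a bare
// YarnRemoteException.
class NotFoundException extends Exception {
  NotFoundException(String message) { super(message); }
}

class AccessControlException extends Exception {
  AccessControlException(String message) { super(message); }
}
{code}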

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (YARN-411) Per-state RM app-pages should have search ala JHS pages

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli moved MAPREDUCE-3778 to YARN-411:
-

  Component/s: (was: webapps)
   (was: mrv2)
Affects Version/s: (was: 0.23.0)
  Key: YARN-411  (was: MAPREDUCE-3778)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 Per-state RM app-pages should have search ala JHS pages
 ---

 Key: YARN-411
 URL: https://issues.apache.org/jira/browse/YARN-411
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-02-21 Thread Roger Hoover (JIRA)
Roger Hoover created YARN-412:
-

 Summary: FifoScheduler incorrectly checking for node locality
 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Priority: Minor


In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
data is local to a node by searching for the nodeAddress of the node in the set 
of outstanding requests for the app.  This seems to be incorrect as it should 
be checking hostname instead.  The offending line of code is 455:

application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
are a concatenation of hostname and command port (e.g. host1.foo.com:1234)

In the CapacityScheduler, it's done using hostname.  See 
LeafQueue.assignNodeLocalContainers, line 1129

application.getResourceRequest(priority, node.getHostName());

Note that this bug does not affect the actual scheduling decisions made by the 
FifoScheduler because even though it incorrectly determines that a request is not 
local to the node, it will still schedule the request immediately because it's 
rack-local.  However, this bug may be adversely affecting the reporting of job 
status by underreporting the number of tasks that were node local.
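
The one-line change the description implies, shown as a before/after sketch (both lines are quoted from the report; this is not the committed patch):

{code:java}
// Before (FifoScheduler.assignNodeLocalContainers, line 455 per the report):
// keyed by nodeAddress, which carries the ":port" suffix and never matches a request.
//   application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

// After, mirroring what CapacityScheduler's LeafQueue.assignNodeLocalContainers does:
//   application.getResourceRequest(priority, node.getHostName());
{code}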

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-02-21 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: YARN-412.patch

Please review this patch for the fix plus a unit test case

 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Priority: Minor
  Labels: patch
 Attachments: YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect as it 
 should be checking hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node 
 addresses are a concatenation of hostname and command port (e.g. 
 host1.foo.com:1234)
 In the CapacityScheduler, it's done using hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by the 
 FifoScheduler because even though it incorrectly determines that a request is 
 not local to the node, it will still schedule the request immediately because 
 it's rack-local.  However, this bug may be adversely affecting the reporting 
 of job status by underreporting the number of tasks that were node local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-02-21 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: YARN-412.patch

Added a timeout on the unit test

 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Priority: Minor
  Labels: patch
 Attachments: YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect as it 
 should be checking hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of hostname and command port (e.g. host1.foo.com:1234)
 In the CapacityScheduler, it's done using hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were node 
 local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583522#comment-13583522
 ] 

Hadoop QA commented on YARN-412:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570348/YARN-412.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/416//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/416//console

This message is automatically generated.

 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Priority: Minor
  Labels: patch
 Attachments: YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect as it 
 should be checking hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of hostname and command port (e.g. host1.foo.com:1234)
 In the CapacityScheduler, it's done using hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were node 
 local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS

2013-02-21 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-413:
---

 Summary: With log aggregation on, nodemanager dies on startup if 
it can't connect to HDFS
 Key: YARN-413
 URL: https://issues.apache.org/jira/browse/YARN-413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza


If log aggregation is on, when the nodemanager starts up, it tries to create 
the remote log directory.  If this fails, it kills itself.  It doesn't seem 
like turning log aggregation on should ever cause the nodemanager to die.
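
A hedged sketch of the behaviour the report argues for, not the actual LogAggregationService code: tolerate a failed mkdir of the remote log directory at startup rather than letting the exception propagate and kill the NodeManager. FileSystem and Path are the real Hadoop classes; the surrounding helper is illustrative.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class RemoteLogDirBootstrap {
  static boolean tryCreateRemoteLogDir(FileSystem fs, Path remoteRootLogDir) {
    try {
      return fs.mkdirs(remoteRootLogDir);
    } catch (IOException e) {
      // HDFS may be unreachable or in safe mode while the NM starts; log and
      // retry later instead of aborting NodeManager startup.
      System.err.println("Could not create " + remoteRootLogDir
          + ", will retry later: " + e);
      return false;
    }
  }
}
{code}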

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-408) Capacity Scheduler delay scheduling should not be disabled by default

2013-02-21 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583625#comment-13583625
 ] 

Mayank Bansal commented on YARN-408:


Yeah sure.

There are two intents behind changing this value:

1. Make it enabled by default.
2. The algorithm we use to give applications a scheduling opportunity is driven 
by heartbeats from the NMs, so if we just use the number of racks as the delay 
it adds little value toward actually achieving node locality. The intent of the 
delay count is to wait for at least one heartbeat from each node in the cluster 
before moving a task to the next locality level (a configuration sketch follows 
below).

I defaulted it to the number of nodes in one rack.

Thanks,
Mayank
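
A hedged sketch of what the proposed default means operationally, assuming the relevant capacity-scheduler property is yarn.scheduler.capacity.node-locality-delay (a count of missed scheduling opportunities, which roughly corresponds to NM heartbeats):

{code:java}
import org.apache.hadoop.conf.Configuration;

final class DelaySchedulingExample {
  // Per the comment above: wait roughly one heartbeat from every node in a rack
  // before relaxing a node-local request to rack-local.
  static Configuration withNodeLocalityDelay(int nodesPerRack) {
    Configuration conf = new Configuration();
    conf.setInt("yarn.scheduler.capacity.node-locality-delay", nodesPerRack);
    return conf;
  }
}
{code}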


 Capacity Scheduler delay scheduling should not be disabled by default
 -

 Key: YARN-408
 URL: https://issues.apache.org/jira/browse/YARN-408
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: YARN-408-trunk.patch


 Capacity Scheduler delay scheduling should not be disabled by default.
 Enable it with a default equal to the number of nodes in one rack.
 Thanks,
 Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS

2013-02-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583660#comment-13583660
 ] 

Sandy Ryza commented on YARN-413:
-

2013-02-21 13:27:24,307 FATAL 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start 
org.apache.hadoop.yarn.server.nodemanager.NodeManager
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359)
Caused by: org.apache.hadoop.yarn.YarnException: Failed to Start 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:248)
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
... 3 more
Caused by: org.apache.hadoop.yarn.YarnException: Failed to create remoteLogDir 
[/tmp/logs]
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:207)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.start(LogAggregationService.java:132)
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
... 5 more
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
 Cannot create directory /tmp/logs. Name node is in safe mode.
The reported blocks 7 has reached the threshold 0.9990 of total blocks 7. Safe 
mode will be turned off automatically in 25 seconds.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3067)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3045)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3024)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:667)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:468)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40995)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:482)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1018)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1778)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1774)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1488)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1772)

at org.apache.hadoop.ipc.Client.call(Client.java:1237)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:163)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:82)
at $Proxy9.mkdirs(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:450)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2115)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2086)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:540)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:204)
... 7 more
2013-02-21 13:27:24,308 INFO org.apache.hadoop.ipc.Server: Stopping server on 
47223
2013-02-21 13:27:24,308 INFO 

[jira] [Resolved] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS

2013-02-21 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-413.
-

Resolution: Duplicate

 With log aggregation on, nodemanager dies on startup if it can't connect to 
 HDFS
 

 Key: YARN-413
 URL: https://issues.apache.org/jira/browse/YARN-413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza

 If log aggregation is on, when the nodemanager starts up, it tries to create 
 the remote log directory.  If this fails, it kills itself.  It doesn't seem 
 like turning log aggregation on should ever cause the nodemanager to die.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality

2013-02-21 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583713#comment-13583713
 ] 

Bikas Saha commented on YARN-392:
-

From what I understand, this seems to be tangentially going down the path of 
the discussion that happened in YARN-371. The crucial point is that the YARN 
resource scheduler is *not* a task scheduler, so introducing concepts that 
directly or indirectly make it do task scheduling would be inconsistent with 
the design. It's a coarse-grained resource allocator that gives the app 
containers representing chunks of resources, with which the app can schedule 
its tasks. Different versions of the scheduler, Fair/Capacity or otherwise, 
change the way the resource sharing is done. Ideally we should have only one 
scheduler with hooks to change the sharing policy; the code kind of reflects 
that, because there is so much common code/logic between the two 
implementations.

Unfortunately, in both the Fair and Capacity Scheduler the implementations have 
mixed up 1) the decision to allocate at and below a given topology level [say the 
* level] with 2) whether there are resource requests at that level. E.g. when the 
allocation cycle starts for an app, the logic starts at * and checks whether the 
resource request count is > 0; if yes, it goes into racks and then nodes. 
Which means that if an application wants resources only at a node, it has 
to create requests at the rack and * level too. This is because locality 
relaxation has gotten mixed up with being schedulable, if you catch my drift. 
My strong belief is that if we can fix this overload then we won't need to fix 
this jira. However, I can see that fixing the overload would be a very 
complicated knot to untie, and perhaps impossible to do now because it may be 
inextricably linked with the API, which is why I created this jira.

Now, if the problem is the * overload that I describe above, then the problem 
is really the entanglement with delay scheduling (for locality). Here is an 
alternative proposal that addresses it. Let's make the delay used by delay 
scheduling specifiable by the application, so an application can specify how 
long to wait before relaxing its node requests to rack and *. When an app wants 
containers on specific nodes, it basically means it does not want the RM to 
automatically relax its locality, which it can express by specifying a large 
value for the delay. The end result is allocation on specific nodes if resources 
become available on those nodes. This also serves as a useful extension of delay 
scheduling: short apps can be aggressive in relaxing locality while long+large 
jobs can be more conservative in trading off scheduling speed against network IO.
The catch in the proposal is that such requests have to be made at a different 
priority level. Resource requests at the same priority level get aggregated, and 
we don't want to aggregate relaxable resource requests with non-relaxable ones. 
I think this is a good thing to do anyway because it makes the application think 
about and decide which kind of tasks it needs to get running first.

An extension of this approach also ties in nicely with the API enhancement 
suggested by YARN-394. The RM could actually inform the app that it has not 
been able to allocate a resource request on a node and the time limit has 
elapsed. At which point, the app could cancel that request and ask for an 
alternative set of nodes. I agree I am hand-waving in this paragraph.

Thoughts?
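
A purely hypothetical sketch of the proposal above, not the 2013-era API: a node-specific request at its own priority that the RM must not relax. The record setters reflect the ResourceRequest API of that period as best recalled, and the relax-locality setter is an assumed, illustrative addition.

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

final class StrictLocalityRequestSketch {
  static ResourceRequest strictNodeRequest(String host, Resource capability) {
    ResourceRequest req = Records.newRecord(ResourceRequest.class);
    req.setHostName(host);            // a specific node, not a rack or "*"
    req.setCapability(capability);
    req.setNumContainers(1);

    Priority pri = Records.newRecord(Priority.class);
    pri.setPriority(1);               // its own priority level, so this request is
    req.setPriority(pri);             // never aggregated with relaxable requests

    // req.setRelaxLocality(false);   // hypothetical: "never relax to rack or *"
    return req;
  }
}
{code}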

 Make it possible to schedule to specific nodes without dropping locality
 

 Key: YARN-392
 URL: https://issues.apache.org/jira/browse/YARN-392
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Sandy Ryza
 Attachments: YARN-392.patch


 Currently it's not possible to specify scheduling requests for specific nodes 
 and nowhere else. The RM automatically relaxes locality to rack and * and 
 assigns non-specified machines to the app.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-414) [Umbrella] Usability issues in YARN

2013-02-21 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-414:


 Summary: [Umbrella] Usability issues in YARN
 Key: YARN-414
 URL: https://issues.apache.org/jira/browse/YARN-414
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Blocker


Umbrella jira to track all forms of usability issues in YARN that need to be 
addressed before YARN can be considered stable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-404) Node Manager leaks Data Node connections

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-404:
-

Priority: Blocker  (was: Critical)

 Node Manager leaks Data Node connections
 

 Key: YARN-404
 URL: https://issues.apache.org/jira/browse/YARN-404
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha, 0.23.6
Reporter: Devaraj K
Assignee: Devaraj K
Priority: Blocker

 The RM fails to hand some applications to the NM for cleanup; because of this, 
 log aggregation is not happening for those applications, and it is also leaking 
 DataNode connections on the NM side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-386) [Umbrella] YARN API cleanup

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-386:
-

Priority: Blocker  (was: Major)

 [Umbrella] YARN API cleanup
 ---

 Key: YARN-386
 URL: https://issues.apache.org/jira/browse/YARN-386
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker

 This is the umbrella ticket to capture any and every API cleanup that we wish 
 to do before YARN can be deemed beta/stable. Doing this API cleanup now and 
 ASAP will help us escape the pain of supporting bad APIs in beta/stable 
 releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-397) RM Scheduler api enhancements

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-397:
-

Priority: Blocker  (was: Major)

 RM Scheduler api enhancements
 -

 Key: YARN-397
 URL: https://issues.apache.org/jira/browse/YARN-397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun C Murthy
Priority: Blocker

 Umbrella jira tracking enhancements to RM apis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-41:


Priority: Blocker  (was: Major)

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
Priority: Blocker
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch


 Instead of waiting for the NM to expire, the RM should remove and handle an NM 
 that is shut down gracefully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-142) Change YARN APIs to throw IOException

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-142:
-

Priority: Blocker  (was: Critical)

 Change YARN APIs to throw IOException
 -

 Key: YARN-142
 URL: https://issues.apache.org/jira/browse/YARN-142
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Siddharth Seth
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-142.1.patch, YARN-142.2.patch, YARN-142.3.patch, 
 YARN-142.4.patch


 Ref: MAPREDUCE-4067
 All YARN APIs currently throw YarnRemoteException.
 1) This cannot be extended in its current form.
 2) The RPC layer can throw IOExceptions. These end up showing up as 
 UndeclaredThrowableExceptions.
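
A hedged sketch of the direction described: protocol methods that declare IOException (and a YARN exception type that can actually be subclassed) rather than funnelling everything through YarnRemoteException. The interface and types below are illustrative, not the real protocol definitions.

{code:java}
import java.io.IOException;

interface ExampleClientProtocol {
  // IOException from the RPC layer is declared, so it no longer surfaces as an
  // UndeclaredThrowableException on the caller's side.
  ExampleResponse getSomething(ExampleRequest request)
      throws ExampleYarnException, IOException;
}

class ExampleYarnException extends Exception {
  ExampleYarnException(String message, Throwable cause) { super(message, cause); }
}

class ExampleRequest {}
class ExampleResponse {}
{code}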

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-69) RM should throw different exceptions while querying app/node/queue

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-69:


Priority: Blocker  (was: Major)

 RM should throw different exceptions while querying app/node/queue
 --

 Key: YARN-69
 URL: https://issues.apache.org/jira/browse/YARN-69
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker

 We should distinguish the exceptions for absent app/node/queue, illegally 
 accessed app/node/queue etc. Today everything is a {{YarnRemoteException}}. 
 We should extend {{YarnRemoteException}} to add {{NotFoundException}}, 
 {{AccessControlException}} etc. Today, {{AccessControlException}} exists but 
 not as part of the protocol descriptions (i.e. only available to Java).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-85) Allow per job log aggregation configuration

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-85:


Priority: Critical  (was: Major)

 Allow per job log aggregation configuration
 ---

 Key: YARN-85
 URL: https://issues.apache.org/jira/browse/YARN-85
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Critical

 Currently, if log aggregation is enabled for a cluster, logs for all jobs 
 will be aggregated, leading to a whole bunch of files on HDFS which users 
 may not want.
 Users should be able to control this along with the aggregation policy 
 (failed only, all, etc.).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-117) Enhance YARN service model

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-117:
-

Priority: Blocker  (was: Major)

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Blocker

 Having played with the YARN service model, there are some issues
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. State model prevents the stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non-null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid and will NPE if called before {{start()}}: 
 MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a 
 fix for it. It is independent of the rest of the issues in this doc, but it 
 will help make {{stop()}} executable from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and take-up; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks that verify whether or not a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class, yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} and {{stop()}} operations. This means that these operations can be 
 performed out of order, and even if the outcome of the call is an exception, 
 all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}}, 
 something that must be done before the class is considered stable (because 
 once the lifecycle methods are declared final, all subclasses that are out of 
 the source tree will need fixing by the respective developers).
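
 A hedged, standalone sketch of the pattern described in the previous paragraph (final lifecycle methods doing the checks, then delegating to protected inner methods); it is not the actual AbstractService code.

{code:java}
abstract class SketchService {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.INITED;   // assume init() already ran, for brevity

  public final synchronized void start() {
    if (state != State.INITED) {        // the check lives once, in the base class
      throw new IllegalStateException("cannot start from " + state);
    }
    innerStart();                       // subclass work runs only after a legal check
    state = State.STARTED;
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;                           // stop() is safe to call from any state
    }
    innerStop();
    state = State.STOPPED;
  }

  protected void innerStart() {}
  protected void innerStop() {}
}
{code}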
 h2. AbstractService state change doesn't defend against race conditions.
 There are no concurrency locks on the state transitions. Whatever fix for wrong 
 state calls is added should correct this to prevent re-entrancy, such as 
 {{stop()}} being called from two threads.
 h2. Static methods to choreograph lifecycle operations
 Helper methods to move things through lifecycles. init-start is common, 
 stop-if-service!=null is another. Some static methods can execute these, and 
 even call {{stop()}} if {{init()}} raises an exception. These could go into a 
 class {{ServiceOps}} in the same package. These can be used by those services 
 that wrap other services, and help manage more robust shutdowns.
 h2. State transition failures are something that registered service listeners 
 may wish to be informed of.
 When a state transition fails, a {{RuntimeException}} can be thrown, and the 
 service listeners are not informed because the notification point isn't reached. 
 They may wish to know this, especially for management and diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
 {{stateChangeFailed(Service service, Service.State targeted-state, 
 RuntimeException e)}} that is invoked from the (final) state change methods 
 in the {{AbstractService}} class (once they delegate to their inner 
 {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing 
 implementations of the interface.
 h2. Service listener failures not handled
 Is this an error or not? Log-and-ignore may not be what is desired.
 *Proposed:* during {{stop()}} any exception by a listener is caught and 
 discarded, to increase the likelihood of a better shutdown, but do not add 
 try-catch clauses to the other state changes.
 h2. Support static listeners for all AbstractServices
 Add support to {{AbstractService}} that allows callers to register listeners 
 for all instances. The existing listener interface could be used. This allows 
 

[jira] [Updated] (YARN-99) Jobs fail during resource localization when directories in the file cache reach the Unix directory limit

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-99:


Priority: Blocker  (was: Major)

 Jobs fail during resource localization when directories in the file cache reach 
 the Unix directory limit
 -

 Key: YARN-99
 URL: https://issues.apache.org/jira/browse/YARN-99
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Devaraj K
Priority: Blocker

 If we have multiple jobs that use the distributed cache with small files, 
 the directory limit is reached before the cache-size limit is, and the NM fails 
 to create any more directories in the file cache. The jobs start failing with 
 the exception below.
 {code:xml}
 java.io.IOException: mkdir of 
 /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
   at 
 org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 We should have a mechanism to clean the cache files when the number of 
 directories crosses a specified limit, just as we do for cache size.
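
 A hedged sketch of the kind of check that last sentence asks for: treat the number of entries under the local filecache directory as a cleanup trigger, the same way total cache size already is. The paths, the limit, and where the check would be wired in are all assumptions.

{code:java}
import java.io.File;

final class FileCacheDirLimit {
  static boolean needsCleanup(File fileCacheDir, int maxEntries) {
    String[] entries = fileCacheDir.list();
    int count = (entries == null) ? 0 : entries.length;
    // ext3, for example, caps a directory at roughly 32000 subdirectories,
    // so the limit should stay comfortably below the filesystem's ceiling.
    return count >= maxEntries;
  }
}
{code}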

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-34) Split/Cleanup YARN and MAPREDUCE documentation

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-34?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-34:


Priority: Blocker  (was: Major)

 Split/Cleanup YARN and MAPREDUCE documentation
 --

 Key: YARN-34
 URL: https://issues.apache.org/jira/browse/YARN-34
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker

 Post YARN-1, we need to have a clear separation between YARN and MapReduce. We 
 need to have separate sections on the site and in the docs; we already have 
 separate documents.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-387) Fix inconsistent protocol naming

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-387:
-

Priority: Blocker  (was: Major)

 Fix inconsistent protocol naming
 

 Key: YARN-387
 URL: https://issues.apache.org/jira/browse/YARN-387
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
  Labels: incompatible

 We now have different and inconsistent naming schemes across the various 
 protocols. This naming has been hard to explain to users, mainly in direct 
 interactions at talks, presentations, and user group meetings.
 We should fix it before we go beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-378:
-

Priority: Major  (was: Minor)

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
  Labels: usability

 We should support different ApplicationMaster retry counts for different 
 clients or users. In other words, yarn.resourcemanager.am.max-retries should 
 be settable by the client.
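 As a hedged illustration of the idea (not the actual patch), the RM side could reconcile a client-supplied retry count with the cluster-wide ceiling roughly like this; the class and method names are invented for this sketch:
{code}
// Illustrative only: reconcile a client-requested AM retry count with the
// cluster-wide yarn.resourcemanager.am.max-retries ceiling.
public class AmRetryPolicySketch {
  private final int clusterMaxRetries; // value of yarn.resourcemanager.am.max-retries

  public AmRetryPolicySketch(int clusterMaxRetries) {
    this.clusterMaxRetries = clusterMaxRetries;
  }

  /**
   * requestedRetries would come from a (hypothetical) per-application field on
   * the submission context; zero or less means "use the cluster default".
   */
  public int effectiveRetries(int requestedRetries) {
    if (requestedRetries <= 0) {
      return clusterMaxRetries;
    }
    return Math.min(requestedRetries, clusterMaxRetries);
  }
}
{code}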

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-71:


Issue Type: Bug  (was: Test)

 Ensure/confirm that the NodeManager cleanup their local filesystem when they 
 restart
 

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
 Attachments: YARN-71.1.patch, YARN-71.2.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 This may already be the case, in which case we should have tests validating 
 it.
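 A rough, hedged sketch of what such a test could assert; the restart steps and the local-dir layout are stand-ins for whatever the real NodeManager test harness provides:
{code}
import java.io.File;
import org.junit.Assert;
import org.junit.Test;

public class TestNodeManagerCleanupOnRestart {

  @Test
  public void testLocalDirsEmptyAfterRestart() throws Exception {
    // Assumed local-dir layout; a real test would read it from the NM config.
    File userCacheRoot = new File("/tmp/nm-local-dir/usercache");

    // ... start the NM, run a container that localizes resources, stop the NM ...
    // ... restart the NM ...

    String[] leftovers = userCacheRoot.list();
    Assert.assertTrue("NM should wipe its local user caches on restart",
        leftovers == null || leftovers.length == 0);
  }
}
{code}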

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-71:


Priority: Critical  (was: Major)

 Ensure/confirm that the NodeManager cleanup their local filesystem when they 
 restart
 

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 This may already be the case, in which case we should have tests validating 
 it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1

2013-02-21 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-226:
-

Priority: Blocker  (was: Major)

 Log aggregation should not assume an AppMaster will have containerId 1
 --

 Key: YARN-226
 URL: https://issues.apache.org/jira/browse/YARN-226
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth
Priority: Blocker

 In case of reservations, etc., AppMasters may not get container id 1. We 
 likely need additional info in the CLC / tokens indicating whether a 
 container is an AM or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality

2013-02-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583792#comment-13583792
 ] 

Sandy Ryza commented on YARN-392:
-

The proposal of per-app delay-scheduling parameters is one I hadn't thought of, 
and I think a good one for many use cases.  Do you mean that the delay 
threshold would be configurable per-app or per-priority?

The cases that I don't think it supports are:
* If the delay threshold is only configurable per app, it does not cover an app 
that needs some containers strictly on specific nodes while having only loose 
preferences for its other containers.
* An application wants two containers, the first on only node1 or node2 and the 
second on only node3 or node4.  What tells the scheduler not to assign both of 
the containers on node1 and node2?  These containers could be requested at 
different priorities, but that would essentially be using priorities to do 
task-centric scheduling.

Are these use cases non-goals for YARN?  Correct me if I'm wrong, but my 
understanding was that the primary reason that the resource scheduler is not a 
task scheduler is for performance reasons.  If we can allow it to be 
task-centric when necessary, but avoid the performance impact of making it 
task-centric all the time, it will support location-specific scheduling in the 
most flexible and intuitive way.

I hope this isn't rehashing the debate from YARN-371. For anybody who will be 
at the YARN meetup tomorrow, it would be great to chat about this for a couple 
of minutes.
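
To make the second use case above concrete, here is a hedged sketch of the kind of "strict" ask being discussed, with a per-request flag that suppresses the rack/* fallback; the Ask class and relaxLocality field are invented for this sketch and are not the real ResourceRequest API:
{code}
import java.util.Arrays;
import java.util.List;

// Illustrative shape only: two asks with disjoint node sets and no locality
// relaxation, so the scheduler cannot place both containers on node1/node2.
public class StrictLocalityAskSketch {

  static final class Ask {
    final List<String> nodes;     // acceptable nodes for this ask
    final int numContainers;
    final boolean relaxLocality;  // false = never fall back to rack or *

    Ask(List<String> nodes, int numContainers, boolean relaxLocality) {
      this.nodes = nodes;
      this.numContainers = numContainers;
      this.relaxLocality = relaxLocality;
    }
  }

  public static void main(String[] args) {
    Ask first = new Ask(Arrays.asList("node1", "node2"), 1, false);
    Ask second = new Ask(Arrays.asList("node3", "node4"), 1, false);
    System.out.println("ask1 -> " + first.nodes + ", ask2 -> " + second.nodes);
  }
}
{code}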

 Make it possible to schedule to specific nodes without dropping locality
 

 Key: YARN-392
 URL: https://issues.apache.org/jira/browse/YARN-392
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Sandy Ryza
 Attachments: YARN-392.patch


 Currently it's not possible to restrict scheduling requests to specific nodes 
 and nowhere else. The RM automatically relaxes locality to rack and * and may 
 assign machines the app did not specify.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM

2013-02-21 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583823#comment-13583823
 ] 

Hitesh Shah commented on YARN-196:
--

Patch still has trailing whitespace issues.

In YarnConfiguration.java:

+  /** Max time to wait for ResourceManger start
+   */
 
Please rephrase to "max time to wait to establish a connection to the 
ResourceManager when the NodeManager starts".

+  /** Time interval for each NM attempt to connect RM
+   */

Rephrase to "Time interval between each NM attempt to connect to the 
ResourceManager".

You can use the same descriptions in yarn-default.xml. Not sure if "After that 
period of time, NM will throw out exceptions" is valid in yarn-default.xml. A 
better description could mention that the NM will shut down if it cannot 
connect to the RM within the specified max time period.
The description should also mention how to use -1 to retry forever.

An earlier comment made the point of switching from MS to SECONDS so that 
users can understand the values more easily.

+  // this.hostName = InetAddress.getLocalHost().getCanonicalHostName();
  - Please remove commented out code if not being used.

Unit test does not really seem to be testing the flow of 
RESOURCEMANAGER_CONNECT_WAIT_MS being set to -1. waitForEver is being 
explicitly set to true/false based on the updater's ctor and not really based 
on the config value. If that flow cannot be tested, it might be better to 
remove the additional complexity from the test.

Also, patch will need to be updated due to 
https://issues.apache.org/jira/browse/HADOOP-9112. 
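
To make the -1 semantics concrete, here is a minimal sketch of the connect loop described above; the names mirror the discussion (RESOURCEMANAGER_CONNECT_WAIT_MS, waitForEver) but this is not the patch itself:
{code}
// Illustrative only: retry RM registration until a deadline, or forever when
// the configured max wait time is -1.
public class RmConnectRetrySketch {

  public interface Registration {
    void attempt() throws Exception;  // e.g. registerNodeManager()
  }

  static void registerWithRetry(Registration reg, long maxWaitMs,
      long retryIntervalMs) throws Exception {
    final boolean waitForEver = (maxWaitMs == -1);  // -1 means retry forever
    final long deadline = System.currentTimeMillis() + maxWaitMs;
    while (true) {
      try {
        reg.attempt();
        return;                                     // connected and registered
      } catch (Exception e) {
        if (!waitForEver && System.currentTimeMillis() >= deadline) {
          throw e;                                  // give up: the NM shuts down
        }
        Thread.sleep(retryIntervalMs);              // wait before the next attempt
      }
    }
  }
}
{code}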

 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch


 If the NM is started before the RM, the NM shuts down with the 
 following error:
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 

[jira] [Updated] (YARN-47) Security issues in YARN

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-47:


Priority: Major  (was: Blocker)

  Security issues in YARN
 

 Key: YARN-47
 URL: https://issues.apache.org/jira/browse/YARN-47
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 JIRA tracking YARN-related security issues.
 Moving over the YARN-only items from MAPREDUCE-3101.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-396) Rationalize AllocateResponse in RM scheduler API

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-396:
-

Priority: Major  (was: Blocker)

 Rationalize AllocateResponse in RM scheduler API
 

 Key: YARN-396
 URL: https://issues.apache.org/jira/browse/YARN-396
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Arun C Murthy

 AllocateResponse contains an AMResponse and a cluster node count; the 
 AMResponse holds the rest of the data. Unless there is a good reason for this 
 object structure, there should be either an AMResponse or an AllocateResponse, 
 not both.
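 For illustration, a hedged sketch of what a flattened response could look like if the AMResponse fields were folded into AllocateResponse; the getter names follow the existing records, but this interface is invented for the sketch and is not the proposed API:
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch of one possible flattened shape: everything the AM needs in a single
// record, with no nested AMResponse.
public interface FlattenedAllocateResponse {
  int getResponseId();
  boolean getReboot();
  List<Container> getAllocatedContainers();
  List<ContainerStatus> getCompletedContainersStatuses();
  Resource getAvailableResources();
  int getNumClusterNodes();  // previously the only field carried outside the nested AMResponse
}
{code}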

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-99) Jobs fail during resource localization when directories in file cache reaches to unix directory limit

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-99:


Priority: Major  (was: Blocker)

 Jobs fail during resource localization when directories in file cache reaches 
 to unix directory limit
 -

 Key: YARN-99
 URL: https://issues.apache.org/jira/browse/YARN-99
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Devaraj K

 If multiple jobs use the distributed cache with many small files, the Unix 
 directory limit is reached before the cache-size limit, and the NodeManager 
 can no longer create directories in the file cache. Jobs then start failing 
 with the exception below.
 {code:xml}
 java.io.IOException: mkdir of 
 /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
   at 
 org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 We should have a mechanism to clean cache files once the number of directories 
 crosses a specified limit, analogous to the existing cache-size limit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-34) Split/Cleanup YARN and MAPREDUCE documentation

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-34?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-34:


Priority: Major  (was: Blocker)

 Split/Cleanup YARN and MAPREDUCE documentation
 --

 Key: YARN-34
 URL: https://issues.apache.org/jira/browse/YARN-34
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 Post YARN-1, we need a clear separation between YARN and MapReduce. We need 
 separate sections on the site and in the docs; we already have separate 
 documents.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-41:


Priority: Major  (was: Blocker)

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch


 Instead of waiting for the NM to expire, the RM should remove and handle an 
 NM that is shut down gracefully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-414) [Umbrella] Usability issues in YARN

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-414:
-

Priority: Major  (was: Blocker)

 [Umbrella] Usability issues in YARN
 ---

 Key: YARN-414
 URL: https://issues.apache.org/jira/browse/YARN-414
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 Umbrella jira to track all forms of usability issues in YARN that need to be 
 addressed before YARN can be considered stable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-69) RM should throw different exceptions for while querying app/node/queue

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-69:


Priority: Major  (was: Blocker)

 RM should throw different exceptions for while querying app/node/queue
 --

 Key: YARN-69
 URL: https://issues.apache.org/jira/browse/YARN-69
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 We should distinguish the exceptions for absent app/node/queue, illegally 
 accessed app/node/queue etc. Today everything is a {{YarnRemoteException}}. 
 We should extend {{YarnRemoteException}} to add {{NotFoundException}}, 
 {{AccessControlException}} etc. Today, {{AccessControlException}} exists but 
 not as part of the protocol descriptions (i.e. only available to Java).
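 A small, hedged sketch of the kind of hierarchy being proposed; YarnRemoteExceptionBase below is a stand-in for the real {{YarnRemoteException}}, and as the description notes the real exceptions would also need protocol-level (protobuf) definitions:
{code}
// Illustrative only: concrete subclasses so callers can catch specific failures
// instead of a generic YarnRemoteException.
public class YarnExceptionHierarchySketch {

  /** Stand-in for the existing YarnRemoteException base class. */
  static class YarnRemoteExceptionBase extends Exception {
    YarnRemoteExceptionBase(String msg) { super(msg); }
  }

  /** Thrown when a queried app/node/queue does not exist. */
  static class NotFoundException extends YarnRemoteExceptionBase {
    NotFoundException(String msg) { super(msg); }
  }

  /** Thrown when the caller may not view or modify the app/node/queue. */
  static class AccessControlException extends YarnRemoteExceptionBase {
    AccessControlException(String msg) { super(msg); }
  }
}
{code}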

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-397) RM Scheduler api enhancements

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-397:
-

Priority: Major  (was: Blocker)

 RM Scheduler api enhancements
 -

 Key: YARN-397
 URL: https://issues.apache.org/jira/browse/YARN-397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun C Murthy

 Umbrella jira tracking enhancements to RM apis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira