[jira] [Commented] (YARN-1901) All tasks restart during RM failover on Hive

2014-04-04 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13959815#comment-13959815
 ] 

Fengdong Yu commented on YARN-1901:
---

Hi [~ozawa],
Could you search the yarn-dev mailing list? I sent a mail about this issue there.

This issue only affects Hive jobs. General MR jobs work well (only unfinished tasks restart; finished tasks are not re-run).


 All tasks restart during RM failover on Hive
 

 Key: YARN-1901
 URL: https://issues.apache.org/jira/browse/YARN-1901
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Fengdong Yu

 I built from trunk and configured RM HA, then submitted a Hive job.
 There are 11 maps in total; I stopped the active RM after 6 maps finished,
 but Hive shows all map tasks restarting again. This conflicts with the
 design description.
 job progress:
 {code}
 2014-03-31 18:44:14,088 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 713.84 sec
 2014-03-31 18:44:15,128 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 722.83 sec
 2014-03-31 18:44:16,160 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 731.95 sec
  2014-03-31 18:44:17,191 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 744.17 sec
 2014-03-31 18:44:18,220 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 756.22 sec
 2014-03-31 18:44:19,250 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 762.4 
 sec
  2014-03-31 18:44:20,281 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 774.64 sec
 2014-03-31 18:44:21,306 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 786.49 sec
 2014-03-31 18:44:22,334 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 792.59 sec
  2014-03-31 18:44:23,363 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 
 807.58 sec
 2014-03-31 18:44:24,392 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 
 815.96 sec
 2014-03-31 18:44:25,416 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 823.83 sec
  2014-03-31 18:44:26,443 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 826.84 sec
 2014-03-31 18:44:27,472 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 832.16 sec
 2014-03-31 18:44:28,501 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 
 839.73 sec
  2014-03-31 18:44:29,531 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 
 844.45 sec
 2014-03-31 18:44:30,564 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 760.34 sec
 2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%
  2014-03-31 18:45:06,918 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 
 213.81 sec
 2014-03-31 18:45:07,952 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 216.83 
 sec
 2014-03-31 18:45:08,979 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 229.15 
 sec
  2014-03-31 18:45:10,007 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 
 244.42 sec
 2014-03-31 18:45:11,040 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 
 247.31 sec
 2014-03-31 18:45:12,072 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 259.5 
 sec
  2014-03-31 18:45:13,105 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 274.72 sec
 2014-03-31 18:45:14,135 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 280.76 sec
 2014-03-31 18:45:15,170 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 292.9 
 sec
  2014-03-31 18:45:16,202 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 305.16 sec
 2014-03-31 18:45:17,233 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 314.21 sec
 2014-03-31 18:45:18,264 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 323.34 sec
  2014-03-31 18:45:19,294 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 335.6 sec
 2014-03-31 18:45:20,325 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 344.71 sec
 2014-03-31 18:45:21,355 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 353.8 
 sec
  2014-03-31 18:45:22,385 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 366.06 sec
 2014-03-31 18:45:23,415 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 375.2 
 sec
 2014-03-31 18:45:24,449 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 384.28 sec
 {code}
 I am using hive-0.12.0, with ZKRMStateStore as the RM store class. Hive is using a
 simple external table (only one column).
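
 For reference, a minimal programmatic sketch of the kind of RM HA plus ZK-backed recovery setup described above. This is not taken from the report; the property values, in particular the ZooKeeper quorum, are placeholder assumptions.
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class RmHaRecoveryConfSketch {
   // Sketch only: enables RM HA and ZK-backed recovery the way the report describes.
   static Configuration build() {
     Configuration conf = new YarnConfiguration();
     conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
     conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
     conf.set("yarn.resourcemanager.store.class",
         "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
     conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181"); // placeholder quorum
     return conf;
   }

   public static void main(String[] args) {
     System.out.println(build().get("yarn.resourcemanager.store.class"));
   }
 }
 {code}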



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-04-04 Thread Sietse T. Au (JIRA)
Sietse T. Au created YARN-1902:
--

 Summary: Allocation of too many containers when a second request 
is done with the same resource capability
 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.3.0, 2.2.0
Reporter: Sietse T. Au


Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y, when addContainerRequest is called 
z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Scenario 2:
This behavior does not occur when no containers are started between the 
allocate calls. 

Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.

Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.

There are workarounds for this, such as releasing the excess containers 
received.

The solution implemented is to initialize a new ResourceRequest in 
ResourceRequestInfo when a request has been successfully sent to the RM.
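
For illustration, here is a hedged sketch of scenario 1 as an AM-side call sequence. This is not the attached test; the host, ports, resource sizes, and the single allocate round-trip are simplifying assumptions, since real allocations arrive over several heartbeats.
{code}
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class OverAllocationScenarioSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(conf);
    amrmClient.start();
    amrmClient.registerApplicationMaster("localhost", 0, "");

    Resource capability = Resource.newInstance(1024, 1); // the shared Resource y
    Priority priority = Priority.newInstance(0);
    int z = 3;

    // Request x is added z times, then one allocate cycle runs.
    for (int i = 0; i < z; i++) {
      amrmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    }
    AllocateResponse first = amrmClient.allocate(0.1f);
    List<Container> allocated = first.getAllocatedContainers();
    // ... start at least one container from 'allocated' via NMClient (omitted) ...

    // One more request with the same capability; the expectation is 1 extra container.
    amrmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    AllocateResponse second = amrmClient.allocate(0.2f);
    System.out.println("extra containers: " + second.getAllocatedContainers().size());
    // Per this report, the count here can be z+1 instead of 1 when a container
    // was started between the two allocate calls.
  }
}
{code}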





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1901) All tasks restart during RM failover on Hive

2014-04-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960026#comment-13960026
 ] 

Jason Lowe commented on YARN-1901:
--

This appears to be a duplicate of HIVE-6638.  As [~ozawa] mentioned, AMs are 
restarted when the RM restarts until YARN-556 is addressed.  When an AM 
restarts, it is not automatically the case that completed tasks will be 
recovered -- it must be supported by the output committer.  HIVE-6638 is 
updating Hive's OutputCommitter so it can support task recovery upon AM restart.
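
For readers unfamiliar with the recovery hook, the sketch below shows the general shape of an OutputCommitter that opts in to task recovery across an AM restart. It is only an illustration; the actual Hive change is tracked in HIVE-6638, and the method bodies here are placeholders rather than Hive's real commit logic.
{code}
import java.io.IOException;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class RecoverableCommitterSketch extends OutputCommitter {

  @Override public void setupJob(JobContext jobContext) throws IOException { /* create staging dirs */ }
  @Override public void setupTask(TaskAttemptContext taskContext) throws IOException { }
  @Override public boolean needsTaskCommit(TaskAttemptContext taskContext) throws IOException { return true; }
  @Override public void commitTask(TaskAttemptContext taskContext) throws IOException { /* promote task output */ }
  @Override public void abortTask(TaskAttemptContext taskContext) throws IOException { /* discard task output */ }

  // Unless this returns true, a restarted AM re-runs every task, which is
  // the behavior reported in this issue.
  @Override
  public boolean isRecoverySupported() {
    return true;
  }

  // Called by the new AM for tasks that completed under the previous AM; it must
  // make their already-committed output usable again without re-running them.
  @Override
  public void recoverTask(TaskAttemptContext taskContext) throws IOException {
    // placeholder: re-promote or re-register the committed task output
  }
}
{code}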

 All tasks restart during RM failover on Hive
 

 Key: YARN-1901
 URL: https://issues.apache.org/jira/browse/YARN-1901
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Fengdong Yu

 I built from trunk and configured RM HA, then submitted a Hive job.
 There are 11 maps in total; I stopped the active RM after 6 maps finished,
 but Hive shows all map tasks restarting again. This conflicts with the
 design description.
 job progress:
 {code}
 2014-03-31 18:44:14,088 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 713.84 sec
 2014-03-31 18:44:15,128 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 722.83 sec
 2014-03-31 18:44:16,160 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 731.95 sec
  2014-03-31 18:44:17,191 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 744.17 sec
 2014-03-31 18:44:18,220 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 756.22 sec
 2014-03-31 18:44:19,250 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 762.4 
 sec
  2014-03-31 18:44:20,281 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 774.64 sec
 2014-03-31 18:44:21,306 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 786.49 sec
 2014-03-31 18:44:22,334 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 792.59 sec
  2014-03-31 18:44:23,363 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 
 807.58 sec
 2014-03-31 18:44:24,392 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 
 815.96 sec
 2014-03-31 18:44:25,416 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 823.83 sec
  2014-03-31 18:44:26,443 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 826.84 sec
 2014-03-31 18:44:27,472 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 832.16 sec
 2014-03-31 18:44:28,501 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 
 839.73 sec
  2014-03-31 18:44:29,531 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 
 844.45 sec
 2014-03-31 18:44:30,564 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 760.34 sec
 2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%
  2014-03-31 18:45:06,918 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 
 213.81 sec
 2014-03-31 18:45:07,952 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 216.83 
 sec
 2014-03-31 18:45:08,979 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 229.15 
 sec
  2014-03-31 18:45:10,007 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 
 244.42 sec
 2014-03-31 18:45:11,040 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 
 247.31 sec
 2014-03-31 18:45:12,072 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 259.5 
 sec
  2014-03-31 18:45:13,105 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 274.72 sec
 2014-03-31 18:45:14,135 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 280.76 sec
 2014-03-31 18:45:15,170 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 292.9 
 sec
  2014-03-31 18:45:16,202 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 305.16 sec
 2014-03-31 18:45:17,233 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 314.21 sec
 2014-03-31 18:45:18,264 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 323.34 sec
  2014-03-31 18:45:19,294 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 335.6 sec
 2014-03-31 18:45:20,325 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 344.71 sec
 2014-03-31 18:45:21,355 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 353.8 
 sec
  2014-03-31 18:45:22,385 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 366.06 sec
 2014-03-31 18:45:23,415 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 375.2 
 sec
 2014-03-31 18:45:24,449 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 384.28 sec
 {code}
 I am using hive-0.12.0, with ZKRMStateStore as the RM store class. Hive is using a
 simple external table (only one column).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-04-04 Thread Sietse T. Au (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sietse T. Au updated YARN-1902:
---

Attachment: YARN-1902.patch

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0
Reporter: Sietse T. Au
  Labels: patch
 Attachments: YARN-1902.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 This behavior does not occur when no containers are started between the 
 allocate calls. 
 Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.
 Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-04-04 Thread Sietse T. Au (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sietse T. Au updated YARN-1902:
---

Description: 
Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y, when addContainerRequest is called 
z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Scenario 2:
No containers are started between the allocate calls. 

Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.

Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.

There are workarounds for this, such as releasing the excess containers 
received.

The solution implemented is to initialize a new ResourceRequest in 
ResourceRequestInfo when a request has been successfully sent to the RM.



  was:
Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y, when addContainerRequest is called 
z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Scenario 2:
This behavior does not occur when no containers are started between the 
allocate calls. 

Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.

Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.

There are workarounds for this, such as releasing the excess containers 
received.

The solution implemented is to initialize a new ResourceRequest in 
ResourceRequestInfo when a request has been successfully sent to the RM.




 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0
Reporter: Sietse T. Au
  Labels: patch
 Attachments: YARN-1902.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.
 Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-04-04 Thread Sietse T. Au (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sietse T. Au updated YARN-1902:
---

Description: 
Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y, when addContainerRequest is called 
z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Scenario 2:
No containers are started between the allocate calls. 

Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.

Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.

There are workarounds for this, such as releasing the excess containers 
received.

The solution implemented is to initialize a new ResourceRequest in 
ResourceRequestInfo when a request has been successfully sent to the RM.

The patch includes a test in which scenario one is tested.

  was:
Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y, when addContainerRequest is called 
z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Scenario 2:
No containers are started between the allocate calls. 

Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.

Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.

There are workarounds for this, such as releasing the excess containers 
received.

The solution implemented is to initialize a new ResourceRequest in 
ResourceRequestInfo when a request has been successfully sent to the RM.




 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0
Reporter: Sietse T. Au
  Labels: patch
 Attachments: YARN-1902.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.
 Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. A consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1837) TestMoveApplication.testMoveRejectedByScheduler randomly fails

2014-04-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960255#comment-13960255
 ] 

Jian He commented on YARN-1837:
---

looks good to me, +1

 TestMoveApplication.testMoveRejectedByScheduler randomly fails
 --

 Key: YARN-1837
 URL: https://issues.apache.org/jira/browse/YARN-1837
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Tsuyoshi OZAWA
Assignee: Hong Zhiguo
 Attachments: YARN-1837.patch


 TestMoveApplication#testMoveRejectedByScheduler fails because of 
 NullPointerException. It looks caused by unhandled exception handling at 
 server-side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1837) TestMoveApplication.testMoveRejectedByScheduler randomly fails

2014-04-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960259#comment-13960259
 ] 

Jian He commented on YARN-1837:
---

One more observation: move is allowed in the SUBMITTED state? Not sure whether that's expected or not. Unrelated to this patch, though.

Checking this in.



 TestMoveApplication.testMoveRejectedByScheduler randomly fails
 --

 Key: YARN-1837
 URL: https://issues.apache.org/jira/browse/YARN-1837
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Tsuyoshi OZAWA
Assignee: Hong Zhiguo
 Attachments: YARN-1837.patch


 TestMoveApplication#testMoveRejectedByScheduler fails because of 
 NullPointerException. It looks caused by unhandled exception handling at 
 server-side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1837) TestMoveApplication.testMoveRejectedByScheduler randomly fails

2014-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960281#comment-13960281
 ] 

Hudson commented on YARN-1837:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5458 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5458/])
YARN-1837. Fixed TestMoveApplication#testMoveRejectedByScheduler failure. 
Contributed by Hong Zhiguo (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1584862)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java


 TestMoveApplication.testMoveRejectedByScheduler randomly fails
 --

 Key: YARN-1837
 URL: https://issues.apache.org/jira/browse/YARN-1837
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Tsuyoshi OZAWA
Assignee: Hong Zhiguo
 Fix For: 2.4.1

 Attachments: YARN-1837.patch


 TestMoveApplication#testMoveRejectedByScheduler fails because of 
 NullPointerException. It looks caused by unhandled exception handling at 
 server-side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1872:
--

Priority: Blocker  (was: Major)
Target Version/s: 2.4.1
  Labels:   (was: patch)

 TestDistributedShell occasionally fails in trunk
 

 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Hong Zhiguo
Priority: Blocker
 Attachments: TestDistributedShell.out, YARN-1872.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
 TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
 TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-04-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960357#comment-13960357
 ] 

Zhijie Shen commented on YARN-1872:
---

bq. After the DistributedShell AM requested numTotalContainers containers, RM may allocate more than that.

[~zhiguohong], thanks for working on the test failure. Do you know why the RM is likely to allocate more containers than the AM requested? Is it related to what YARN-1902 describes?

 TestDistributedShell occasionally fails in trunk
 

 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Hong Zhiguo
Priority: Blocker
 Attachments: TestDistributedShell.out, YARN-1872.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
 TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
 TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1903) TestNMClient fails occasionally

2014-04-04 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1903:
-

 Summary: TestNMClient fails occasionally
 Key: YARN-1903
 URL: https://issues.apache.org/jira/browse/YARN-1903
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


The container status after stopping the container is not as expected.
{code}
java.lang.AssertionError: 4: 
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
{code}
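
For context, the failing check boils down to something like the following NMClient usage (a hedged sketch; the real assertions live in TestNMClient and differ in detail):
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.NMClient;

public class ContainerStopStatusCheckSketch {
  // Stops a container, then verifies the returned status carries a real exit
  // status and diagnostics instead of the -1000 "not yet set" placeholder.
  static void stopAndVerify(NMClient nmClient, ContainerId containerId, NodeId nodeId)
      throws Exception {
    nmClient.stopContainer(containerId, nodeId);
    ContainerStatus status = nmClient.getContainerStatus(containerId, nodeId);

    if (status.getState() == ContainerState.COMPLETE) {
      assert status.getExitStatus() != -1000 : "exit status was never set";
      assert !status.getDiagnostics().isEmpty() : "diagnostics were never set";
    }
  }
}
{code}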



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1903) TestNMClient fails occasionally

2014-04-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960418#comment-13960418
 ] 

Zhijie Shen commented on YARN-1903:
---

I found the following log:
{code}
2014-04-04 05:08:01,361 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:getContainerStatusInternal(785)) - Returning 
ContainerStatus: [ContainerId: container_1396613275302_0001_01_04, State: 
RUNNING, Diagnostics: , ExitStatus: -1000, ]
2014-04-04 05:08:01,365 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:stopContainerInternal(718)) - Stopping container 
with container Id: container_1396613275302_0001_01_04
2014-04-04 05:08:01,366 INFO  nodemanager.NMAuditLogger 
(NMAuditLogger.java:logSuccess(89)) - USER=jenkins  IP=10.79.62.28  
OPERATION=Stop Container Request TARGET=ContainerManageImpl  
RESULT=SUCCESS  APPID=application_1396613275302_0001
CONTAINERID=container_1396613275302_0001_01_04
2014-04-04 05:08:01,387 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:isEnabled(169)) - Neither virutal-memory nor 
physical-memory monitoring is needed. Not running the monitor-thread
2014-04-04 05:08:01,387 INFO  containermanager.AuxServices 
(AuxServices.java:handle(175)) - Got event CONTAINER_STOP for appId 
application_1396613275302_0001
2014-04-04 05:08:01,389 INFO  application.Application 
(ApplicationImpl.java:transition(296)) - Adding 
container_1396613275302_0001_01_04 to application 
application_1396613275302_0001
2014-04-04 05:08:01,389 INFO  nodemanager.NMAuditLogger 
(NMAuditLogger.java:logSuccess(89)) - USER=jenkins  OPERATION=Container 
Finished - Killed   TARGET=ContainerImpl RESULT=SUCCESS  
APPID=application_1396613275302_0001
CONTAINERID=container_1396613275302_0001_01_04
2014-04-04 05:08:01,389 INFO  container.Container 
(ContainerImpl.java:handle(884)) - Container 
container_1396613275302_0001_01_04 transitioned from NEW to DONE
2014-04-04 05:08:01,389 INFO  application.Application 
(ApplicationImpl.java:transition(339)) - Removing 
container_1396613275302_0001_01_04 from application 
application_1396613275302_0001
2014-04-04 05:08:01,390 INFO  util.ProcfsBasedProcessTree 
(ProcfsBasedProcessTree.java:isAvailable(182)) - ProcfsBasedProcessTree 
currently is supported only on Linux.
2014-04-04 05:08:01,392 INFO  rmcontainer.RMContainerImpl 
(RMContainerImpl.java:handle(321)) - container_1396613275302_0001_01_04 
Container Transitioned from ACQUIRED to RUNNING
2014-04-04 05:08:01,393 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:getContainerStatusInternal(771)) - Getting 
container-status for container_1396613275302_0001_01_04
2014-04-04 05:08:01,393 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:getContainerStatusInternal(785)) - Returning 
ContainerStatus: [ContainerId: container_1396613275302_0001_01_04, State: 
COMPLETE, Diagnostics: , ExitStatus: -1000, ]
{code}

When the kill event is received, the container is still in the NEW state; it is moved to DONE through ContainerDoneTransition, which does not set the kill-related exit code and diagnostics.
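
To make the mechanism concrete, here is a plain-Java illustration of the difference between the generic done transition and a dedicated kill transition. It is not the real ContainerImpl state machine; all names and the exit-code values are made up for the illustration.
{code}
public class KillOnNewSketch {
  enum State { NEW, LOCALIZING, RUNNING, DONE }

  static class ContainerInfo {
    State state = State.NEW;
    int exitCode = -1000;      // mirrors the -1000 placeholder in the log above
    String diagnostics = "";
  }

  // What the comment describes: the generic transition just marks the container
  // DONE and records nothing about the kill.
  static void containerDoneTransition(ContainerInfo c) {
    c.state = State.DONE;
  }

  // A dedicated kill transition would record an exit code and a diagnostic
  // message before reaching DONE, so the client sees why the container ended.
  static void killTransition(ContainerInfo c, String reason) {
    c.exitCode = -105;         // illustrative "killed" style exit code
    c.diagnostics = "Container killed before launch: " + reason;
    c.state = State.DONE;
  }

  public static void main(String[] args) {
    ContainerInfo a = new ContainerInfo();
    containerDoneTransition(a);        // exitCode stays -1000, diagnostics stay empty
    ContainerInfo b = new ContainerInfo();
    killTransition(b, "stopContainer request");
    System.out.println(a.exitCode + " vs " + b.exitCode);
  }
}
{code}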

 TestNMClient fails occasionally
 ---

 Key: YARN-1903
 URL: https://issues.apache.org/jira/browse/YARN-1903
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 The container status after stopping the container is not as expected.
 {code}
 java.lang.AssertionError: 4: 
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1903) TestNMClient fails occasionally

2014-04-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960480#comment-13960480
 ] 

Zhijie Shen commented on YARN-1903:
---

I did more investigation. Rather than a test failure, it sounds more like a bug in the container life cycle to me:

1. If a container is killed while in NEW, the exit code and diagnostics will never be set.
2. If a container is killed while in LOCALIZING, the exit code will never be set.

 TestNMClient fails occasionally
 ---

 Key: YARN-1903
 URL: https://issues.apache.org/jira/browse/YARN-1903
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 The container status after stopping the container is not as expected.
 {code}
 java.lang.AssertionError: 4: 
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1903:
--

Summary: Killing Container on NEW and LOCALIZING will result in exitCode 
and diagnostics not set  (was: TestNMClient fails occasionally)

 Killing Container on NEW and LOCALIZING will result in exitCode and 
 diagnostics not set
 ---

 Key: YARN-1903
 URL: https://issues.apache.org/jira/browse/YARN-1903
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 The container status after stopping the container is not as expected.
 {code}
 java.lang.AssertionError: 4: 
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1901) All tasks restart during RM failover on Hive

2014-04-04 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960515#comment-13960515
 ] 

Fengdong Yu commented on YARN-1901:
---

Yes, it's an exact duplicate. Thanks, I've closed it.

 All tasks restart during RM failover on Hive
 

 Key: YARN-1901
 URL: https://issues.apache.org/jira/browse/YARN-1901
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Fengdong Yu

 I built from trunk and configured RM HA, then submitted a Hive job.
 There are 11 maps in total; I stopped the active RM after 6 maps finished,
 but Hive shows all map tasks restarting again. This conflicts with the
 design description.
 job progress:
 {code}
 2014-03-31 18:44:14,088 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 713.84 sec
 2014-03-31 18:44:15,128 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 722.83 sec
 2014-03-31 18:44:16,160 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 731.95 sec
  2014-03-31 18:44:17,191 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 744.17 sec
 2014-03-31 18:44:18,220 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 756.22 sec
 2014-03-31 18:44:19,250 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 762.4 
 sec
  2014-03-31 18:44:20,281 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 774.64 sec
 2014-03-31 18:44:21,306 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 786.49 sec
 2014-03-31 18:44:22,334 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 792.59 sec
  2014-03-31 18:44:23,363 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 
 807.58 sec
 2014-03-31 18:44:24,392 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 
 815.96 sec
 2014-03-31 18:44:25,416 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 823.83 sec
  2014-03-31 18:44:26,443 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 826.84 sec
 2014-03-31 18:44:27,472 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 832.16 sec
 2014-03-31 18:44:28,501 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 
 839.73 sec
  2014-03-31 18:44:29,531 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 
 844.45 sec
 2014-03-31 18:44:30,564 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 760.34 sec
 2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%
  2014-03-31 18:45:06,918 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 
 213.81 sec
 2014-03-31 18:45:07,952 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 216.83 
 sec
 2014-03-31 18:45:08,979 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 229.15 
 sec
  2014-03-31 18:45:10,007 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 
 244.42 sec
 2014-03-31 18:45:11,040 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 
 247.31 sec
 2014-03-31 18:45:12,072 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 259.5 
 sec
  2014-03-31 18:45:13,105 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 274.72 sec
 2014-03-31 18:45:14,135 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 280.76 sec
 2014-03-31 18:45:15,170 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 292.9 
 sec
  2014-03-31 18:45:16,202 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 305.16 sec
 2014-03-31 18:45:17,233 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 314.21 sec
 2014-03-31 18:45:18,264 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 323.34 sec
  2014-03-31 18:45:19,294 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 335.6 sec
 2014-03-31 18:45:20,325 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 344.71 sec
 2014-03-31 18:45:21,355 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 353.8 
 sec
  2014-03-31 18:45:22,385 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 366.06 sec
 2014-03-31 18:45:23,415 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 375.2 
 sec
 2014-03-31 18:45:24,449 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 384.28 sec
 {code}
 I am using hive-0.12.0, with ZKRMStateStore as the RM store class. Hive is using a
 simple external table (only one column).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1903:
--

Attachment: YARN-1903.1.patch

Uploaded a patch to fix these issues.

 Killing Container on NEW and LOCALIZING will result in exitCode and 
 diagnostics not set
 ---

 Key: YARN-1903
 URL: https://issues.apache.org/jira/browse/YARN-1903
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1903.1.patch


 The container status after stopping the container is not as expected.
 {code}
 java.lang.AssertionError: 4: 
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1901) All tasks restart during RM failover on Hive

2014-04-04 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu resolved YARN-1901.
---

Resolution: Duplicate

 All tasks restart during RM failover on Hive
 

 Key: YARN-1901
 URL: https://issues.apache.org/jira/browse/YARN-1901
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Fengdong Yu

 I built from trunk and configured RM HA, then submitted a Hive job.
 There are 11 maps in total; I stopped the active RM after 6 maps finished,
 but Hive shows all map tasks restarting again. This conflicts with the
 design description.
 job progress:
 {code}
 2014-03-31 18:44:14,088 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 713.84 sec
 2014-03-31 18:44:15,128 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 722.83 sec
 2014-03-31 18:44:16,160 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 731.95 sec
  2014-03-31 18:44:17,191 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 744.17 sec
 2014-03-31 18:44:18,220 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 756.22 sec
 2014-03-31 18:44:19,250 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 762.4 
 sec
  2014-03-31 18:44:20,281 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 
 774.64 sec
 2014-03-31 18:44:21,306 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 786.49 sec
 2014-03-31 18:44:22,334 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 
 792.59 sec
  2014-03-31 18:44:23,363 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 
 807.58 sec
 2014-03-31 18:44:24,392 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 
 815.96 sec
 2014-03-31 18:44:25,416 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 823.83 sec
  2014-03-31 18:44:26,443 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 
 826.84 sec
 2014-03-31 18:44:27,472 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 832.16 sec
 2014-03-31 18:44:28,501 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 
 839.73 sec
  2014-03-31 18:44:29,531 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 
 844.45 sec
 2014-03-31 18:44:30,564 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 
 760.34 sec
 2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%
  2014-03-31 18:45:06,918 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 
 213.81 sec
 2014-03-31 18:45:07,952 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 216.83 
 sec
 2014-03-31 18:45:08,979 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 229.15 
 sec
  2014-03-31 18:45:10,007 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 
 244.42 sec
 2014-03-31 18:45:11,040 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 
 247.31 sec
 2014-03-31 18:45:12,072 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 259.5 
 sec
  2014-03-31 18:45:13,105 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 274.72 sec
 2014-03-31 18:45:14,135 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 280.76 sec
 2014-03-31 18:45:15,170 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 292.9 
 sec
  2014-03-31 18:45:16,202 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 305.16 sec
 2014-03-31 18:45:17,233 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 314.21 sec
 2014-03-31 18:45:18,264 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 323.34 sec
  2014-03-31 18:45:19,294 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 335.6 sec
 2014-03-31 18:45:20,325 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 344.71 sec
 2014-03-31 18:45:21,355 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 353.8 
 sec
  2014-03-31 18:45:22,385 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 366.06 sec
 2014-03-31 18:45:23,415 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 375.2 
 sec
 2014-03-31 18:45:24,449 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 
 384.28 sec
 {code}
 I am using hive-0.12.0, with ZKRMStateStore as the RM store class. Hive is using a
 simple external table (only one column).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1898) Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM

2014-04-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1898:


Attachment: YARN-1898.addendum.patch

Submitting the same patch again to kick off Jenkins.

 Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are 
 redirecting to Active RM
 -

 Key: YARN-1898
 URL: https://issues.apache.org/jira/browse/YARN-1898
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Yesha Vora
Assignee: Xuan Gong
 Fix For: 2.4.1

 Attachments: YARN-1898.1.patch, YARN-1898.2.patch, YARN-1898.3.patch, 
 YARN-1898.addendum.patch, YARN-1898.addendum.patch


 Standby RM links /conf, /stacks, /logLevel, /metrics, and /jmx are redirected to
 the Active RM.
 They should not be redirected to the Active RM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-04-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960541#comment-13960541
 ] 

Jason Lowe commented on YARN-1769:
--

The patch no longer applies cleanly after YARN-1512.  Other comments on the 
patch:

- Nit: In LeafQueue.assignToQueue we could cache Resource.add(usedResources, required) in a local when we're computing potentialNewCapacity, so we don't have to recompute it as part of the potentialNewWithoutReservedCapacity computation (see the sketch after this list).
- LeafQueue.assignToQueue and LeafQueue.assignToUser don't seem to need the new 
priority argument, and therefore LeafQueue.checkLimitsToReserve wouldn't seem 
to need it either once those others are updated.
- Should FiCaSchedulerApp.getAppToUnreserve really be called getNodeIdToUnreserve or getNodeToUnreserve, since it's returning a node ID rather than an app?
- In LeafQueue.findNodeToUnreserve, isn't it kinda bad if the app thinks it has reservations on the node but the scheduler doesn't know about it?  Wondering if the bookkeeping is messed up at that point, and therefore whether something a bit more than debug is the appropriate log level and whether further fixup is needed.
- LeafQueue.findNodeToUnreserve is adjusting the headroom when it unreserves, 
but I don't see other unreservations doing a similar calculation.  Wondering if 
this fixup is something that should have been in completedContainer or needs to 
be done elsewhere?  I could easily be missing something here but asking just in 
case other unreservation situations also need to have the headroom fixed.
- LeafQueue.assignContainer uses the much more expensive 
scheduler.getConfiguration().getReservationContinueLook() when it should be 
able to use the reservationsContinueLooking member instead.
- LeafQueue.getReservationContinueLooking should be package private
- Nit: LeafQueue.assignContainer has some reformatting of the log message after the "// Inform the node" comment, which was clearer to read/maintain before, since the label and the value were always on a line by themselves. The same goes for the "Reserved container" log towards the end of the method.
- Ultra-Nit: ParentQueue.setupQueueConfig's log message should have the 
reservationsContinueLooking on the previous line to match the style of other 
label/value pairs in the log message.
- ParentQueue.getReservationContinueLooking should be package private.
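
To make the first nit above concrete, here is a minimal sketch of the suggested caching. The variable names and the use of the Resources utility are assumptions for illustration, not the actual LeafQueue code:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class CacheAddedResourceSketch {
  // Compute usedResources + required once and reuse it for both derived
  // quantities instead of re-adding inside each computation.
  static void illustrate(Resource usedResources, Resource required, Resource reserved) {
    Resource usedPlusRequired = Resources.add(usedResources, required);   // cached once
    Resource potentialNewCapacity = usedPlusRequired;                     // first use
    Resource potentialNewWithoutReserved =
        Resources.subtract(usedPlusRequired, reserved);                   // second use, no recompute
    System.out.println(potentialNewCapacity + " / " + potentialNewWithoutReserved);
  }

  public static void main(String[] args) {
    illustrate(Resource.newInstance(8192, 8),
               Resource.newInstance(2048, 2),
               Resource.newInstance(1024, 1));
  }
}
{code}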

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as
 are currently required, and then start to reserve more above that after a
 certain number of re-reservations (currently biased against larger
 containers). Any time it hits the limit on the number reserved, it stops looking
 at any other nodes. This results in potentially missing nodes that have
 enough space to fulfill the request.
 The other place for improvement is that reservations currently count against your
 queue capacity. If you have reservations, you could hit the various limits,
 which would then stop you from looking further at that node.
 The above 2 cases can cause an application requesting a larger container to
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set

2014-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960776#comment-13960776
 ] 

Hadoop QA commented on YARN-1903:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12638788/YARN-1903.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3513//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3513//console

This message is automatically generated.

 Killing Container on NEW and LOCALIZING will result in exitCode and 
 diagnostics not set
 ---

 Key: YARN-1903
 URL: https://issues.apache.org/jira/browse/YARN-1903
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1903.1.patch


 The container status after stopping the container is not as expected.
 {code}
 java.lang.AssertionError: 4: 
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-04-04 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960797#comment-13960797
 ] 

Hong Zhiguo commented on YARN-1872:
---

Yes, it is. And the MapReduce V2 AM contains some code to work around this strange behavior.
I'll review the YARN-1902 patch later.
But anyway, it's better to move the check inside the loop (which is what's done in this patch).


 TestDistributedShell occasionally fails in trunk
 

 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Hong Zhiguo
Priority: Blocker
 Attachments: TestDistributedShell.out, YARN-1872.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
 TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
 TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set

2014-04-04 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960809#comment-13960809
 ] 

Xuan Gong commented on YARN-1903:
-

+1 LGTM

Also, I ran TestNMClient with this patch applied on Windows several times. All of the runs passed.

 Killing Container on NEW and LOCALIZING will result in exitCode and 
 diagnostics not set
 ---

 Key: YARN-1903
 URL: https://issues.apache.org/jira/browse/YARN-1903
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1903.1.patch


 The container status after stopping the container is not as expected.
 {code}
 java.lang.AssertionError: 4: 
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1898) Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM

2014-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960826#comment-13960826
 ] 

Hadoop QA commented on YARN-1898:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12638792/YARN-1898.addendum.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3514//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3514//console

This message is automatically generated.

 Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are 
 redirecting to Active RM
 -

 Key: YARN-1898
 URL: https://issues.apache.org/jira/browse/YARN-1898
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Yesha Vora
Assignee: Xuan Gong
 Fix For: 2.4.1

 Attachments: YARN-1898.1.patch, YARN-1898.2.patch, YARN-1898.3.patch, 
 YARN-1898.addendum.patch, YARN-1898.addendum.patch


 Standby RM links /conf, /stacks, /logLevel, /metrics, and /jmx are redirected to
 the Active RM.
 They should not be redirected to the Active RM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active

2014-04-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960929#comment-13960929
 ] 

Arun C Murthy commented on YARN-1878:
-

[~xgong] is this ready to go? Let's get this into 2.4.1. Tx.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the
 standby RM to transition to active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active

2014-04-04 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1878:


Target Version/s: 2.4.1

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that sometimes it can take up to 10s for the 
 standby RM to transition to active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active

2014-04-04 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1878:


Priority: Blocker  (was: Major)

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that sometimes it can take up to 10s for the 
 standby RM to transition to active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1898) Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM

2014-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960941#comment-13960941
 ] 

Hudson commented on YARN-1898:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5460 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5460/])
YARN-1898. Addendum patch to ensure /jmx and /metrics are re-directed to Active 
RM. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1584954)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
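
For reference, a minimal sketch of the behaviour the commit message describes, not the 
actual RMWebAppFilter from the patch: the standby keeps serving /conf, /stacks, 
/logLevel and /logs itself, while /jmx, /metrics and everything else is redirected to 
the active RM. The class name and the active-RM address below are placeholders.

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class StandbyRedirectFilterSketch implements Filter {

  // Endpoints the standby serves itself; per the addendum, /jmx and /metrics are
  // deliberately NOT in this set, so they get redirected to the active RM.
  private static final Set<String> LOCAL_PATHS = new HashSet<String>(
      Arrays.asList("/conf", "/stacks", "/logLevel", "/logs"));

  // Placeholder: the real filter derives the active RM's address from the configuration.
  private final String activeRMWebApp = "http://active-rm:8088";

  @Override
  public void init(FilterConfig filterConfig) {
  }

  @Override
  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {
    String path = ((HttpServletRequest) request).getRequestURI();
    for (String local : LOCAL_PATHS) {
      if (path.startsWith(local)) {
        chain.doFilter(request, response);   // serve locally on the standby
        return;
      }
    }
    ((HttpServletResponse) response).sendRedirect(activeRMWebApp + path);
  }

  @Override
  public void destroy() {
  }
}
{code}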


 Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are 
 redirecting to Active RM
 -

 Key: YARN-1898
 URL: https://issues.apache.org/jira/browse/YARN-1898
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Yesha Vora
Assignee: Xuan Gong
 Fix For: 2.4.1

 Attachments: YARN-1898.1.patch, YARN-1898.2.patch, YARN-1898.3.patch, 
 YARN-1898.addendum.patch, YARN-1898.addendum.patch


 Standby RM's links /conf, /stacks, /logLevel, /metrics and /jmx are redirected 
 to the Active RM.
 They should not be redirected to the Active RM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1701) More intuitive defaults for AHS

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1701:
--

Affects Version/s: 2.4.1  (was: 2.4.0)

 More intuitive defaults for AHS
 ---

 Key: YARN-1701
 URL: https://issues.apache.org/jira/browse/YARN-1701
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1701.v01.patch


 When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
 in the AHS webUI. This is because NullApplicationHistoryStore is the default 
 yarn.resourcemanager.history-writer.class. It would be good to have just one 
 key to enable the basic functionality.
 yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
 local file system location. However, FileSystemApplicationHistoryStore uses 
 DFS by default.
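
In other words, all three keys named above currently have to be set together before any 
history shows up. A minimal sketch of that combination as Java Configuration calls; the 
keys come from the description, while the fully-qualified store class name and the HDFS 
URI are assumptions.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AhsConfigSketch {
  public static Configuration ahsEnabledConf() {
    Configuration conf = new YarnConfiguration();
    // The single switch the report would like to be sufficient on its own.
    conf.setBoolean("yarn.ahs.enabled", true);
    // Today the writer must also be changed away from NullApplicationHistoryStore.
    conf.set("yarn.resourcemanager.history-writer.class",
        "org.apache.hadoop.yarn.server.applicationhistoryservice."
            + "FileSystemApplicationHistoryStore");
    // And the store URI must point at DFS instead of the ${hadoop.log.dir} local default.
    conf.set("yarn.ahs.fs-history-store.uri", "hdfs://namenode:8020/yarn/ahs"); // placeholder
    return conf;
  }
}
{code}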



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1701) More intuitive defaults for AHS

2014-04-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960967#comment-13960967
 ] 

Zhijie Shen commented on YARN-1701:
---

[~jira.shegalov], would you mind updating the patch? It no longer applies. And 
can we have a one-shot fix for both the timeline store and the generic history 
store paths? Thanks!

 More intuitive defaults for AHS
 ---

 Key: YARN-1701
 URL: https://issues.apache.org/jira/browse/YARN-1701
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1701.v01.patch


 When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
 in the AHS webUI. This is because NullApplicationHistoryStore is the default 
 yarn.resourcemanager.history-writer.class. It would be good to have just one 
 key to enable the basic functionality.
 yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
 local file system location. However, FileSystemApplicationHistoryStore uses 
 DFS by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1701:
--

Issue Type: Bug  (was: Sub-task)
Parent: (was: YARN-321)

 Improve default paths of timeline store and generic history store
 -

 Key: YARN-1701
 URL: https://issues.apache.org/jira/browse/YARN-1701
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1701.v01.patch


 When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
 in the AHS webUI. This is because NullApplicationHistoryStore is the default 
 yarn.resourcemanager.history-writer.class. It would be good to have just one 
 key to enable the basic functionality.
 yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
 local file system location. However, FileSystemApplicationHistoryStore uses 
 DFS by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1701:
--

Summary: Improve default paths of timeline store and generic history store  
(was: More intuitive defaults for AHS)

 Improve default paths of timeline store and generic history store
 -

 Key: YARN-1701
 URL: https://issues.apache.org/jira/browse/YARN-1701
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1701.v01.patch


 When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
 in the AHS webUI. This is because NullApplicationHistoryStore is the default 
 yarn.resourcemanager.history-writer.class. It would be good to have just one 
 key to enable the basic functionality.
 yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
 local file system location. However, FileSystemApplicationHistoryStore uses 
 DFS by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService

2014-04-04 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1904:
-

 Summary: Uniform the XXXXNotFound messages from ClientRMService 
and ApplicationHistoryClientService
 Key: YARN-1904
 URL: https://issues.apache.org/jira/browse/YARN-1904
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


It would be good to make ClientRMService and ApplicationHistoryClientService 
throw NotFoundException with similar messages.
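
A sketch of the idea only, not the contents of the actual patch: both services would 
build the not-found text through one shared helper, so clients see identical wording. 
The helper name and the exact message below are hypothetical.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Hypothetical shared helper; the real patch may phrase the message differently.
public final class NotFoundMessages {
  private NotFoundMessages() {
  }

  public static ApplicationNotFoundException appNotFound(ApplicationId appId) {
    // ClientRMService and ApplicationHistoryClientService would both call this,
    // so clients see the same text no matter which service they asked.
    return new ApplicationNotFoundException(
        "Application with id '" + appId + "' doesn't exist.");
  }
}
{code}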



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1904:
--

Attachment: YARN-1904.1.patch

Created a patch with simple message edits; no new test cases are added.

 Uniform the XXXXNotFound messages from ClientRMService and 
 ApplicationHistoryClientService
 --

 Key: YARN-1904
 URL: https://issues.apache.org/jira/browse/YARN-1904
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1904.1.patch


 It would be good to make ClientRMService and ApplicationHistoryClientService 
 throw NotFoundException with similar messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService

2014-04-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1904:
--

Target Version/s: 2.4.1

 Uniform the XXXXNotFound messages from ClientRMService and 
 ApplicationHistoryClientService
 --

 Key: YARN-1904
 URL: https://issues.apache.org/jira/browse/YARN-1904
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1904.1.patch


 It would be good to make ClientRMService and ApplicationHistoryClientService 
 throw NotFoundException with similar messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService

2014-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960988#comment-13960988
 ] 

Hadoop QA commented on YARN-1904:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12638841/YARN-1904.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3515//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3515//console

This message is automatically generated.

 Uniform the XXXXNotFound messages from ClientRMService and 
 ApplicationHistoryClientService
 --

 Key: YARN-1904
 URL: https://issues.apache.org/jira/browse/YARN-1904
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1904.1.patch


 It would be good to make ClientRMService and ApplicationHistoryClientService 
 throw NotFoundException with similar messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)