[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919153#comment-13919153
 ] 

Sunil G commented on YARN-1769:
---

Hi Thomas

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919154#comment-13919154
 ] 

Sunil G commented on YARN-1769:
---

Hi Thomas

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1768) yarn kill non-existent application is too verbose

2014-03-04 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919157#comment-13919157
 ] 

Ravi Prakash commented on YARN-1768:


Thanks Tsuyoshi!

Patch lgtm. +1. I'll commit it tomorrow to trunk and branch-2 unless anyone 
has a comment.

 yarn kill non-existent application is too verbose
 -

 Key: YARN-1768
 URL: https://issues.apache.org/jira/browse/YARN-1768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Attachments: YARN-1768.1.patch, YARN-1768.2.patch, YARN-1768.3.patch


 Instead of catching ApplicationNotFound and logging a simple app not found 
 message, the whole stack trace is logged.
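
The intended behaviour can be sketched as follows. This is only a hypothetical 
illustration, not the attached patch; the YarnClient/PrintStream wiring here is 
an assumption.

{code:java}
import java.io.PrintStream;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Hedged sketch: catch ApplicationNotFoundException on the kill path and
// print one short line instead of dumping the whole stack trace.
void killApplication(YarnClient client, PrintStream out, ApplicationId appId)
    throws Exception {
  try {
    client.killApplication(appId);
  } catch (ApplicationNotFoundException e) {
    // a single line for the user; no stack trace
    out.println("Application with id '" + appId + "' doesn't exist in RM.");
  }
}
{code}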



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919161#comment-13919161
 ] 

Sunil G commented on YARN-1769:
---

Hi Thomas

In the LeafQueue assignContainer call, the reserve() call happens from the else 
logic.
Here there is a check as below:

if ((!scheduler.getConfiguration().getReservationContinueLook())
  || (canAllocContainer) || (rmContainer != null)) {

Is there any scenario where none of these 3 cases holds?
Perhaps only for a first-time allocation, when the application cannot assign a 
container, and even then the chances of reaching this point seem small.
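
For reference, spelling out that condition and its complement with the names 
from the snippet above (an illustration of the question only, not code from 
the patch):

{code:java}
// Not patch code -- just the branch restated.
if ((!scheduler.getConfiguration().getReservationContinueLook())
    || (canAllocContainer)
    || (rmContainer != null)) {
  // reserve() (or re-reserve) on this node, as before
} else {
  // all three are false: the continue-look feature is enabled, nothing could
  // be unreserved, and there is no prior reservation on this node -- the
  // reservation is skipped and the scheduler keeps looking at other nodes
}
{code}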

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs

2014-03-04 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919170#comment-13919170
 ] 

Mayank Bansal commented on YARN-1389:
-

Thanks [~zjshen] for the review

bq. 1. ApplicationClientProtocol and ApplicationHistoryProtocol are able to 
share a base interface now?
I think we decided we will keep the interfaces separate.

bq. 2. Javadoc in ApplicationHistoryProtocol says the data is obtained from 
AHS, which is not correct.
Done

bq. 3. YarnClientImpl misses the implementation for getting 
attempts/container/containers
Done

bq. 4. Users are not able to get completed application list via YarnClient
Done

bq. 5. Like RMApp, make createApplicationAttemptReport/ContainerReport as part 
of RMAppAttempt/RMContainer.
These are just utility functions; do you think they are needed in RMAppAttempt 
and RMContainer?

Updating the latest patch.

Thanks,
Mayank


 ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog 
 APIs
 --

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.
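
As a rough illustration of that routing idea (a hedged sketch only, not the 
actual YarnClientImpl code; rmDelegate and historyDelegate are hypothetical 
stand-ins for clients of the two protocols):

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

ApplicationReport getApplicationReport(ApplicationId appId) throws Exception {
  try {
    // running (and recently finished) applications are known to the RM
    return rmDelegate.getApplicationReport(appId);
  } catch (ApplicationNotFoundException e) {
    // applications the RM no longer remembers live in the history server
    return historyDelegate.getApplicationReport(appId);
  }
}
{code}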



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs

2014-03-04 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1389:


Attachment: YARN-1389-3.patch

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog 
 APIs
 --

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919254#comment-13919254
 ] 

Hudson commented on YARN-1748:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #499 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/499/])
YARN-1748. Excluded core-site.xml from hadoop-yarn-server-tests package's jar 
and thus avoid breaking downstream tests. Contributed by Sravya Tirukkovalur. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573795)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/pom.xml


 hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
 -

 Key: YARN-1748
 URL: https://issues.apache.org/jira/browse/YARN-1748
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Sravya Tirukkovalur
Assignee: Sravya Tirukkovalur
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1748-1.patch, YARN-1748-1.patch


 Jars should not package config files, as this might come into the classpaths 
 of clients causing the clients to break.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919245#comment-13919245
 ] 

Hudson commented on YARN-1765:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #499 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/499/])
YARN-1765. Added test cases to verify that killApplication API works across 
ResourceManager failover. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573735)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Write test cases to verify that killApplication API works in RM HA
 --

 Key: YARN-1765
 URL: https://issues.apache.org/jira/browse/YARN-1765
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, 
 YARN-1765.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919250#comment-13919250
 ] 

Hudson commented on YARN-1729:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #499 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/499/])
YARN-1729. Made TimelineWebServices deserialize the string primary- and 
secondary-filters param into the JSON-compatible object. Contributed by Billie 
Rinaldi. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573825)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestGenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java


 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.
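
One way to implement that check, sketched here with Jackson purely for 
illustration (the store's own mapper may differ): anything that does not parse 
as JSON is kept as a plain string.

{code:java}
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;

static Object parseFilterValue(String raw) {
  ObjectMapper mapper = new ObjectMapper();
  try {
    // quoted strings, numbers, booleans, lists and maps come back as their
    // natural JSON-compatible Java types
    return mapper.readValue(raw, Object.class);
  } catch (IOException e) {
    // not valid JSON: keep the raw query-parameter value as a plain string
    return raw;
  }
}
{code}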



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1206) Container logs link is broken on RM web UI after application finished

2014-03-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-1206:


Assignee: Rohith

 Container logs link is broken on RM web UI after application finished
 -

 Key: YARN-1206
 URL: https://issues.apache.org/jira/browse/YARN-1206
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
Priority: Blocker

 With log aggregation disabled, when container is running, its logs link works 
 properly, but after the application is finished, the link shows 'Container 
 does not exist.'



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1206) Container logs link is broken on RM web UI after application finished

2014-03-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1206:
-

Attachment: YARN-1206.patch

Attaching a patch to fix this issue. Please review.

 Container logs link is broken on RM web UI after application finished
 -

 Key: YARN-1206
 URL: https://issues.apache.org/jira/browse/YARN-1206
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-1206.patch


 With log aggregation disabled, when container is running, its logs link works 
 properly, but after the application is finished, the link shows 'Container 
 does not exist.'



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919280#comment-13919280
 ] 

Hudson commented on YARN-1765:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/])
YARN-1765. Added test cases to verify that killApplication API works across 
ResourceManager failover. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573735)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Write test cases to verify that killApplication API works in RM HA
 --

 Key: YARN-1765
 URL: https://issues.apache.org/jira/browse/YARN-1765
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, 
 YARN-1765.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919289#comment-13919289
 ] 

Hudson commented on YARN-1748:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/])
YARN-1748. Excluded core-site.xml from hadoop-yarn-server-tests package's jar 
and thus avoid breaking downstream tests. Contributed by Sravya Tirukkovalur. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573795)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/pom.xml


 hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
 -

 Key: YARN-1748
 URL: https://issues.apache.org/jira/browse/YARN-1748
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Sravya Tirukkovalur
Assignee: Sravya Tirukkovalur
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1748-1.patch, YARN-1748-1.patch


 Jars should not package config files, as this might come into the classpaths 
 of clients causing the clients to break.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1675) Application does not change to RUNNING after being scheduled

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919281#comment-13919281
 ] 

Hudson commented on YARN-1675:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/])
YARN-1675. Added the previously missed new file. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573736)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestKillApplicationWithRMHA.java


 Application does not change to RUNNING after being scheduled
 

 Key: YARN-1675
 URL: https://issues.apache.org/jira/browse/YARN-1675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Trupti Dhavle

 I don't see any stack traces in the logs. But the debug logs show negative vcores:
 {noformat}
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(808)) - assignContainers: 
 node=hor11n39.gq1.ygridcore.net #applications=5
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0272
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0272
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0273
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0273 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application 

[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919285#comment-13919285
 ] 

Hudson commented on YARN-1729:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/])
YARN-1729. Made TimelineWebServices deserialize the string primary- and 
secondary-filters param into the JSON-compatible object. Contributed by Billie 
Rinaldi. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573825)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestGenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java


 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919288#comment-13919288
 ] 

Hudson commented on YARN-1758:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/])
YARN-1758. Fixed ResourceManager to not mandate the presence of site specific 
configuration files and thus fix failures in downstream tests. Contributed by 
Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573695)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/FileSystemBasedConfigurationProvider.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


 MiniYARNCluster broken post YARN-1666
 -

 Key: YARN-1758
 URL: https://issues.apache.org/jira/browse/YARN-1758
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1758.1.patch, YARN-1758.2.patch


 NPE seen when trying to use MiniYARNCluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1704) Review LICENSE and NOTICE to reflect new levelDB releated libraries being used

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919284#comment-13919284
 ] 

Hudson commented on YARN-1704:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/])
YARN-1704. Modified LICENSE and NOTICE files to reflect newly used levelDB 
related libraries. Contributed by Billie Rinaldi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573702)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/LICENSE.txt
* /hadoop/common/trunk/hadoop-yarn-project/NOTICE.txt


 Review LICENSE and NOTICE to reflect new levelDB releated libraries being used
 --

 Key: YARN-1704
 URL: https://issues.apache.org/jira/browse/YARN-1704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1704.1.patch, YARN-1704.2.patch, YARN-1704.3.patch


 Make any changes necessary in LICENSE and NOTICE related to dependencies 
 introduced by the application timeline store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-257) NM should gracefully handle a full local disk

2014-03-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919295#comment-13919295
 ] 

Sunil G commented on YARN-257:
--

Maybe the NM can do some level of handling by itself in a disk-full scenario in 
the first place.
The NM's LocalDirAllocator hands out a local path to write to from the list of 
good directories.
For this, it uses a round-robin algorithm based on the space available.

Consider a scenario where several tasks ask for a path from the set of local 
directories: the allocation is made based on the availability at that point in 
time.
But the same path may have been handed out earlier to other tasks, which may 
still be writing to it sequentially.

Basically, the space already allotted is not considered when the next 
allocation from the same path is given to another task 
(assuming a few of the earlier-allocated tasks are still writing at this time).

But it is not possible to account for this earlier-allotted space, and it is not 
possible to predict the disk write speed.

Could we predict the disk-full scenario rather than acting only when it 
happens? For example, the current health-check mechanism checks access 
permissions etc. to identify good and bad directories at a 2-minute interval.
If the space there is almost full (say 95%, or only 5*100MB remaining), it 
would be better to move that directory to the bad-directory list.

Alternatively, the LocalDirAllocator could check for a high percentage of disk 
usage and not assign such a directory to the task.
These measures might possibly help keep new tasks from failing because of an 
immediate disk-full scenario.
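
A minimal sketch of the threshold idea (a hypothetical helper, not existing NM 
code; the 95% figure is only the example mentioned above):

{code:java}
import java.io.File;

// Returns true once a local directory's used space crosses the threshold, so
// the allocator could stop handing it out before the disk actually fills up.
static boolean isNearlyFull(File localDir, float maxUsedPercent) {
  long total = localDir.getTotalSpace();
  if (total <= 0) {
    return true;                        // unreadable or missing: treat as bad
  }
  long used = total - localDir.getUsableSpace();
  float usedPercent = 100f * used / total;
  return usedPercent >= maxUsedPercent; // e.g. 95.0f
}
{code}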

 NM should gracefully handle a full local disk
 -

 Key: YARN-257
 URL: https://issues.apache.org/jira/browse/YARN-257
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe

 When a local disk becomes full, the node will fail every container launched 
 on it because the container is unable to localize.  It tries to create an 
 app-specific directory for each local and log directories.  If any of those 
 directory creates fail (due to lack of free space) the container fails.
 It would be nice if the node could continue to launch containers using the 
 space available on other disks rather than failing all containers trying to 
 launch on the node.
 This is somewhat related to YARN-91 but is centered around the disk becoming 
 full rather than the disk failing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919301#comment-13919301
 ] 

Hadoop QA commented on YARN-1389:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632479/YARN-1389-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3239//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3239//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3239//console

This message is automatically generated.

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog 
 APIs
 --

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1206) Container logs link is broken on RM web UI after application finished

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919317#comment-13919317
 ] 

Hadoop QA commented on YARN-1206:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632490/YARN-1206.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3240//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3240//console

This message is automatically generated.

 Container logs link is broken on RM web UI after application finished
 -

 Key: YARN-1206
 URL: https://issues.apache.org/jira/browse/YARN-1206
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-1206.patch


 With log aggregation disabled, when container is running, its logs link works 
 properly, but after the application is finished, the link shows 'Container 
 does not exist.'



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState

2014-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919440#comment-13919440
 ] 

Jason Lowe commented on YARN-1445:
--

bq. Then, it is possible that AM is unregistered, and RM tells the client that 
the application is still running. When the client moves on to contact AM, AM 
has proceeded and exited before being able to respond the client request.

This race will always exist and is inherent with asynchronous processes.  The 
client could check the RM and the app could really be RUNNING, but by the time 
the client gets around to contacting the app the AM has rushed through the 
FINISHING and FINISHED state and could be gone by the time the client gets 
there.  That's why ClientServiceDelegate retries on errors and re-evaluates 
whether to go to the AM or history server on each retry.
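
The retry pattern being described could look roughly like the sketch below 
(hypothetical helper methods, not the actual ClientServiceDelegate code): the 
target is re-chosen on every attempt, so an application that finishes mid-call 
is still resolved via the history server.

{code:java}
Object getReportWithRetry(int maxRetries) throws Exception {
  Exception last = new IllegalStateException("maxRetries must be > 0");
  for (int attempt = 0; attempt < maxRetries; attempt++) {
    try {
      // re-evaluate on every retry: ask the AM while the app looks RUNNING,
      // the history server once it has finished
      return applicationLooksRunning() ? askApplicationMaster()
                                       : askHistoryServer();
    } catch (Exception e) {
      last = e;           // the AM may have exited between the check and the call
      Thread.sleep(1000); // brief back-off before re-checking
    }
  }
  throw last;
}
{code}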



 Separate FINISHING and FINISHED state in YarnApplicationState
 -

 Key: YARN-1445
 URL: https://issues.apache.org/jira/browse/YARN-1445
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch, 
 YARN-1445.4.patch, YARN-1445.5.patch, YARN-1445.5.patch, YARN-1445.6.patch


 Today, we will transmit both RMAppState.FINISHING and RMAppState.FINISHED to 
 YarnApplicationState.FINISHED.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1730) Leveldb timeline store needs simple write locking

2014-03-04 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1730:
-

Description: Although the leveldb writes are performed atomically in a 
batch, a start time for the entity needs to be identified before each write.  Thus 
a per-entity write lock should be acquired.  (was: The actual data writes are 
performed atomically in a batch, but a lock should be held while identifying a 
start time for the entity, which precedes every write.)
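
A minimal sketch of such per-entity locking (illustrative only, not the actual 
leveldb store code): one lock per entity id serializes the start-time lookup 
plus the batched write, while different entities proceed in parallel.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

private final ConcurrentMap<String, ReentrantLock> entityLocks =
    new ConcurrentHashMap<String, ReentrantLock>();

void put(String entityId) {
  ReentrantLock lock = entityLocks.get(entityId);
  if (lock == null) {
    ReentrantLock candidate = new ReentrantLock();
    ReentrantLock existing = entityLocks.putIfAbsent(entityId, candidate);
    lock = (existing != null) ? existing : candidate;
  }
  lock.lock();
  try {
    // 1. look up (or assign) the entity's start time
    // 2. write the batched updates atomically to leveldb
  } finally {
    lock.unlock();
  }
}
{code}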

 Leveldb timeline store needs simple write locking
 -

 Key: YARN-1730
 URL: https://issues.apache.org/jira/browse/YARN-1730
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, 
 YARN-1730.4.patch, YARN-1730.5.patch, YARN-1730.6.patch


 Although the leveldb writes are performed atomically in a batch, a start time 
 for the entity needs to be identified before each write.  Thus a per-entity 
 write lock should be acquired.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919470#comment-13919470
 ] 

Hudson commented on YARN-1765:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/])
YARN-1765. Added test cases to verify that killApplication API works across 
ResourceManager failover. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573735)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Write test cases to verify that killApplication API works in RM HA
 --

 Key: YARN-1765
 URL: https://issues.apache.org/jira/browse/YARN-1765
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, 
 YARN-1765.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1675) Application does not change to RUNNING after being scheduled

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919471#comment-13919471
 ] 

Hudson commented on YARN-1675:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/])
YARN-1675. Added the previously missed new file. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573736)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestKillApplicationWithRMHA.java


 Application does not change to RUNNING after being scheduled
 

 Key: YARN-1675
 URL: https://issues.apache.org/jira/browse/YARN-1675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Trupti Dhavle

 I don't see any stack traces in the logs. But the debug logs show negative vcores:
 {noformat}
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(808)) - assignContainers: 
 node=hor11n39.gq1.ygridcore.net #applications=5
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0272
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0272
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0273
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0273 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application 

[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919479#comment-13919479
 ] 

Hudson commented on YARN-1748:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/])
YARN-1748. Excluded core-site.xml from hadoop-yarn-server-tests package's jar 
and thus avoid breaking downstream tests. Contributed by Sravya Tirukkovalur. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573795)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/pom.xml


 hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
 -

 Key: YARN-1748
 URL: https://issues.apache.org/jira/browse/YARN-1748
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Sravya Tirukkovalur
Assignee: Sravya Tirukkovalur
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1748-1.patch, YARN-1748-1.patch


 Jars should not package config files, as this might come into the classpaths 
 of clients causing the clients to break.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919475#comment-13919475
 ] 

Hudson commented on YARN-1729:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/])
YARN-1729. Made TimelineWebServices deserialize the string primary- and 
secondary-filters param into the JSON-compatible object. Contributed by Billie 
Rinaldi. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573825)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestGenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java


 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-04 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919506#comment-13919506
 ] 

Thomas Graves commented on YARN-1769:
-


If canAllocContainer is false then you can't reserve another container.  This 
could happen if you don't have any containers to unreserve when you hit the 
reservation limits and this node doesn't have available containers.

  if ((!scheduler.getConfiguration().getReservationContinueLook())
      // without feature always reserve like previously did
      || (canAllocContainer)
      // if we hit our reservation limit and no available space on this node,
      // don't reserve another one
      || (rmContainer != null)) {
      // if this was called because node already had reservation, we need to
      // make sure it gets book keeped as re-reservation

I can simplify this a bit.  I don't really need the 
!scheduler.getConfiguration().getReservationContinueLook() check anymore since 
canAllocContainer defaults to true in that case.
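
If that is the simplification, the check would presumably reduce to something 
like the sketch below (my reading only, not the updated patch):

{code:java}
// With canAllocContainer defaulting to true when continue-look is disabled,
// the explicit feature check becomes redundant.
if (canAllocContainer || (rmContainer != null)) {
  // reserve (or book-keep the re-reservation) on this node, as before
} else {
  // continue-look enabled, nothing could be unreserved, no prior reservation:
  // skip reserving here and keep evaluating other nodes
}
{code}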


 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1670:


Attachment: YARN-1670-b23.patch

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 at java.lang.Long.parseLong(Long.java:441)
 at java.lang.Long.parseLong(Long.java:483)
 at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer then that.  
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated the length written could be 
 to small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed that a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1  curRead  fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.
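 A minimal sketch of the proposed write()-side fix (illustrative only, not the 
 attached patches; the framing and method names are simplified and are not the 
 real AggregatedLogFormat layout):
 {code}
 // Sketch: record the file length first, then copy at most that many bytes even
 // if the file keeps growing, so the reader never runs past the declared length.
 import java.io.DataOutputStream;
 import java.io.File;
 import java.io.FileInputStream;
 import java.io.IOException;
 
 public class BoundedLogCopySketch {
   static void writeLogFile(DataOutputStream out, File logFile) throws IOException {
     long declaredLength = logFile.length();   // the length we tell the reader
     out.writeLong(declaredLength);
     long remaining = declaredLength;          // use a long, as noted for curRead
     byte[] buf = new byte[64 * 1024];
     FileInputStream in = new FileInputStream(logFile);
     try {
       while (remaining > 0) {
         int read = in.read(buf, 0, (int) Math.min(buf.length, remaining));
         if (read == -1) {
           break;                              // file shrank; stop early
         }
         out.write(buf, 0, read);
         remaining -= read;
       }
     } finally {
       in.close();
     }
   }
 }
 {code}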



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1670:


Attachment: YARN-1670.patch

Attaching the patch for trunk, branch-2, and branch-23.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but finding that it is still reading log data from the previous file.  What happened 
 was that the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919544#comment-13919544
 ] 

Hadoop QA commented on YARN-1670:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632518/YARN-1670.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3241//console

This message is automatically generated.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but finding that it is still reading log data from the previous file.  What happened 
 was that the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-04 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-1769:


Attachment: YARN-1769.patch

Updated the if check in assignContainer.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact that there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Any time it hits the limit on the number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.   
 The other place for improvement is that reservations currently count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919548#comment-13919548
 ] 

Rohith commented on YARN-1752:
--

The previous Hadoop QA failure is not because of the patch. What is the procedure to 
rerun Hadoop QA?

 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, 
 YARN-1752.4.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1670:


Attachment: YARN-1670.patch

updated patch for trunk and branch-2

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but finding that it is still reading log data from the previous file.  What happened 
 was that the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919580#comment-13919580
 ] 

Hadoop QA commented on YARN-1670:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632525/YARN-1670.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3243//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3243//console

This message is automatically generated.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but finding that it is still reading log data from the previous file.  What happened 
 was that the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919592#comment-13919592
 ] 

Hadoop QA commented on YARN-1769:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632523/YARN-1769.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3242//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3242//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3242//console

This message is automatically generated.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact that there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Any time it hits the limit on the number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.   
 The other place for improvement is that reservations currently count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919661#comment-13919661
 ] 

Vinod Kumar Vavilapalli commented on YARN-1730:
---

I wish there were more tests, but testing these write locks isn't easy. So I am 
fine for now.

The latest patch looks good to me. +1. Checking this in.

 Leveldb timeline store needs simple write locking
 -

 Key: YARN-1730
 URL: https://issues.apache.org/jira/browse/YARN-1730
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, 
 YARN-1730.4.patch, YARN-1730.5.patch, YARN-1730.6.patch


 Although the leveldb writes are performed atomically in a batch, a start time 
 for the entity needs to be identified before each write.  Thus a per-entity 
 write lock should be acquired.
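 A minimal sketch of one way to do that (illustrative only, not any of the 
 attached patches; the map-of-locks approach and the names are assumptions):
 {code}
 // Sketch: keep one lock object per entity id so the start-time lookup/assignment
 // and the batched leveldb write happen atomically per entity, while writes for
 // different entities can still proceed concurrently.
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
 
 public class PerEntityLockSketch {
   private final ConcurrentMap<String, Object> entityLocks =
       new ConcurrentHashMap<String, Object>();
 
   public void putEntity(String entityId, Runnable batchedWrite) {
     Object lock = new Object();
     Object existing = entityLocks.putIfAbsent(entityId, lock);
     if (existing != null) {
       lock = existing;
     }
     synchronized (lock) {
       // resolve or assign the entity start time, then run the leveldb batch
       batchedWrite.run();
     }
   }
 }
 {code}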



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919705#comment-13919705
 ] 

Vinod Kumar Vavilapalli commented on YARN-986:
--

I'm giving Jenkins a try again to be sure this issue still persists..

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-ip based fail over of RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.
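 A minimal sketch of the client-side translation described above (illustrative 
 only, not any of the attached yarn-986 patches; the map stands in for whatever 
 the client reads from its HA configuration, and all names are hypothetical):
 {code}
 // Sketch: the token service carries a logical ClusterId/ServiceId; the client
 // expands it to the concrete addresses of both RMs before token selection.
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 
 public class ClusterIdServiceSketch {
   private final Map<String, List<String>> clusterIdToRmAddresses;
 
   public ClusterIdServiceSketch(Map<String, List<String>> clusterIdToRmAddresses) {
     this.clusterIdToRmAddresses = clusterIdToRmAddresses;
   }
 
   public List<String> resolve(String tokenService) {
     List<String> addrs = clusterIdToRmAddresses.get(tokenService);
     return addrs != null ? addrs : Collections.singletonList(tokenService);
   }
 
   public static void main(String[] args) {
     ClusterIdServiceSketch r = new ClusterIdServiceSketch(
         Collections.singletonMap("yarn-cluster-1",
             Arrays.asList("rm1.example.com:8032", "rm2.example.com:8032")));
     System.out.println(r.resolve("yarn-cluster-1"));  // both RM addresses
   }
 }
 {code}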



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-1781:
---

 Summary: NM should allow users to specify max disk utilization for 
local disks
 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Varun Vasudev


This is related to YARN-257 (it's probably a sub-task?). Currently, the NM does 
not detect full disks and allows full disks to be used by containers, leading to 
repeated failures. YARN-257 deals with graceful handling of full disks. This 
ticket is only about detection of full disks by the disk health checkers.

The NM should allow users to set a maximum disk utilization for local disks and 
mark disks as bad once they exceed that utilization. At the very least, the NM 
should detect full disks.





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919724#comment-13919724
 ] 

Hadoop QA commented on YARN-1717:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632512/YARN-1717.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3244//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3244//console

This message is automatically generated.

 Enable offline deletion of entries in leveldb timeline store
 

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, 
 YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, 
 YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1761:


Attachment: YARN-1766.2.patch

 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1761:


Attachment: (was: YARN-1766.2.patch)

 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1761:


Attachment: YARN-1761.2.patch

 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-04 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919728#comment-13919728
 ] 

Xuan Gong commented on YARN-1761:
-

submit the same patch again

 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-04 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919697#comment-13919697
 ] 

Xuan Gong commented on YARN-1761:
-

bq. Remote-configuration-provider on RM is a server side property. We will not 
use it to specify client-side configuration. Given that, why do we need to use 
the config-provider on the client side?

Yes. We do not need it on the client side. 

 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1761.1.patch, YARN-1761.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919684#comment-13919684
 ] 

Hudson commented on YARN-1730:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5260 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5260/])
YARN-1730. Implemented simple write-locking in the LevelDB based 
timeline-store. Contributed by Billie Rinaldi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574145)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java


 Leveldb timeline store needs simple write locking
 -

 Key: YARN-1730
 URL: https://issues.apache.org/jira/browse/YARN-1730
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, 
 YARN-1730.4.patch, YARN-1730.5.patch, YARN-1730.6.patch


 Although the leveldb writes are performed atomically in a batch, a start time 
 for the entity needs to be identified before each write.  Thus a per-entity 
 write lock should be acquired.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919685#comment-13919685
 ] 

Vinod Kumar Vavilapalli commented on YARN-986:
--

Thanks Karthik, the latest patch looks good. I wish there were more tests 
directly validating tokens across fail-over. In the interest of progress, I am 
fine for now with your manual testing, we can file a separate ticket for that.

The test failures are unrelated, commented on HDFS-6040. But I am not sure 
there are any test or other issues with our patch itself. Let's see how 
HDFS-6040 goes. Or we can run the jenkins script with tests offline.

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-ip based fail over of RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1761:


Attachment: YARN-1761.2.patch

 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1761.1.patch, YARN-1761.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs

2014-03-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919732#comment-13919732
 ] 

Zhijie Shen commented on YARN-1389:
---

Thanks for the new patch. Here're some more comments.

1. I still see <code>ApplicationHistoryServer</code> in 
ApplicationClientProtocol. And some of the descriptions sound inaccurate. For 
example,
{code}
+   * <p>
+   * The interface used by clients to get a report of all Application attempts
+   * in the cluster from the <code>ApplicationHistoryServer</code>.
+   * </p>
{code}
 Please double check the javadoc.

2. ApplicationHistoryProtocol's javadoc has been wrongly modified.

3. Is it better to simplify the following condition? Same for all the similar 
conditions in the patch
{code}
+  if (!((e.getClass() == ApplicationNotFoundException.class) || (e
+  .getClass() == ApplicationAttemptNotFoundException.class))) {
{code}
to
{code}
+  if (e.getClass() != ApplicationNotFoundException.class && e
+  .getClass() != ApplicationAttemptNotFoundException.class) {
{code}

4. Please match the NotFoundException that will be thrown in 
ClientRMService, and that is analyzed in YarnClientImpl.

5. It is still an in-progress patch, isn't it? The test cases are still missing.

bq. 4. Users are not able to get completed application list via YarnClient
bq. Done

Didn't see the change to allow the user to get the application list from the history

bq. These are just utility functions, do you think they are needed in 
 RMAppAttempt and RMContainer?

Please see what RMApp does



 ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog 
 APIs
 --

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.
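 A minimal sketch of that routing (illustrative only; the interfaces below are 
 hypothetical stand-ins for the two protocols, not the YARN-1389 patch):
 {code}
 // Sketch: YarnClient asks the RM first (running apps) and falls back to the
 // history service (finished apps), keeping the split invisible to callers.
 public class TransparentReportSketch {
   interface ClientProtocol  { String getApplicationReport(String appId); }
   interface HistoryProtocol { String getApplicationReport(String appId); }
 
   static String getReport(ClientProtocol rm, HistoryProtocol history, String appId) {
     try {
       return rm.getApplicationReport(appId);       // running: served by the RM
     } catch (RuntimeException notFound) {
       return history.getApplicationReport(appId);  // finished: fall back to history
     }
   }
 }
 {code}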



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919736#comment-13919736
 ] 

Jian He commented on YARN-1752:
---

bq. What is the procedure to rerun the HadoopQA?
you can submit the same patch again and comment that submitting the same patch 
to kick off jenkins.

Patch looks good, but there's still a typo in the code comment ("tries 
to register more than once"), which was introduced by an earlier patch. Can you 
fix that also? Thanks!

 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, 
 YARN-1752.4.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919750#comment-13919750
 ] 

Jason Lowe commented on YARN-1781:
--

Note that we may need to do more than just mark disks as unusable once they are 
"full" for a specified definition of full.  I suspect a disk being full is 
a more transient kind of failure than other failures, and it would be nice if 
full disks were added back into the list of good dirs once they fall below a 
threshold.  Not necessarily a hard requirement for this JIRA, but I can see it 
being an immediate followup request if not implemented.  The recovery from full 
may be covered by YARN-90 when that's implemented.

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Varun Vasudev

 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev reassigned YARN-1781:
---

Assignee: Varun Vasudev

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev

 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919766#comment-13919766
 ] 

Varun Vasudev commented on YARN-1781:
-

My plan is to work on YARN-90 once a patch for this gets checked in.

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev

 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919765#comment-13919765
 ] 

Vinod Kumar Vavilapalli commented on YARN-1766:
---

Hm.. in that case, can we validate default values of all things - queue config, 
admin-acls, proxy-config etc. when starting RM with FSBCP?

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations into a remote file system and let different RMs share the 
 same configurations.  During initialization, the RM will load the configurations 
 from the remote file system. So when the RM initializes its services, it should 
 use the loaded configuration instead of the bootstrap configuration.
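 A minimal sketch of what "use the loaded configuration" could look like at 
 startup (illustrative only, not the attached YARN-1766 patches; the remote 
 paths below are hypothetical):
 {code}
 // Sketch: overlay the shared configuration files from the remote FileSystem on
 // top of the bootstrap Configuration, and initialize RM services with the result.
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
 public class RemoteConfLoaderSketch {
   static Configuration loadRemoteConf(Configuration bootstrapConf) throws Exception {
     FileSystem fs = FileSystem.get(bootstrapConf);
     Configuration loaded = new Configuration(bootstrapConf);
     loaded.addResource(fs.open(new Path("/yarn/conf/yarn-site.xml")));  // hypothetical path
     loaded.addResource(fs.open(new Path("/yarn/conf/core-site.xml")));  // hypothetical path
     return loaded;  // initialize RM services with this, not bootstrapConf
   }
 }
 {code}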



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-03-04 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev reassigned YARN-90:
-

Assignee: Varun Vasudev

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
Assignee: Varun Vasudev
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs a restart. This JIRA is to improve NodeManager to 
 reuse good disks (which could have been bad some time back).
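 A minimal sketch of re-admitting recovered disks (illustrative only; the names 
 are hypothetical and this is not any of the attached YARN-90 patches):
 {code}
 // Sketch: on each health-check cycle, re-test the failed dirs and move any that
 // pass back into the good list instead of leaving them failed forever.
 import java.util.ArrayList;
 import java.util.Iterator;
 import java.util.List;
 
 public class FailedDirRecheckSketch {
   interface DirCheck { boolean isHealthy(String dir); }
 
   private final List<String> goodDirs = new ArrayList<String>();
   private final List<String> failedDirs = new ArrayList<String>();
 
   void recheckFailedDirs(DirCheck check) {
     Iterator<String> it = failedDirs.iterator();
     while (it.hasNext()) {
       String dir = it.next();
       if (check.isHealthy(dir)) {   // the disk became good again
         it.remove();
         goodDirs.add(dir);
       }
     }
   }
 }
 {code}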



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1389:
--

Summary: ApplicationClientProtocol and ApplicationHistoryProtocol should 
expose analogous APIs  (was: ApplicationClientProtocol and 
ApplicationHistoryProtocol should expose analog APIs)

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1781:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-257

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev

 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1781:


Attachment: apache-yarn-1781.0.patch

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1781.0.patch


 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-03-04 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1414:
--

Attachment: YARN-1221-v2.patch

 with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
 -

 Key: YARN-1414
 URL: https://issues.apache.org/jira/browse/YARN-1414
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
 Fix For: 2.2.0

 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919867#comment-13919867
 ] 

Hadoop QA commented on YARN-986:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632426/yarn-986-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3245//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3245//console

This message is automatically generated.

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-ip based fail over of RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919868#comment-13919868
 ] 

Hadoop QA commented on YARN-1781:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12632576/apache-yarn-1781.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3249//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3249//console

This message is automatically generated.

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1781.0.patch


 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1780) Improve logging in timeline service

2014-03-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1780:
--

Attachment: YARN-1780.1.patch

Created a patch to do the following things:

1. In TimelineClientImpl, log the timeline service address at info level.

2. In TimelineClientImpl, catch the runtime exceptions thrown while the jersey 
client posts entities, and log them at error level. The reason for doing this is 
that some important errors, such as the connection refused exception, are wrapped 
in runtime exceptions by the jersey client.

3. In TimelineWebServices, log the posted entities' IDs at info level to know 
whether the requests have been processed by the timeline service, and log the 
complete posted JSON content at debug level if necessary.

By doing this, it will be easier to trace whether an entity has been 
successfully posted to the timeline service.
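A minimal sketch of item 2 (illustrative only, not the attached YARN-1780.1.patch; 
the Poster interface stands in for the jersey call and all names are hypothetical):
{code}
// Sketch: wrap the jersey POST so RuntimeExceptions (which hide errors such as
// "connection refused") are logged at error level before being rethrown.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class TimelinePostLoggingSketch {
  private static final Log LOG = LogFactory.getLog(TimelinePostLoggingSketch.class);

  interface Poster { void post(Object entities); }   // stand-in for the jersey client call

  void putEntities(Poster poster, Object entities, String serviceAddress) {
    LOG.info("Posting entities to the timeline service at " + serviceAddress);
    try {
      poster.post(entities);
    } catch (RuntimeException e) {
      LOG.error("Failed to post entities to " + serviceAddress, e);
      throw e;
    }
  }
}
{code}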

 Improve logging in timeline service
 ---

 Key: YARN-1780
 URL: https://issues.apache.org/jira/browse/YARN-1780
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1780.1.patch


 The server side of timeline service is lacking logging information, which 
 makes debugging difficult



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1780) Improve logging in timeline service

2014-03-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1780:
--

Description: It's difficult to trace whether the client has successfully 
posted the entity to the timeline service or not.  (was: The server side of 
timeline service is lacking logging information, which makes debugging 
difficult)

 Improve logging in timeline service
 ---

 Key: YARN-1780
 URL: https://issues.apache.org/jira/browse/YARN-1780
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1780.1.patch


 It's difficult to trace whether the client has successfully posted the entity 
 to the timeline service or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919893#comment-13919893
 ] 

Vinod Kumar Vavilapalli commented on YARN-986:
--

TestNMClient is unrelated. Time to check this in.

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place in YARN-945.
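
For illustration only, a conceptual sketch of that scheme, assuming nothing about the 
actual YARN-986 implementation (the class, map layout and selection logic below are 
placeholders, not RMDelegationTokenSelector or ClientRMProxy code): the server stamps 
tokens with a generic cluster id, and the client expands that id into both RM addresses 
before doing the usual address-based token selection.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not the actual token-selection code.
class ClusterIdTokenSelection {
  // Client-side view: which concrete RM addresses a generic cluster id maps to.
  private final Map<String, List<String>> rmAddressesByClusterId = new HashMap<>();

  void registerCluster(String clusterId, List<String> rmAddresses) {
    rmAddressesByClusterId.put(clusterId, new ArrayList<>(rmAddresses));
  }

  // Translate a token's service field (which may be a cluster id) to addresses.
  List<String> expandService(String service) {
    List<String> addrs = rmAddressesByClusterId.get(service);
    return addrs != null ? addrs : java.util.Collections.singletonList(service);
  }

  // Select the token whose (expanded) service covers the RM we are about to call,
  // so the same token keeps working after a failover to the other RM.
  String selectToken(Map<String, String> tokensByService, String targetRmAddress) {
    for (Map.Entry<String, String> e : tokensByService.entrySet()) {
      if (expandService(e.getKey()).contains(targetRmAddress)) {
        return e.getValue();
      }
    }
    return null;
  }
}
{code}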



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Cindy Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cindy Li updated YARN-1525:
---

Attachment: Yarn1525.secure.patch

Works in a secure cluster.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-04 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919996#comment-13919996
 ] 

Varun Vasudev commented on YARN-1781:
-

The attached patch adds support for checking disk utilization as part of the 
checkDirs function in DirectoryCollection. Disk utilization can be specified as 
a percentage via the yarn config, with the default value set to 1.0F (use the 
full disk). 
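
For illustration, a minimal sketch of such a per-directory utilization check, assuming a 
fraction-style threshold where 1.0F means the whole disk may be used; the class and 
method names below are placeholders, not the code in the attached patch.

{code}
import java.io.File;

// Sketch only: the kind of per-directory check that checkDirs() could apply.
public class DiskUtilizationCheckSketch {

  // Returns true if the directory's disk is still usable under the configured cap.
  static boolean isUnderUtilizationCap(File dir, float maxUtilization) {
    long total = dir.getTotalSpace();
    if (total == 0) {
      return false;                          // cannot stat the disk; treat it as bad
    }
    long used = total - dir.getUsableSpace();
    float utilization = (float) used / total;
    return utilization <= maxUtilization;    // exceeding the cap marks the disk bad
  }

  public static void main(String[] args) {
    File localDir = new File(args.length > 0 ? args[0] : "/tmp");
    // A default of 1.0F corresponds to "use the full disk", as described above.
    System.out.println(localDir + " usable: " + isUnderUtilizationCap(localDir, 1.0f));
  }
}
{code}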

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1781.0.patch


 This is related to YARN-257 (it's probably a sub-task of it). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks; this ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1782) CLI should let users to query cluster metrics

2014-03-04 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1782:
-

 Summary: CLI should let users to query cluster metrics
 Key: YARN-1782
 URL: https://issues.apache.org/jira/browse/YARN-1782
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


Like RM webUI and RESTful services, YARN CLI should also enable users to query 
the cluster metrics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920081#comment-13920081
 ] 

Hudson commented on YARN-986:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5261 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5261/])
YARN-986. Changed client side to be able to figure out the right RM Delegation 
token for the right ResourceManager when HA is enabled. Contributed by Karthik 
Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1574190)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenSelector.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/TestClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java


 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.4.0

 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place in YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1766:


Attachment: YARN-1766.3.patch

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services it should 
 use the loaded configurations instead of the bootstrap configurations.
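
To make the intended ordering concrete, here is a minimal sketch with assumed interfaces 
rather than the real ResourceManager and ConfigurationProvider classes: the configuration 
is loaded through the provider first, and only the loaded configuration is handed to the 
services being created.

{code}
import java.util.Properties;

// Illustrative sketch only; not the actual ResourceManager#serviceInit code.
class RmInitSketch {

  // Stand-in for a provider that fetches a resource from a remote file system.
  interface ConfigurationProvider {
    Properties load(String resource);
  }

  private Properties loadedConf;

  void serviceInit(Properties bootstrapConf, ConfigurationProvider provider) {
    // Merge the remotely stored resources on top of the bootstrap configuration...
    loadedConf = new Properties();
    loadedConf.putAll(bootstrapConf);
    loadedConf.putAll(provider.load("yarn-site.xml"));
    // ...and build every service from the loaded configuration, not the bootstrap one.
    createServices(loadedConf);
  }

  void createServices(Properties conf) {
    // scheduler, ACLs, proxy config, etc. would be initialized from 'conf' here
  }
}
{code}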



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-04 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920089#comment-13920089
 ] 

Xuan Gong commented on YARN-1766:
-

Modified the test case to validate default values for queue config, admin-acls, 
proxy-config, exclude-nodelists, SuperGroupMapping and user-to-groups mapping.
Also, the patch now manually calls refreshSuperUserGroupsConfiguration when the 
RM does the initialization. 

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Arpit Gupta (JIRA)
Arpit Gupta created YARN-1783:
-

 Summary: yarn application does not make any progress even when no 
other application is running when RM is being restarted in the background
 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical


Noticed that during HA tests some tests took over 3 hours to run when the test 
failed.
Looking at the logs, I see the application made no progress for a very long 
time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
I am seeing the same behavior when the RM was being restarted in the background 
and when both the RM and AM were being restarted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920105#comment-13920105
 ] 

Arpit Gupta commented on YARN-1783:
---

The application that took the longest is application_1393347856479_0014:

{code}
14/02/25 17:41:01 INFO mapreduce.Job: Job job_1393347856479_0017 running in 
uber mode : false
14/02/25 17:41:01 INFO mapreduce.Job:  map 0% reduce 0%
2014-02-25 17:41:02,145|beaver.machine|INFO|RUNNING: /usr/bin/yarn application 
-list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
2014-02-25 17:41:03,419|beaver.machine|INFO|Total number of applications 
(application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
RUNNING]):6
2014-02-25 17:41:03,419|beaver.machine|INFO|Application-Id  
Application-NameApplication-Type  User   Queue  
 State Final-State Progress 
   Tracking-URL
2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0013  
   Sleep job   MAPREDUCEhrt_qa default  
  ACCEPTED   UNDEFINED   0% 
N/A
2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0014  
test_mapred_ha_pending_job_rm_1393349992-1 MAPREDUCE
hrt_qa defaultACCEPTED   UNDEFINED  
 0%null
2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0018  
test_mapred_ha_pending_job_rm_1393349992-3 MAPREDUCE
hrt_qa default RUNNING   UNDEFINED  
 5% http://hor12n10.gq1.ygridcore.net:41840
2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0017  
test_mapred_ha_pending_job_rm_1393349992-2 MAPREDUCE
hrt_qa default RUNNING   UNDEFINED  
 5% http://hor12n10.gq1.ygridcore.net:51732
2014-02-25 17:41:03,421|beaver.machine|INFO|application_1393347856479_0016  
test_mapred_ha_pending_job_rm_1393349992-4 MAPREDUCE
hrt_qa default RUNNING   UNDEFINED  
 5% http://hor12n08.gq1.ygridcore.net:50966
2014-02-25 17:41:03,421|beaver.machine|INFO|application_1393347856479_0015  
test_mapred_ha_pending_job_rm_1393349992-0 MAPREDUCE
hrt_qa default RUNNING   UNDEFINED  
 35.01% http://hor12n08.gq1.ygridcore.net:54998
{code}



And this is when it completed:

{code}
2014-02-25 20:52:32,992|beaver.machine|INFO|Total number of applications 
(application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
RUNNING]):1
2014-02-25 20:52:32,993|beaver.machine|INFO|Application-Id  
Application-NameApplication-Type  User   Queue  
 State Final-State Progress 
   Tracking-URL
2014-02-25 20:52:32,993|beaver.machine|INFO|application_1393347856479_0014  
test_mapred_ha_pending_job_rm_1393349992-1 MAPREDUCE
hrt_qa default RUNNING   UNDEFINED  
 86.01% http://hor12n08.gq1.ygridcore.net:46622
14/02/25 20:52:35 INFO mapreduce.Job:  map 100% reduce 100%
14/02/25 20:52:37 INFO mapreduce.Job: Job job_1393347856479_0014 completed 
successfully
14/02/25 20:52:37 INFO mapreduce.Job: Counters: 49
File System Counters
{code}

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical

 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2014-03-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1064:
---

Target Version/s: 2.4.0
   Fix Version/s: (was: 2.4.0)

Set target version 2.4.0. [~tucu00] - now that we have already shipped 2.2.0 GA 
and 2.3, do you think we should continue to call this a blocker? 

 YarnConfiguration scheduler configuration constants are not consistent
 --

 Key: YARN-1064
 URL: https://issues.apache.org/jira/browse/YARN-1064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Priority: Blocker
  Labels: newbie

 Some of the scheduler configuration constants in YarnConfiguration have 
 RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
 the same prefix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1784) TestContainerAllocation assumes CapacityScheduler

2014-03-04 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1784:
--

 Summary: TestContainerAllocation assumes CapacityScheduler
 Key: YARN-1784
 URL: https://issues.apache.org/jira/browse/YARN-1784
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor


TestContainerAllocation assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1783:
-

Target Version/s: 2.4.0

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical

 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated YARN-1783:
--

Description: 
Noticed that during HA tests some tests took over 3 hours to run when the test 
failed.
Looking at the logs, I see the application made no progress for a very long 
time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
I am seeing the same behavior when the RM was being restarted in the background 
and when both the RM and AM were being restarted. This does not happen for all 
applications, but a few will hit this in the nightly run.

  was:
Noticed that during HA tests some tests took over 3 hours to run when the test 
failed.
Looking at the logs, I see the application made no progress for a very long 
time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
I am seeing the same behavior when the RM was being restarted in the background 
and when both the RM and AM were being restarted.


 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical

 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920133#comment-13920133
 ] 

Hadoop QA commented on YARN-1525:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632660/Yarn1525.secure.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3251//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3251//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3251//console

This message is automatically generated.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-04 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920137#comment-13920137
 ] 

Mit Desai commented on YARN-1670:
-

It is a code change and there are no unit tests for this change.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 at java.lang.Long.parseLong(Long.java:441)
 at java.lang.Long.parseLong(Long.java:483)
 at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 while what it reads is still log data from the previous file. What happened 
 was that the Log Length was written as a certain size, but the log data was 
 actually longer than that.
 Inside the write() routine in LogValue, it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file. There is a race condition here: if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop once it has written whatever it said was 
 the length. It would be nice if we could somehow tell the user it might be 
 truncated, but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long:
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now, as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.
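
A minimal sketch of the suggested fix, under the assumption that the writer knows the 
length it already recorded (the class and method names below are placeholders, not the 
actual AggregatedLogFormat code): copy at most the advertised number of bytes, even if 
the underlying file keeps growing while it is aggregated.

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch only: bound the copied log data by the length that was already written.
class BoundedLogCopySketch {

  static void copyAtMost(InputStream in, OutputStream out, long declaredLength)
      throws IOException {
    byte[] buf = new byte[64 * 1024];
    long remaining = declaredLength;
    while (remaining > 0) {
      int toRead = (int) Math.min(buf.length, remaining);
      int read = in.read(buf, 0, toRead);
      if (read == -1) {
        break;                      // file shorter than declared; stop early
      }
      out.write(buf, 0, read);
      remaining -= read;            // never write past the declared length
    }
  }
}
{code}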



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920159#comment-13920159
 ] 

Hadoop QA commented on YARN-1766:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632675/YARN-1766.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3252//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3252//console

This message is automatically generated.

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920195#comment-13920195
 ] 

Vinod Kumar Vavilapalli commented on YARN-1766:
---

+1, looks good. It's much better now, given we also caught a bug!

Checking this in.

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Cindy Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920203#comment-13920203
 ] 

Cindy Li commented on YARN-1525:


The findbugs warning is easy to fix. Other than that, Karthik, Vinod, Xuan, can 
any of you give the patch a final review? 

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920208#comment-13920208
 ] 

Xuan Gong commented on YARN-1525:
-

Looking at the patch now

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Cindy Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cindy Li updated YARN-1525:
---

Attachment: Yarn1525.secure.patch

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920246#comment-13920246
 ] 

Hudson commented on YARN-1766:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5262 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5262/])
YARN-1766. Fixed a bug in ResourceManager to use configuration loaded from the 
configuration-provider when booting up. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1574252)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920287#comment-13920287
 ] 

Hadoop QA commented on YARN-1525:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632705/Yarn1525.secure.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3253//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3253//console

This message is automatically generated.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1785) FairScheduler treats app lookup failures as ERRORs

2014-03-04 Thread bc Wong (JIRA)
bc Wong created YARN-1785:
-

 Summary: FairScheduler treats app lookup failures as ERRORs
 Key: YARN-1785
 URL: https://issues.apache.org/jira/browse/YARN-1785
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: bc Wong


When invoking the /ws/v1/cluster/apps endpoint, the RM will eventually get to 
RMAppImpl#createAndGetApplicationReport, which calls 
RMAppAttemptImpl#getApplicationResourceUsageReport, which looks up the app in 
the scheduler, where it may or may not exist. So FairScheduler shouldn't log an 
error for every lookup failure:

{noformat}
2014-02-17 08:23:21,240 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Request for appInfo of unknown attemptappattempt_1392419715319_0135_01

{noformat}
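
For illustration, a small sketch of the suggested behavior, with assumed names and plain 
JDK logging rather than the actual FairScheduler code: treat the lookup miss as an 
expected case and log it below error level.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Sketch only: downgrade the log level for an expected lookup miss.
class AppLookupSketch {
  private static final Logger LOG = Logger.getLogger("FairSchedulerSketch");
  private final Map<String, Object> applications = new ConcurrentHashMap<>();

  Object getAppResourceUsageReport(String appAttemptId) {
    Object app = applications.get(appAttemptId);
    if (app == null) {
      // The attempt may simply not be registered with the scheduler (yet, or anymore),
      // so this is not worth an ERROR on every REST call.
      LOG.fine("Request for appInfo of unknown attempt " + appAttemptId);
      return null;   // the caller builds a report without scheduler-side usage info
    }
    return app;
  }
}
{code}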



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920355#comment-13920355
 ] 

Jian He commented on YARN-1783:
---

The problem is that while the NM is resyncing with the RM, the NM will clean the 
finished containers from its context before it processes the resync command. But 
after the RM restarts, it is still waiting for the previous AM container's 
Finished event from the NM so that it knows to launch a new attempt.

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical

 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1783:
--

Attachment: YARN-1783.1.patch

The patch splits the getContainerStatus and removeCompletedContainers logic that 
was in the original method getNodeStatusAndUpdateContainersInContext, and puts 
the removeCompletedContainers call after the nodeStatusUpdater gets the resync 
command.

Rewrote TestNodeStatusUpdater.testCompletedContainerStatusBackup(), as that test 
case was broken.
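
A rough sketch of that reordering, with assumed names and types rather than the real 
NodeStatusUpdater code: completed-container statuses are only collected while building 
the heartbeat, and are removed in a separate step once it is safe to drop them.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: split "report completed containers" from "forget completed containers".
class NodeStatusSketch {
  private final Map<String, String> completedContainers = new ConcurrentHashMap<>();

  // Collect the statuses for the heartbeat without removing them, so a restarted RM
  // can still receive the finished AM container it is waiting for after a resync.
  List<String> getContainerStatuses() {
    return new ArrayList<>(completedContainers.values());
  }

  // Remove completed containers only after the resync command has been handled
  // (i.e. once the RM is known to have seen them).
  void removeCompletedContainers(List<String> acknowledgedContainerIds) {
    for (String containerId : acknowledgedContainerIds) {
      completedContainers.remove(containerId);
    }
  }
}
{code}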

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical
 Attachments: YARN-1783.1.patch


 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920377#comment-13920377
 ] 

Jian He commented on YARN-1783:
---

The newly added test passes with the core code change and fails without it.

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical
 Attachments: YARN-1783.1.patch


 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920392#comment-13920392
 ] 

Hadoop QA commented on YARN-1783:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632737/YARN-1783.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3254//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3254//console

This message is automatically generated.

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical
 Attachments: YARN-1783.1.patch


 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920395#comment-13920395
 ] 

Karthik Kambatla commented on YARN-1525:


Verified - the redirection works as expected on a secure cluster. 

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-04 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1783:
--

Attachment: YARN-1783.2.patch

New patch with a minor fix.

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical
 Attachments: YARN-1783.1.patch, YARN-1783.2.patch


 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs, I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920402#comment-13920402
 ] 

Karthik Kambatla commented on YARN-1525:


Comments on the latest patch (it would be easier to refer to patches if they 
were versioned):
# RMHAUtils#findActiveRMId: it looks like we return before resetting the id to 
the initial RM-id? It might be safer to just create a copy of the conf, i.e. new 
YarnConfiguration(yarnConf), so we don't have to store the initial value and 
reset it at the end (a sketch of this appears after the list).
{code}
if (haState.equals(HAServiceState.ACTIVE)) {
  yarnConf.set(YarnConfiguration.RM_HA_ID, rmId);
  return currentId;
}
  } catch (Exception e) {
  }
}
yarnConf.set(YarnConfiguration.RM_HA_ID, rmId);
{code}
# RMDispatcher: remove TODO
{code}
  RMDispatcher(WebApp webApp, Injector injector, Router router) {
super(webApp, injector, router);
// TODO Auto-generated constructor stub
  }
{code}
# Don't see much use for RMWebApp#standbyMode. Why not just use isStandbyMode().
# Nit: Spurious commit. Get rid of it? 
{code}
-  void setRedirectPath(String path) { this.redirectPath = path; }
+  protected void setRedirectPath(String path) {
+this.redirectPath = path;
+  }
{code}
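
A minimal sketch of the copy-the-conf suggestion from point 1, with placeholder types 
instead of YarnConfiguration and RMHAUtils, so this is an illustration of the idea 
rather than the patch code.

{code}
import java.util.Properties;

// Sketch only: probe each RM on a private copy of the configuration so the caller's
// conf is never mutated and nothing has to be saved and restored afterwards.
class FindActiveRmSketch {

  interface ActiveProber {
    boolean isActive(Properties conf, String rmId) throws Exception;
  }

  static String findActiveRmId(Properties yarnConf, Iterable<String> rmIds, ActiveProber prober) {
    for (String rmId : rmIds) {
      Properties probeConf = new Properties();
      probeConf.putAll(yarnConf);                          // copy instead of set-and-reset
      probeConf.setProperty("yarn.resourcemanager.ha.id", rmId);
      try {
        if (prober.isActive(probeConf, rmId)) {
          return rmId;
        }
      } catch (Exception e) {
        // ignore and try the next RM, matching the existing behavior
      }
    }
    return null;                                           // no active RM found
  }
}
{code}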


 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)
shenhong created YARN-1786:
--

 Summary: TestRMAppTransitions occasionally fail
 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenhong updated YARN-1786:
---

Description: 

{code}
testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
 
testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 
testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
{code}

  was:
{code}
testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
 
testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 
testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
{code}


 TestRMAppTransitions occasionally fail
 --

 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong

 {code}
 testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 

[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenhong updated YARN-1786:
---

Description: 
{code}
testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
 
testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 
testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
{code}

 TestRMAppTransitions occasionally fail
 --

 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong

 {code}
 testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
  
 testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
  
 testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenhong updated YARN-1786:
---

Description: 
TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
{code}
testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
 
testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 
testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
{code}


  was:

{code}
testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
 
testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 
testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
{code}


 TestRMAppTransitions occasionally fail
 --

 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong

 TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
 {code}
 testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 

[jira] [Commented] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920410#comment-13920410
 ] 

shenhong commented on YARN-1786:
--------------------------------

Here is the code:
{code}
  private void sendAppUpdateSavedEvent(RMApp application) {
RMAppEvent event =
new RMAppUpdateSavedEvent(application.getApplicationId(), null);
application.handle(event);
rmDispatcher.await();
  }

  private void sendAttemptUpdateSavedEvent(RMApp application) {
application.getCurrentAppAttempt().handle(
  new RMAppAttemptUpdateSavedEvent(application.getCurrentAppAttempt()
.getAppAttemptId(), null));
  }
{code}
In sendAttemptUpdateSavedEvent() there is no rmDispatcher.await() after handling the event; it should be changed to:
{code}
  private void sendAttemptUpdateSavedEvent(RMApp application) {
    application.getCurrentAppAttempt().handle(
      new RMAppAttemptUpdateSavedEvent(application.getCurrentAppAttempt()
        .getAppAttemptId(), null));
    rmDispatcher.await();
  }
{code}
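
For context on why the missing await() matters: the RM app state machine is driven through an asynchronous dispatcher, so without draining it the assertion can run before the KILLED transition has recorded the finish time. The failing check is the finish-time assertion in assertTimesAtFinish (TestRMAppTransitions.java:283 in the traces above); a paraphrased sketch of its shape, assuming a plain JUnit assert, not the exact test source:
{code}
// Paraphrased sketch (shape inferred from the assertion message and stack
// traces above; not copied from TestRMAppTransitions.java):
private static void assertTimesAtFinish(RMApp application) {
  // Without rmDispatcher.await(), the dispatcher thread may not yet have
  // applied the transition that sets the finish time, so this can read 0.
  Assert.assertTrue("application finish time is not greater then 0",
      application.getFinishTime() > 0);
}
{code}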

 TestRMAppTransitions occasionally fail
 --

 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong

 TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
 {code}
 testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
  
 testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
  
 testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenhong reassigned YARN-1786:
--

Assignee: shenhong

 TestRMAppTransitions occasionally fail
 --

 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong
Assignee: shenhong
 Attachments: YARN-1786.patch


 TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
 {code}
 testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
  
 testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
  
 testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenhong updated YARN-1786:
---

Attachment: YARN-1786.patch

Attached a patch that fixes the bug.
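
(The attached YARN-1786.patch is not inlined in the thread; judging from the comment above, it presumably amounts to the one-line addition sketched here rather than anything larger.)
{code}
// Presumed shape of the fix (a sketch based on the earlier comment, not the
// actual attachment): drain the async dispatcher before the test asserts.
private void sendAttemptUpdateSavedEvent(RMApp application) {
  application.getCurrentAppAttempt().handle(
    new RMAppAttemptUpdateSavedEvent(application.getCurrentAppAttempt()
      .getAppAttemptId(), null));
  rmDispatcher.await();  // the added call
}
{code}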

 TestRMAppTransitions occasionally fail
 --

 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong
 Attachments: YARN-1786.patch


 TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
 {code}
 testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
  
 testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
  
 testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
 application finish time is not greater then 0 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail

2014-03-04 Thread shenhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenhong updated YARN-1786:
---

Description: 
TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
{code}
testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
 
testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 
testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
{code}


  was:
TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
{code}
testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624)
 
testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.033 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
 
testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
 Time elapsed: 0.036 sec  FAILURE! junit.framework.AssertionFailedError: 
application finish time is not greater then 0 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298)
 at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646)
{code}



 TestRMAppTransitions occasionally fail
 --

 Key: YARN-1786
 URL: https://issues.apache.org/jira/browse/YARN-1786
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: shenhong
Assignee: shenhong
 Attachments: YARN-1786.patch


 TestRMAppTransitions often fails with "application finish time is not greater then 0"; the log follows:
 {code}
 testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
  Time elapsed: 0.04 sec  FAILURE! junit.framework.AssertionFailedError: 
 
