[jira] [Commented] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900227#comment-13900227
 ] 

Hudson commented on YARN-1345:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/480/])
YARN-1345. Remove FINAL_SAVING state from YarnApplicationAttemptState. 
Contributed by Zhijie Shen (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567820)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Removing FINAL_SAVING from YarnApplicationAttemptState
 --

 Key: YARN-1345
 URL: https://issues.apache.org/jira/browse/YARN-1345
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1345.1.patch, YARN-1345.2.patch


 Once YARN-891 is done, we need to add the mapping 
 RMAppAttemptState.FINAL_SAVING -> YarnApplicationAttemptState.FINAL_SAVING in 
 RMServerUtils#createApplicationAttemptState.
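 For illustration, the requested mapping could look like the following sketch. The enum values and method shape are stand-ins inferred from the class names in this issue, not the actual Hadoop source:

```java
// Illustrative sketch only: stand-in enums modeled on the class names in
// this issue, not the real org.apache.hadoop.yarn types.
public class AttemptStateMapper {

    public enum RMAppAttemptState { NEW, RUNNING, FINAL_SAVING, FINISHED }

    public enum YarnApplicationAttemptState { NEW, RUNNING, FINAL_SAVING, FINISHED }

    // Maps the RM-internal attempt state to the public API state, which is
    // the role RMServerUtils#createApplicationAttemptState plays per the
    // description above.
    public static YarnApplicationAttemptState createApplicationAttemptState(
            RMAppAttemptState rmState) {
        switch (rmState) {
            case NEW:
                return YarnApplicationAttemptState.NEW;
            case RUNNING:
                return YarnApplicationAttemptState.RUNNING;
            case FINAL_SAVING:
                return YarnApplicationAttemptState.FINAL_SAVING;
            case FINISHED:
                return YarnApplicationAttemptState.FINISHED;
            default:
                throw new IllegalArgumentException("Unknown state: " + rmState);
        }
    }
}
```

 Note that the committed patch (per the commit message above) ultimately went the other way and removed FINAL_SAVING from the public enum, so this sketch shows the originally proposed direction only.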



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1578) Fix how to read history file in FileSystemApplicationHistoryStore

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900231#comment-13900231
 ] 

Hudson commented on YARN-1578:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/480/])
YARN-1578. Fixed reading incomplete application attempt and container data in 
FileSystemApplicationHistoryStore. Contributed by Shinichi Yamashita. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567816)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/FileSystemApplicationHistoryStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestFileSystemApplicationHistoryStore.java


 Fix how to read history file in FileSystemApplicationHistoryStore
 -

 Key: YARN-1578
 URL: https://issues.apache.org/jira/browse/YARN-1578
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Fix For: 2.4.0

 Attachments: YARN-1578-2.patch, YARN-1578-3.patch, YARN-1578-4.patch, 
 YARN-1578.patch, application_1390978867235_0001, resoucemanager.log, 
 screenshot.png, screenshot2.pdf


 I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied.
 After the job ended, accessing the HistoryServer web UI returned a 500 
 error, and the HistoryServer daemon log showed the following.
 {code}
 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: 
 /applicationhistory/appattempt/appattempt_1389146249925_0008_01
 java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 (snip...)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
 at 
 org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
 (snip...)
 {code}
 From the ApplicationHistory file, I confirmed that there was a container 
 that had not finished.
 According to the ResourceManager daemon log, the ResourceManager reserved 
 this container but did not allocate it.
 This problem occurs when FileSystemApplicationHistoryStore reads container 
 information that has no finish data in the history file.
 To handle the case where finish data is absent, we should fix how 
 FileSystemApplicationHistoryStore reads the history file.
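 As a rough sketch of the null-guard idea (class and field names are hypothetical; the real fix lives in FileSystemApplicationHistoryStore#mergeContainerHistoryData):

```java
// Hypothetical sketch of the null-guard idea; class and field names are
// illustrative, not the actual FileSystemApplicationHistoryStore types.
public class ContainerHistoryMergeSketch {

    public static class ContainerData {
        public String containerId;
        public long finishTime;  // stays 0 if the container never finished
    }

    // Merge the "finished" half of a container record into the "started"
    // half. When the finish record was never written (the reserved-but-
    // never-allocated container in this issue), finished is null, and
    // dereferencing it blindly is the kind of bug that produced the
    // NullPointerException in the stack trace above.
    public static ContainerData merge(ContainerData started, ContainerData finished) {
        if (finished == null) {
            // No finish data in the history file: return the partial
            // record instead of crashing the reader.
            return started;
        }
        started.finishTime = finished.finishTime;
        return started;
    }
}
```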



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1692) ConcurrentModificationException in fair scheduler AppSchedulable

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900230#comment-13900230
 ] 

Hudson commented on YARN-1692:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/480/])
Move YARN-1692 in CHANGES.txt (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567793)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
YARN-1692. ConcurrentModificationException in fair scheduler AppSchedulable 
(Sangjin Lee via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567788)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java


 ConcurrentModificationException in fair scheduler AppSchedulable
 

 Key: YARN-1692
 URL: https://issues.apache.org/jira/browse/YARN-1692
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Fix For: 2.4.0

 Attachments: yarn-1692-branch-2.3.patch, yarn-1692.patch


 We saw a ConcurrentModificationException thrown in the fair scheduler:
 {noformat}
 2014-02-07 01:40:01,978 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Exception in fair scheduler UpdateThread
 java.util.ConcurrentModificationException
 at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
 at java.util.HashMap$ValueIterator.next(HashMap.java:954)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.updateDemand(AppSchedulable.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.updateDemand(FSLeafQueue.java:125)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.updateDemand(FSParentQueue.java:82)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:217)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:195)
 at java.lang.Thread.run(Thread.java:724)
 {noformat}
 The map returned by FSSchedulerApp.getResourceRequests() is iterated over 
 without proper synchronization.
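 A common fix pattern for this kind of bug is to snapshot the map under a lock and iterate the copy. A minimal sketch (class and method names are illustrative, not the actual scheduler code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the snapshot-under-lock pattern; names are illustrative,
// not the actual FSSchedulerApp/AppSchedulable code.
public class DemandUpdater {

    private final Map<String, Integer> resourceRequests = new HashMap<String, Integer>();

    // Mutations take the lock, so writers never race the snapshot below.
    public synchronized void addRequest(String key, int containers) {
        resourceRequests.put(key, containers);
    }

    // Copy the values under the lock, then iterate the copy. Iterating the
    // live HashMap while another thread mutates it is what throws the
    // ConcurrentModificationException in the stack trace above.
    public int updateDemand() {
        List<Integer> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<Integer>(resourceRequests.values());
        }
        int demand = 0;
        for (int containers : snapshot) {
            demand += containers;
        }
        return demand;
    }
}
```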



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1641) ZK store should attempt a write periodically to ensure it is still Active

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900234#comment-13900234
 ] 

Hudson commented on YARN-1641:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/480/])
YARN-1641. ZK store should attempt a write periodically to ensure it is still 
Active. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567628)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 ZK store should attempt a write periodically to ensure it is still Active
 -

 Key: YARN-1641
 URL: https://issues.apache.org/jira/browse/YARN-1641
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.4.0

 Attachments: yarn-1641-1.patch, yarn-1641-2.patch


 Fencing in the ZK store kicks in only when the RM tries to write something 
 to the store. If the RM never writes anything, it never gets fenced and can 
 continue to assume it is still the Active RM.
 By periodically writing to the store (say, every RM_ZK_TIMEOUT_MS 
 milliseconds), we can ensure a deposed RM gets fenced promptly.
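 The idea can be sketched as a periodic no-op write whose failure demotes the RM. The store interface and znode path below are stand-ins, not the real ZKRMStateStore API:

```java
// Sketch of the periodic-probe idea; the Store interface and the znode path
// are stand-ins, not the real ZKRMStateStore API.
public class ActiveStatusVerifier {

    public interface Store {
        void write(String path, byte[] data) throws Exception;
    }

    private final Store store;
    private boolean active = true;

    public ActiveStatusVerifier(Store store) {
        this.store = store;
    }

    public boolean isActive() {
        return active;
    }

    // Intended to run periodically (e.g. every RM_ZK_TIMEOUT_MS): a dummy
    // write either succeeds, confirming this RM still owns the store, or
    // fails because another RM fenced it, in which case we stop assuming
    // we are the Active RM.
    public void verifyActiveStatus() {
        try {
            store.write("/rmstore/active-probe", new byte[0]);
        } catch (Exception e) {
            active = false;
        }
    }
}
```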



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1531) True up yarn command documentation

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900226#comment-13900226
 ] 

Hudson commented on YARN-1531:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/480/])
YARN-1531. True up yarn command documentation (Akira Ajisaka via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567775)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 True up yarn command documentation
 --

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Fix For: 2.4.0

 Attachments: YARN-1531.2.patch, YARN-1531.3.patch, YARN-1531.patch


 Some options are missing from the YARN Commands document.
 For example, the yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900321#comment-13900321
 ] 

Hudson commented on YARN-1345:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/])
YARN-1345. Remove FINAL_SAVING state from YarnApplicationAttemptState. 
Contributed by Zhijie Shen (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567820)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Removing FINAL_SAVING from YarnApplicationAttemptState
 --

 Key: YARN-1345
 URL: https://issues.apache.org/jira/browse/YARN-1345
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1345.1.patch, YARN-1345.2.patch


 Once YARN-891 is done, we need to add the mapping 
 RMAppAttemptState.FINAL_SAVING -> YarnApplicationAttemptState.FINAL_SAVING in 
 RMServerUtils#createApplicationAttemptState.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1641) ZK store should attempt a write periodically to ensure it is still Active

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900328#comment-13900328
 ] 

Hudson commented on YARN-1641:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/])
YARN-1641. ZK store should attempt a write periodically to ensure it is still 
Active. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567628)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 ZK store should attempt a write periodically to ensure it is still Active
 -

 Key: YARN-1641
 URL: https://issues.apache.org/jira/browse/YARN-1641
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.4.0

 Attachments: yarn-1641-1.patch, yarn-1641-2.patch


 Fencing in the ZK store kicks in only when the RM tries to write something 
 to the store. If the RM never writes anything, it never gets fenced and can 
 continue to assume it is still the Active RM.
 By periodically writing to the store (say, every RM_ZK_TIMEOUT_MS 
 milliseconds), we can ensure a deposed RM gets fenced promptly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1531) True up yarn command documentation

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900320#comment-13900320
 ] 

Hudson commented on YARN-1531:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/])
YARN-1531. True up yarn command documentation (Akira Ajisaka via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567775)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 True up yarn command documentation
 --

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Fix For: 2.4.0

 Attachments: YARN-1531.2.patch, YARN-1531.3.patch, YARN-1531.patch


 Some options are missing from the YARN Commands document.
 For example, the yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1677) Potential bugs in exception handlers

2014-02-13 Thread Ding Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900332#comment-13900332
 ] 

Ding Yuan commented on YARN-1677:
-

Hi, is there anything else I can help with for these cases?

 Potential bugs in exception handlers
 

 Key: YARN-1677
 URL: https://issues.apache.org/jira/browse/YARN-1677
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Ding Yuan
 Attachments: yarn-1677.patch


 Hi Yarn developers,
 We are a group of researchers studying software reliability. We recently did 
 a study and found that the majority of the most severe failures in Hadoop 
 are caused by bugs in exception-handling logic. We therefore built a simple 
 checking tool that automatically detects some bug patterns that have caused 
 very severe failures. I am reporting some of the results for YARN here. 
 Any feedback is much appreciated!
 ==
 Case 1:
 Line: 551, File: 
 org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
 {noformat}
 switch (monitoringEvent.getType()) {
 case START_MONITORING_CONTAINER:
   .. ..
 default:
   // TODO: Wrong event.
 }
 {noformat}
 The default case (which would handle any unexpected event) is empty. 
 Should we at least log an error message here?
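 As a sketch of that suggestion, the default branch could at minimum record the unexpected event (the string event names here are illustrative, not the actual ContainersMonitorImpl types):

```java
// Sketch only: event handling with a non-empty default branch. The string
// event names are illustrative, not the actual ContainersMonitorImpl enum.
public class MonitorEventSwitch {

    public static String handle(String eventType) {
        switch (eventType) {
            case "START_MONITORING_CONTAINER":
                return "started monitoring";
            case "STOP_MONITORING_CONTAINER":
                return "stopped monitoring";
            default:
                // Previously empty: at minimum, surface the bad event so
                // operators can see it in the logs.
                return "ERROR: unexpected event " + eventType;
        }
    }
}
```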
 ==
 ==
 Case 2:
   Line: 491, File: 
 org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
 {noformat}
   } catch (Throwable e) {
 // TODO Better error handling. Thread can die with the rest of the
 // NM still running.
 LOG.error("Caught exception in status-updater", e);
   }
 {noformat}
 The handler of this very general exception only logs the error. The TODO 
 seems to indicate it is not sufficient.
 ==
 ==
 Case 3:
 Line: 861, File: 
 org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
 {noformat}
   for (LocalResourceStatus stat : remoteResourceStatuses) {
 LocalResource rsrc = stat.getResource();
 LocalResourceRequest req = null;
 try {
   req = new LocalResourceRequest(rsrc);
 } catch (URISyntaxException e) {
   // TODO fail? Already translated several times...
 }
 {noformat}
 The handler for URISyntaxException is empty, and the TODO seems to indicate 
 it is not sufficient.
 The same code pattern can also be found at:
 Line: 901, File: 
 org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
 Line: 838, File: 
 org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
 Line: 878, File: 
 org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
 At line: 803, File: 
 org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java, 
 the handler of URISyntaxException also seems insufficient:
 {noformat}
try {
   shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
   shellScriptPath)));
 } catch (URISyntaxException e) {
   LOG.error("Error when trying to use shell script path specified"
   + " in env, path=" + shellScriptPath);
   e.printStackTrace();
   // A failure scenario on bad input such as invalid shell script path
   // We know we cannot continue launching the container
   // so we should release it.
   // TODO
   numCompletedContainers.incrementAndGet();
   numFailedContainers.incrementAndGet();
   return;
 }
 {noformat}
 ==
 ==
 Case 4:
 Line: 627, File: 
 org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
 {noformat}
   try {
 /* keep the master in sync with the state machine */
 this.stateMachine.doTransition(event.getType(), event);
   } catch (InvalidStateTransitonException e) {
 LOG.error("Can't handle this event at current state", e);
 /* TODO fail the application on the failed transition */
   }
 {noformat}
 The handler of this exception only logs the error. The TODO seems to indicate 
 it is not sufficient.
 This exact same code pattern can also be found at:
 Line: 573, File: 
 org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
 ==
 ==
 Case 5: empty handler for exception: java.lang.InterruptedException
   Line: 123, File: org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java
 {noformat}
   public void join() {
 if(proxyServer != null) {
   

[jira] [Commented] (YARN-1578) Fix how to read history file in FileSystemApplicationHistoryStore

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900396#comment-13900396
 ] 

Hudson commented on YARN-1578:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/])
YARN-1578. Fixed reading incomplete application attempt and container data in 
FileSystemApplicationHistoryStore. Contributed by Shinichi Yamashita. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567816)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/FileSystemApplicationHistoryStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestFileSystemApplicationHistoryStore.java


 Fix how to read history file in FileSystemApplicationHistoryStore
 -

 Key: YARN-1578
 URL: https://issues.apache.org/jira/browse/YARN-1578
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Fix For: 2.4.0

 Attachments: YARN-1578-2.patch, YARN-1578-3.patch, YARN-1578-4.patch, 
 YARN-1578.patch, application_1390978867235_0001, resoucemanager.log, 
 screenshot.png, screenshot2.pdf


 I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied.
 After the job ended, accessing the HistoryServer web UI returned a 500 
 error, and the HistoryServer daemon log showed the following.
 {code}
 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: 
 /applicationhistory/appattempt/appattempt_1389146249925_0008_01
 java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 (snip...)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
 at 
 org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
 (snip...)
 {code}
 From the ApplicationHistory file, I confirmed that there was a container 
 that had not finished.
 According to the ResourceManager daemon log, the ResourceManager reserved 
 this container but did not allocate it.
 This problem occurs when FileSystemApplicationHistoryStore reads container 
 information that has no finish data in the history file.
 To handle the case where finish data is absent, we should fix how 
 FileSystemApplicationHistoryStore reads the history file.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1641) ZK store should attempt a write periodically to ensure it is still Active

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900399#comment-13900399
 ] 

Hudson commented on YARN-1641:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/])
YARN-1641. ZK store should attempt a write periodically to ensure it is still 
Active. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567628)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 ZK store should attempt a write periodically to ensure it is still Active
 -

 Key: YARN-1641
 URL: https://issues.apache.org/jira/browse/YARN-1641
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.4.0

 Attachments: yarn-1641-1.patch, yarn-1641-2.patch


 Fencing in the ZK store kicks in only when the RM tries to write something 
 to the store. If the RM never writes anything, it never gets fenced and can 
 continue to assume it is still the Active RM.
 By periodically writing to the store (say, every RM_ZK_TIMEOUT_MS 
 milliseconds), we can ensure a deposed RM gets fenced promptly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1692) ConcurrentModificationException in fair scheduler AppSchedulable

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900395#comment-13900395
 ] 

Hudson commented on YARN-1692:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/])
Move YARN-1692 in CHANGES.txt (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567793)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
YARN-1692. ConcurrentModificationException in fair scheduler AppSchedulable 
(Sangjin Lee via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567788)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java


 ConcurrentModificationException in fair scheduler AppSchedulable
 

 Key: YARN-1692
 URL: https://issues.apache.org/jira/browse/YARN-1692
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Fix For: 2.4.0

 Attachments: yarn-1692-branch-2.3.patch, yarn-1692.patch


 We saw a ConcurrentModificationException thrown in the fair scheduler:
 {noformat}
 2014-02-07 01:40:01,978 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Exception in fair scheduler UpdateThread
 java.util.ConcurrentModificationException
 at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
 at java.util.HashMap$ValueIterator.next(HashMap.java:954)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.updateDemand(AppSchedulable.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.updateDemand(FSLeafQueue.java:125)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.updateDemand(FSParentQueue.java:82)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:217)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:195)
 at java.lang.Thread.run(Thread.java:724)
 {noformat}
 The map returned by FSSchedulerApp.getResourceRequests() is iterated over 
 without proper synchronization.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900392#comment-13900392
 ] 

Hudson commented on YARN-1345:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/])
YARN-1345. Remove FINAL_SAVING state from YarnApplicationAttemptState. 
Contributed by Zhijie Shen (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567820)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Removing FINAL_SAVING from YarnApplicationAttemptState
 --

 Key: YARN-1345
 URL: https://issues.apache.org/jira/browse/YARN-1345
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1345.1.patch, YARN-1345.2.patch


 Whenever YARN-891 is done, we need to add the mapping of 
 RMAppAttemptState.FINAL_SAVING -> YarnApplicationAttemptState.FINAL_SAVING in 
 RMServerUtils#createApplicationAttemptState





[jira] [Commented] (YARN-1531) True up yarn command documentation

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900391#comment-13900391
 ] 

Hudson commented on YARN-1531:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/])
YARN-1531. True up yarn command documentation (Akira Ajisaka via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567775)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 True up yarn command documentation
 --

 Key: YARN-1531
 URL: https://issues.apache.org/jira/browse/YARN-1531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: documentaion
 Fix For: 2.4.0

 Attachments: YARN-1531.2.patch, YARN-1531.3.patch, YARN-1531.patch


 There are some options which are not documented in the YARN Commands document.
 For example, the yarn rmadmin command options are as follows:
 {code}
  Usage: yarn rmadmin
-refreshQueues 
-refreshNodes 
-refreshSuperUserGroupsConfiguration 
-refreshUserToGroupsMappings 
-refreshAdminAcls 
-refreshServiceAcl 
-getGroups [username]
-help [cmd]
-transitionToActive serviceId
-transitionToStandby serviceId
-failover [--forcefence] [--forceactive] serviceId serviceId
-getServiceState serviceId
-checkHealth serviceId
 {code}
 But some of the new options such as -getGroups, -transitionToActive, and 
 -transitionToStandby are not documented.





[jira] [Commented] (YARN-1717) Misc improvements to leveldb timeline store

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900542#comment-13900542
 ] 

Hadoop QA commented on YARN-1717:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628790/YARN-1717.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3096//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3096//console

This message is automatically generated.

 Misc improvements to leveldb timeline store
 ---

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, 
 YARN-1717.4.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * braces for all control flow statements
 * simple locking to prevent issues related to concurrent writes
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities





[jira] [Commented] (YARN-1725) RM should provide an easier way for the app to reject a bad allocation

2014-02-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900551#comment-13900551
 ] 

Bikas Saha commented on YARN-1725:
--

The only way to do it in AMRMClient would be to know which user request this 
container would match and then resubmit that user request to the RM. There 
is no general way to do that correctly. I guess the problem is similar on the 
RM side: once it has decremented the *, rack, and node counters for requests, 
it can undo the * counter but does not know what to undo on the rack and node 
counters.

 RM should provide an easier way for the app to reject a bad allocation
 --

 Key: YARN-1725
 URL: https://issues.apache.org/jira/browse/YARN-1725
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha

 Currently, if the app gets a bad allocation then it can release the 
 container. However, the app then needs to request those resources again, or 
 else the RM will not give it a new container in lieu of the one just 
 rejected. This makes the app writer's life hard.





[jira] [Resolved] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-02-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-304.
--

Resolution: Duplicate

Reviewed the requirement again. It seems that the requirement has already been 
met by the AHS: finished applications will be served by the AHS, and their 
tracking links are persisted if they exist. Please feel free to reopen this if 
you think something is still missing.

 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Zhijie Shen

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History link for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed instead.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist, so that 
 users are not frustrated by broken links.





[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-02-13 Thread Cindy Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900672#comment-13900672
 ] 

Cindy Li commented on YARN-1525:


The failure of org.apache.hadoop.yarn.client.api.impl.TestNMClient is unrelated to this patch.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, 
 YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, 
 YARN1525.v8.patch, YARN1525.v9.patch


 When failover happens, web UI should redirect to the current active rm.





[jira] [Commented] (YARN-1553) Do not use HttpConfig.isSecure() in YARN

2014-02-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900725#comment-13900725
 ] 

Vinod Kumar Vavilapalli commented on YARN-1553:
---

Seems like you missed this:
 - MiniMRYarnCluster: shouldn't use WebAppUtils for logging the 
JobHistoryServer's address

Looks good otherwise.

 Do not use HttpConfig.isSecure() in YARN
 

 Key: YARN-1553
 URL: https://issues.apache.org/jira/browse/YARN-1553
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: YARN-1553.000.patch, YARN-1553.001.patch, 
 YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, 
 YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, 
 YARN-1553.008.patch


 HDFS-5305 and related jiras decided that each individual project will have 
 its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global 
 static method which no longer fits that design. The same functionality 
 should be moved into the YARN code base.





[jira] [Updated] (YARN-1553) Do not use HttpConfig.isSecure() in YARN

2014-02-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated YARN-1553:
-

Attachment: YARN-1553.009.patch

Addressed Vinod's comments.

 Do not use HttpConfig.isSecure() in YARN
 

 Key: YARN-1553
 URL: https://issues.apache.org/jira/browse/YARN-1553
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: YARN-1553.000.patch, YARN-1553.001.patch, 
 YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, 
 YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, 
 YARN-1553.008.patch, YARN-1553.009.patch


 HDFS-5305 and related jiras decided that each individual project will have 
 its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global 
 static method which no longer fits that design. The same functionality 
 should be moved into the YARN code base.





[jira] [Commented] (YARN-1727) provide (async) application lifecycle events to management tools

2014-02-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900741#comment-13900741
 ] 

Zhijie Shen commented on YARN-1727:
---

Could we do it via the Timeline service?

 provide (async) application lifecycle events to management tools
 

 Key: YARN-1727
 URL: https://issues.apache.org/jira/browse/YARN-1727
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Steve Loughran

 Management tools need to monitor long-lived applications. While the 
 {{AM->Management System}} protocol is a matter for them, the management 
 tooling will need to know about async events happening in YARN:
 # application submitted
 # AM started
 # AM failed
 # AM restarted
 # AM finished
 This could be done by pushing events somewhere, or by supporting a pollable 
 history mechanism.





[jira] [Commented] (YARN-1676) Make admin refreshUserToGroupsMappings of configuration work across RM failover

2014-02-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900742#comment-13900742
 ] 

Vinod Kumar Vavilapalli commented on YARN-1676:
---

+1, looks good. Checking this in.

 Make admin refreshUserToGroupsMappings of configuration work across RM 
 failover
 ---

 Key: YARN-1676
 URL: https://issues.apache.org/jira/browse/YARN-1676
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1667.3.patch, YARN-1676.1.patch, YARN-1676.2.patch, 
 YARN-1676.3.patch, YARN-1676.4.patch, YARN-1676.5.patch








[jira] [Created] (YARN-1728) History server doesn't understand percent encoded paths

2014-02-13 Thread Abraham Elmahrek (JIRA)
Abraham Elmahrek created YARN-1728:
--

 Summary: History server doesn't understand percent encoded paths
 Key: YARN-1728
 URL: https://issues.apache.org/jira/browse/YARN-1728
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Abraham Elmahrek


For example, going to the job history server page 
http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
 results in the following error:
{code}
Cannot get container logs. Invalid nodeId: test-cdh5-hue.ent.cloudera.com%3A8041
{code}

Whereas the URL-decoded version works:
http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr

It seems like both should be supported, as the former is simply the percent-encoded form of the latter.
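
One way to accept both forms is to percent-decode the nodeId path segment before parsing it; a small sketch using the JDK's URLDecoder (illustrative only, not the actual history server code):

```java
import java.net.URLDecoder;

public class NodeIdDecode {
    public static void main(String[] args) throws Exception {
        String raw = "localhost%3A8041";
        // Decoding the path segment first lets the server treat the
        // encoded and plain forms of the nodeId identically.
        String decoded = URLDecoder.decode(raw, "UTF-8");
        System.out.println(decoded); // localhost:8041
    }
}
```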





[jira] [Commented] (YARN-1417) RM may issue expired container tokens to AM while issuing new containers.

2014-02-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900774#comment-13900774
 ] 

Vinod Kumar Vavilapalli commented on YARN-1417:
---

Looks good. +1. Checking this in.

 RM may issue expired container tokens to AM while issuing new containers.
 -

 Key: YARN-1417
 URL: https://issues.apache.org/jira/browse/YARN-1417
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-1417.2.patch, YARN-1417.3.patch


 Today we create a new container token when we create the container in the RM as 
 part of the scheduling cycle. However, that container may get reserved or 
 assigned. If the container gets reserved and remains in that reserved state for 
 more than the container token expiry interval, then the RM will end up issuing 
 a container with an expired token.
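
A toy illustration of the timing race (hypothetical numbers and helper, not RM code): a token minted at container creation can age past the expiry interval while the container sits reserved, whereas a token minted at allocation time is fresh when handed to the AM.

```java
public class TokenTimingSketch {
    // A token is expired if more than the expiry interval has elapsed
    // since it was minted.
    static boolean expired(long mintedAtMs, long nowMs, long expiryIntervalMs) {
        return nowMs - mintedAtMs > expiryIntervalMs;
    }

    public static void main(String[] args) {
        long expiry = 600_000;    // 10-minute token expiry interval
        long created = 0;         // token minted when container was created
        long allocated = 900_000; // container finally allocated 15 min later
        System.out.println(expired(created, allocated, expiry));   // true
        System.out.println(expired(allocated, allocated, expiry)); // false
    }
}
```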





[jira] [Commented] (YARN-1676) Make admin refreshUserToGroupsMappings of configuration work across RM failover

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900778#comment-13900778
 ] 

Hudson commented on YARN-1676:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5165 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5165/])
YARN-1676. Modified RM HA handling of user-to-group mappings to be available 
across RM failover by making using of a remote configuration-provider. 
Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1568041)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


 Make admin refreshUserToGroupsMappings of configuration work across RM 
 failover
 ---

 Key: YARN-1676
 URL: https://issues.apache.org/jira/browse/YARN-1676
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1667.3.patch, YARN-1676.1.patch, YARN-1676.2.patch, 
 YARN-1676.3.patch, YARN-1676.4.patch, YARN-1676.5.patch








[jira] [Reopened] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-02-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened YARN-304:
--


 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Zhijie Shen

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History link for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed instead.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist, so that 
 users are not frustrated by broken links.





[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-02-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900806#comment-13900806
 ] 

Vinod Kumar Vavilapalli commented on YARN-304:
--

Can we keep this open to track the removal of the plugin added in YARN-285?

 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Zhijie Shen

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History link for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed instead.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist, so that 
 users are not frustrated by broken links.





[jira] [Updated] (YARN-1717) Misc improvements to leveldb timeline store

2014-02-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1717:
-

Attachment: YARN-1717.5.patch

Added a fix for passing the primary and secondary filter query params from the 
web services to the store.

 Misc improvements to leveldb timeline store
 ---

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, 
 YARN-1717.4.patch, YARN-1717.5.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * braces for all control flow statements
 * simple locking to prevent issues related to concurrent writes
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities





[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-02-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900869#comment-13900869
 ] 

Zhijie Shen commented on YARN-304:
--

Sure, I had an offline discussion with Vinod. Let me summarize the expected 
behavior of tracking URLs:

1. If the user doesn't specify a tracking URL, let's keep it null.

2. If the application is cached in the RM, it is accessible via the RM web UI. The 
tracking URL on the RM web UI, or the one returned via a report, should be:
if not null -> the user-specified tracking URL
else if AHS is enabled -> the URL of the application page on the AHS
else -> the URL pointing to itself (the RM)

3. If AHS is enabled and the application is recorded, it is accessible via the AHS 
web UI. The tracking URL on the AHS web UI, or the one returned via a report, should be:
if not null -> the user-specified tracking URL
else -> the URL pointing to itself (the AHS)
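
The fallback above can be sketched as a small helper (hypothetical names and URLs, not the actual RM/AHS code):

```java
public class TrackingUrlSketch {
    // User-specified URL wins; otherwise fall back to the AHS application
    // page if the AHS is enabled; otherwise point back at the serving UI.
    static String trackingUrl(String userUrl, boolean ahsEnabled,
                              String ahsAppPage, String selfUrl) {
        if (userUrl != null) {
            return userUrl;
        }
        return ahsEnabled ? ahsAppPage : selfUrl;
    }

    public static void main(String[] args) {
        // No user URL, AHS enabled: fall back to the AHS application page.
        System.out.println(trackingUrl(null, true,
            "http://ahs:8188/applicationhistory/app/app_1", "http://rm:8088"));
    }
}
```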

 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Zhijie Shen

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History link for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed instead.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist, so that 
 users are not frustrated by broken links.





[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-02-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900870#comment-13900870
 ] 

Zhijie Shen commented on YARN-304:
--

After the aforementioned work is done, the plugin should be safe to remove.

 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Zhijie Shen

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History link for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed instead.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist, so that 
 users are not frustrated by broken links.





[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable

2014-02-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900871#comment-13900871
 ] 

Jian He commented on YARN-713:
--

I'd like to take this over.

 ResourceManager can exit unexpectedly if DNS is unavailable
 ---

 Key: YARN-713
 URL: https://issues.apache.org/jira/browse/YARN-713
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
Priority: Critical
 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, 
 YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, 
 YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch


 As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could 
 lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and 
 that ultimately would cause the RM to exit.  The RM should not exit during 
 DNS hiccups.





[jira] [Assigned] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable

2014-02-13 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-713:


Assignee: Jian He  (was: Omkar Vinit Joshi)

 ResourceManager can exit unexpectedly if DNS is unavailable
 ---

 Key: YARN-713
 URL: https://issues.apache.org/jira/browse/YARN-713
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jian He
Priority: Critical
 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, 
 YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, 
 YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch


 As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could 
 lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and 
 that ultimately would cause the RM to exit.  The RM should not exit during 
 DNS hiccups.





[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-02-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900883#comment-13900883
 ] 

Jason Lowe commented on YARN-304:
-

What about the case where the application is not cached in the RM and the AHS 
is not enabled?  That's the case being handled by the plugin today, and isn't 
covered by the scenarios above.

 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Zhijie Shen

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History link for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed instead.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist, so that 
 users are not frustrated by broken links.





[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS

2014-02-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900900#comment-13900900
 ] 

Vinod Kumar Vavilapalli commented on YARN-1590:
---

Looks good, +1. Running it through Jenkins one more time to be sure.

 _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
 -

 Key: YARN-1590
 URL: https://issues.apache.org/jira/browse/YARN-1590
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, 
 YARN-1590.4.patch


 _HOST is not properly substituted when we use a VIP address. Currently it 
 always uses the host name of the machine and disregards the VIP address. This 
 is true mainly for the RM, NM, WebProxy, and JHS RPC services. It looks like it 
 is working fine for webservice authentication.
 On the other hand, the same thing is working fine for the NN and SNN, in RPC as 
 well as webservice.
  





[jira] [Commented] (YARN-1417) RM may issue expired container tokens to AM while issuing new containers.

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900904#comment-13900904
 ] 

Hudson commented on YARN-1417:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5166 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5166/])
YARN-1417. Modified RM to generate container-tokens not at creation time, but 
at allocation time so as to prevent RM
from shelling out containers with expired tokens. Contributed by Omkar Vinit 
Joshi and Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1568060)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java


 RM may issue expired container tokens to AM while issuing new containers.
 -

 Key: YARN-1417
 URL: https://issues.apache.org/jira/browse/YARN-1417
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Jian He
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1417.2.patch, YARN-1417.3.patch


 Today we create a new container token when we create the container in the RM 
 as part of the scheduling cycle. However, that container may get reserved 
 rather than assigned. If the container gets reserved and remains in the 
 reserved state for longer than the container token expiry interval, the RM 
 will end up issuing the container with an expired token.
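The timing issue above can be sketched in a few lines. This is illustrative only (the names and the expiry interval are invented, not the RM's actual token code): a token minted when the container is created during the scheduling cycle can sit in the reserved state past the expiry interval, while a token minted at allocation time, when the container is actually handed to the AM, cannot.

```java
// Hypothetical sketch of YARN-1417's timing problem, not the RM's real code.
public class TokenTimingSketch {
    static final long EXPIRY_INTERVAL_MS = 600_000L; // assumed 10-minute expiry

    static boolean isExpired(long issuedAtMs, long nowMs) {
        return nowMs - issuedAtMs > EXPIRY_INTERVAL_MS;
    }

    public static void main(String[] args) {
        long createdAt = 0L;                        // token minted in the scheduling cycle
        long allocatedAt = EXPIRY_INTERVAL_MS + 1;  // AM receives the container much later

        System.out.println(isExpired(createdAt, allocatedAt));   // true: creation-time token is stale
        System.out.println(isExpired(allocatedAt, allocatedAt)); // false: allocation-time token is fresh
    }
}
```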



--


[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-02-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900916#comment-13900916
 ] 

Zhijie Shen commented on YARN-304:
--

Hmm, thanks for pointing this out, Json! Since we cannot assume the AHS is 
always available alongside the RM, the plugin still seems necessary: when the 
AHS is unavailable, we are back to the original scenario in which the AHS 
feature does not exist.

 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Zhijie Shen

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History link for a job, the browser is redirected to 
 the application's tracking link, obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed.
 In very busy clusters, the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist, so that 
 users are not frustrated by broken links.



--


[jira] [Commented] (YARN-1553) Do not use HttpConfig.isSecure() in YARN

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900936#comment-13900936
 ] 

Hadoop QA commented on YARN-1553:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628850/YARN-1553.009.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3097//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3097//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-web-proxy.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3097//console

This message is automatically generated.

 Do not use HttpConfig.isSecure() in YARN
 

 Key: YARN-1553
 URL: https://issues.apache.org/jira/browse/YARN-1553
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: YARN-1553.000.patch, YARN-1553.001.patch, 
 YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, 
 YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, 
 YARN-1553.008.patch, YARN-1553.009.patch


 HDFS-5305 and related JIRAs decided that each individual project will have 
 its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global 
 static method which no longer fits the design. The same functionality 
 should be moved into the YARN code base.



--


[jira] [Updated] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState

2014-02-13 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1345:
--

Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed

 Removing FINAL_SAVING from YarnApplicationAttemptState
 --

 Key: YARN-1345
 URL: https://issues.apache.org/jira/browse/YARN-1345
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1345.1.patch, YARN-1345.2.patch


 Whenever YARN-891 is done, we need to add the mapping of 
 RMAppAttemptState.FINAL_SAVING -> YarnApplicationAttemptState.FINAL_SAVING in 
 RMServerUtils#createApplicationAttemptState



--


[jira] [Updated] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState

2014-02-13 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1345:
--

Affects Version/s: 2.4.0

 Removing FINAL_SAVING from YarnApplicationAttemptState
 --

 Key: YARN-1345
 URL: https://issues.apache.org/jira/browse/YARN-1345
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1345.1.patch, YARN-1345.2.patch


 Whenever YARN-891 is done, we need to add the mapping of 
 RMAppAttemptState.FINAL_SAVING -> YarnApplicationAttemptState.FINAL_SAVING in 
 RMServerUtils#createApplicationAttemptState



--


[jira] [Created] (YARN-1729) ATSWebServices always passes primary and secondary filters as strings

2014-02-13 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-1729:


 Summary: ATSWebServices always passes primary and secondary 
filters as strings
 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


Primary filter and secondary filter values can be arbitrary JSON-compatible 
Objects.  The web services should determine whether the filters specified as 
query parameters are objects or strings before passing them to the store.
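The classification step could look roughly like the following. This is a hypothetical helper, not the actual ATSWebServices code: a real implementation would hand the candidate value to a JSON parser (e.g. Jackson) and fall back to the raw string on parse failure; with only the JDK we handle booleans, numbers, and plain strings.

```java
// Hypothetical sketch: decide how a query-parameter filter value should be
// typed before it is passed to the timeline store, instead of always
// treating it as a string.
public class FilterValueSketch {
    static Object parseFilterValue(String raw) {
        String s = raw.trim();
        if (s.equals("true"))  return Boolean.TRUE;
        if (s.equals("false")) return Boolean.FALSE;
        try {
            return Long.valueOf(s);    // integral JSON number
        } catch (NumberFormatException ignored) { }
        try {
            return Double.valueOf(s);  // floating-point JSON number
        } catch (NumberFormatException ignored) { }
        return s;                      // fall back to a plain string
    }

    public static void main(String[] args) {
        System.out.println(parseFilterValue("42") instanceof Long);      // true
        System.out.println(parseFilterValue("someTag") instanceof Long); // false
    }
}
```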



--


[jira] [Created] (YARN-1730) Leveldb timeline store needs simple write locking

2014-02-13 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-1730:


 Summary: Leveldb timeline store needs simple write locking
 Key: YARN-1730
 URL: https://issues.apache.org/jira/browse/YARN-1730
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


The actual data writes are performed atomically in a batch, but a lock should 
be held while identifying a start time for the entity, which precedes every 
write.
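The get-or-create step that needs protection can be sketched as follows. This is only an illustration of the idea, not the LeveldbTimelineStore code: here an atomic map operation plays the role of the lock, so that concurrent writers agree on a single start time for an entity before the batched write.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the YARN-1730 idea: identifying an entity's start time must be
// atomic even though the subsequent batched write already is.
public class StartTimeSketch {
    private final ConcurrentHashMap<String, Long> startTimes = new ConcurrentHashMap<>();

    long getOrCreateStartTime(String entityId, long candidateMs) {
        // Atomic get-or-create: the first writer's candidate wins, and every
        // later writer observes that same value.
        return startTimes.computeIfAbsent(entityId, id -> candidateMs);
    }

    public static void main(String[] args) {
        StartTimeSketch store = new StartTimeSketch();
        System.out.println(store.getOrCreateStartTime("entity_1", 100L)); // 100
        System.out.println(store.getOrCreateStartTime("entity_1", 200L)); // 100: first write won
    }
}
```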



--


[jira] [Updated] (YARN-1730) Leveldb timeline store needs simple write locking

2014-02-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1730:
-

Attachment: YARN-1730.1.patch

 Leveldb timeline store needs simple write locking
 -

 Key: YARN-1730
 URL: https://issues.apache.org/jira/browse/YARN-1730
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1730.1.patch


 The actual data writes are performed atomically in a batch, but a lock should 
 be held while identifying a start time for the entity, which precedes every 
 write.



--


[jira] [Updated] (YARN-1731) ResourceManager should record killed ApplicationMasters for History

2014-02-13 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-1731:


Attachment: YARN-1731.patch

I've attached a preliminary version of the patch.  Once we all agree on the 
specifics of the design, I can add unit tests.  This basically writes an 
empty file, whose filename is the app attempt id and user, to a directory in 
HDFS (the JHS or something else, such as the AHS, could then see it).  We can 
easily replace HDFS with some other FileSystem or mechanism, or make it 
pluggable in some fashion.  
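The marker-file idea can be sketched on the local filesystem. This is illustrative only: the actual patch writes to HDFS via Hadoop's FileSystem API, and the class name and filename layout below are assumptions. The filename (attempt id plus user) carries all the information; the file body is intentionally empty.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of recording a killed AM as an empty marker file (local FS stands
// in for HDFS; names are hypothetical, not from the patch).
public class KilledAmRecorderSketch {
    static Path recordKilledAM(Path dir, String attemptId, String user) throws Exception {
        Files.createDirectories(dir);
        // e.g. appattempt_1392300000000_0001_000001_alice
        Path marker = dir.resolve(attemptId + "_" + user);
        return Files.createFile(marker); // empty file; the name is the record
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("killed-ams");
        Path marker = recordKilledAM(dir, "appattempt_0001_000001", "alice");
        System.out.println(Files.exists(marker)); // true
        System.out.println(Files.size(marker));   // 0
    }
}
```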

 ResourceManager should record killed ApplicationMasters for History
 ---

 Key: YARN-1731
 URL: https://issues.apache.org/jira/browse/YARN-1731
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1731.patch


 YARN changes required for MAPREDUCE-5641 to make the RM record when an AM is 
 killed, so the JHS (or something else) can know about it.  See MAPREDUCE-5641 
 for the design I'm trying to follow.  



--


[jira] [Created] (YARN-1731) ResourceManager should record killed ApplicationMasters for History

2014-02-13 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-1731:
---

 Summary: ResourceManager should record killed ApplicationMasters 
for History
 Key: YARN-1731
 URL: https://issues.apache.org/jira/browse/YARN-1731
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1731.patch

YARN changes required for MAPREDUCE-5641 to make the RM record when an AM is 
killed, so the JHS (or something else) can know about it.  See MAPREDUCE-5641 
for the design I'm trying to follow.  



--


[jira] [Commented] (YARN-1717) Misc improvements to leveldb timeline store

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901060#comment-13901060
 ] 

Hadoop QA commented on YARN-1717:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628880/YARN-1717.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3098//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3098//console

This message is automatically generated.

 Misc improvements to leveldb timeline store
 ---

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, 
 YARN-1717.4.patch, YARN-1717.5.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * braces for all control flow statements
 * simple locking to prevent issues related to concurrent writes
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--


[jira] [Updated] (YARN-1553) Do not use HttpConfig.isSecure() in YARN

2014-02-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated YARN-1553:
-

Attachment: YARN-1553.010.patch

Fix the findbugs warning.

 Do not use HttpConfig.isSecure() in YARN
 

 Key: YARN-1553
 URL: https://issues.apache.org/jira/browse/YARN-1553
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: YARN-1553.000.patch, YARN-1553.001.patch, 
 YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, 
 YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, 
 YARN-1553.008.patch, YARN-1553.009.patch, YARN-1553.010.patch


 HDFS-5305 and related JIRAs decided that each individual project will have 
 its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global 
 static method which no longer fits the design. The same functionality 
 should be moved into the YARN code base.



--


[jira] [Updated] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC

2014-02-13 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-1515:


Attachment: YARN-1515.v03.patch

v03 patch with combined RPC. Hopefully easier to review :)

{code}
$ wc YARN-1515.v0*
 1664  5725  75045 YARN-1515.v02.patch
  982  3401  45101 YARN-1515.v03.patch
{code}

 Ability to dump the container threads and stop the containers in a single RPC
 -

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.



--


[jira] [Created] (YARN-1732) Change types of related entities and primary filters in ATSEntity

2014-02-13 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-1732:


 Summary: Change types of related entities and primary filters in 
ATSEntity
 Key: YARN-1732
 URL: https://issues.apache.org/jira/browse/YARN-1732
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


The current types Map<String, List<String>> relatedEntities and Map<String, 
Object> primaryFilters have issues.  The List<String> value of the related 
entities map could have multiple identical strings in it, which doesn't make 
sense. A more major issue is that we cannot allow primary filter values to be 
overwritten, because otherwise we will be unable to find those primary filter 
entries when we want to delete an entity (without doing a nearly full scan).

I propose changing related entities to Map<String, Set<String>> and primary 
filters to Map<String, Set<Object>>.  The basic methods to add primary filters 
and related entities are of the form add(key, value) and will not need to 
change.
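The proposed shape can be sketched as follows. This is an illustration of the data-structure change only, with invented names, not the ATSEntity source: a Set value deduplicates repeated adds and never overwrites an earlier entry, and the add(key, value) signature is unchanged.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of primary filters as Map<String, Set<Object>> (hypothetical class,
// mirroring the YARN-1732 proposal).
public class PrimaryFiltersSketch {
    private final Map<String, Set<Object>> primaryFilters = new HashMap<>();

    public void addPrimaryFilter(String key, Object value) {
        // Adds accumulate into the set; nothing is ever overwritten.
        primaryFilters.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    public Set<Object> getPrimaryFilter(String key) {
        return primaryFilters.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        PrimaryFiltersSketch entity = new PrimaryFiltersSketch();
        entity.addPrimaryFilter("user", "alice");
        entity.addPrimaryFilter("user", "alice"); // duplicate, absorbed by the Set
        entity.addPrimaryFilter("user", "bob");
        System.out.println(entity.getPrimaryFilter("user").size()); // 2
    }
}
```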



--


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-02-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901081#comment-13901081
 ] 

Junping Du commented on YARN-1506:
--

Agree with Arun that this is not a blocker.

Hi [~bikassaha], thanks for your review and comments. Sorry for the late 
reply, as I just came back from a long vacation. Please see my replies below:
bq. ADMIN_RESOURCE_UPDATE instead of RESOURCE_UPDATE for the enum would help 
clarify that it's a forced admin update. 
Ok. Will update it.

bq. Why not update the total capability here also (like we do for a 
non-running node)? When the node reports back as healthy, we would probably 
need the new resource value, right?
For a node that is unusable (unhealthy, LOST, or decommissioned), I think it 
may be simpler to just log a warning rather than make any actual change. 
Otherwise the user may get confused that the node is still usable. Thoughts? 

bq. Why are we doing this indirect subtraction via delta instead of simply 
clusterResource-=old; clusterResource+=new. It's the same number of operations 
and less confusing to read.
Good point. Will update it.
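The equivalence behind that suggestion is easy to see with scalar stand-ins (real code operates on Resource objects; the numbers here are illustrative):

```java
// The delta style and the direct style compute the same cluster total.
public class ClusterResourceUpdateSketch {
    public static void main(String[] args) {
        long cluster = 100, oldNode = 8, newNode = 16;

        long viaDelta  = cluster + (newNode - oldNode); // indirect: add the delta
        long viaDirect = cluster - oldNode + newNode;   // direct: subtract old, add new

        System.out.println(viaDelta == viaDirect); // true: same result, direct reads clearer
    }
}
```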

bq. I think it's crucial to have a more complete test (maybe using mockRM) that 
verifies the flow from the admin service to the scheduler. Most interesting 
would be the case when the node is fully allocated and then an update reduces 
the capacity, thus resulting in a negative value of available resource on the 
node. I am wary that this case may have bugs in handling the negative value in 
existing scheduler code, because it's unexpected. It's fine for the test to 
use the default scheduler.
Agree. Although I am pretty sure it works fine so far from my offline 
integration tests, we have to add a unit test to cover the resource 
over-commitment case so any changes in the future won't break these 
assumptions. 

Will update patch soon.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch


 According to Vinod's comments on YARN-312 
(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--


[jira] [Updated] (YARN-1732) Change types of related entities and primary filters in ATSEntity

2014-02-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1732:
-

Attachment: YARN-1732.1.patch

 Change types of related entities and primary filters in ATSEntity
 -

 Key: YARN-1732
 URL: https://issues.apache.org/jira/browse/YARN-1732
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1732.1.patch


 The current types Map<String, List<String>> relatedEntities and Map<String, 
 Object> primaryFilters have issues.  The List<String> value of the related 
 entities map could have multiple identical strings in it, which doesn't make 
 sense. A more major issue is that we cannot allow primary filter values to be 
 overwritten, because otherwise we will be unable to find those primary filter 
 entries when we want to delete an entity (without doing a nearly full scan).
 I propose changing related entities to Map<String, Set<String>> and primary 
 filters to Map<String, Set<Object>>.  The basic methods to add primary 
 filters and related entities are of the form add(key, value) and will not 
 need to change.



--


[jira] [Updated] (YARN-1729) ATSWebServices always passes primary and secondary filters as strings

2014-02-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1729:
-

Attachment: YARN-1729.1.patch

 ATSWebServices always passes primary and secondary filters as strings
 -

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1729.1.patch


 Primary filter and secondary filter values can be arbitrary JSON-compatible 
 Objects.  The web services should determine whether the filters specified as 
 query parameters are objects or strings before passing them to the store.



--


[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store

2014-02-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1717:
-

Summary: Enable offline deletion of entries in leveldb timeline store  
(was: Misc improvements to leveldb timeline store)

 Enable offline deletion of entries in leveldb timeline store
 

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, 
 YARN-1717.4.patch, YARN-1717.5.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * braces for all control flow statements
 * simple locking to prevent issues related to concurrent writes
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--


[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store

2014-02-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1717:
-

Description: 
The leveldb timeline store implementation needs the following:
* better documentation of its internal structures
* internal changes to enable deleting entities
** never overwrite existing primary filter entries
** add hidden reverse pointers to related entities

  was:
The leveldb timeline store implementation needs the following:
* better documentation of its internal structures
* braces for all control flow statements
* simple locking to prevent issues related to concurrent writes
* internal changes to enable deleting entities
** never overwrite existing primary filter entries
** add hidden reverse pointers to related entities


 Enable offline deletion of entries in leveldb timeline store
 

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, 
 YARN-1717.4.patch, YARN-1717.5.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--


[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store

2014-02-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1717:
-

Attachment: YARN-1717.6.patch

 Enable offline deletion of entries in leveldb timeline store
 

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, 
 YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--


[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901093#comment-13901093
 ] 

Hadoop QA commented on YARN-1590:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12626945/YARN-1590.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3099//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3099//console

This message is automatically generated.

 _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
 -

 Key: YARN-1590
 URL: https://issues.apache.org/jira/browse/YARN-1590
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, 
 YARN-1590.4.patch


 _HOST is not properly substituted when we use a VIP address. Currently it 
 always uses the host name of the machine and disregards the VIP address. This 
 is true mainly for the RM, NM, WebProxy, and JHS RPC services. It looks like 
 it is working fine for webservice authentication.
 On the other hand, the same thing is working fine for the NN and SNN in RPC 
 as well as webservice.
  



--


[jira] [Commented] (YARN-1732) Change types of related entities and primary filters in ATSEntity

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901119#comment-13901119
 ] 

Hadoop QA commented on YARN-1732:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628944/YARN-1732.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3102//console

This message is automatically generated.

 Change types of related entities and primary filters in ATSEntity
 -

 Key: YARN-1732
 URL: https://issues.apache.org/jira/browse/YARN-1732
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1732.1.patch


 The current types Map<String, List<String>> relatedEntities and Map<String, 
 Object> primaryFilters have issues.  The List<String> value of the related 
 entities map could have multiple identical strings in it, which doesn't make 
 sense. A more major issue is that we cannot allow primary filter values to be 
 overwritten, because otherwise we will be unable to find those primary filter 
 entries when we want to delete an entity (without doing a nearly full scan).
 I propose changing related entities to Map<String, Set<String>> and primary 
 filters to Map<String, Set<Object>>.  The basic methods to add primary 
 filters and related entities are of the form add(key, value) and will not 
 need to change.
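 A minimal sketch of the proposed type change (hypothetical class and method 
 names, not the actual ATSEntity API): with Set values, duplicate related-entity 
 strings collapse, and adding a primary filter value accumulates rather than 
 overwrites, so delete-time lookups can still find every entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: illustrates the Map<String, Set<...>> shape proposed above.
class EntitySketch {
    private final Map<String, Set<String>> relatedEntities = new HashMap<>();
    private final Map<String, Set<Object>> primaryFilters = new HashMap<>();

    // Same add(key, value) shape as before; the Set absorbs duplicates.
    void addRelatedEntity(String entityType, String entityId) {
        relatedEntities.computeIfAbsent(entityType, k -> new HashSet<>()).add(entityId);
    }

    // Adding a filter value never replaces previously stored values.
    void addPrimaryFilter(String name, Object value) {
        primaryFilters.computeIfAbsent(name, k -> new HashSet<>()).add(value);
    }

    Map<String, Set<String>> getRelatedEntities() { return relatedEntities; }
    Map<String, Set<Object>> getPrimaryFilters() { return primaryFilters; }

    public static void main(String[] args) {
        EntitySketch e = new EntitySketch();
        e.addRelatedEntity("CONTAINER", "container_01");
        e.addRelatedEntity("CONTAINER", "container_01"); // duplicate collapses
        e.addPrimaryFilter("user", "alice");
        e.addPrimaryFilter("user", "bob"); // accumulated, not overwritten
        System.out.println(e.getRelatedEntities().get("CONTAINER").size()); // 1
        System.out.println(e.getPrimaryFilters().get("user").size());       // 2
    }
}
```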





[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901123#comment-13901123
 ] 

Hadoop QA commented on YARN-1730:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628926/YARN-1730.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3100//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3100//console

This message is automatically generated.

 Leveldb timeline store needs simple write locking
 -

 Key: YARN-1730
 URL: https://issues.apache.org/jira/browse/YARN-1730
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1730.1.patch


 The actual data writes are performed atomically in a batch, but a lock should 
 be held while identifying a start time for the entity, which precedes every 
 write.
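 An illustrative sketch of that pattern (hypothetical names, not the leveldb 
 store's actual code): the start time is looked up or assigned while holding a 
 lock, so two concurrent writers cannot assign different start times for the 
 same entity before their batched writes go out.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: hold a simple write lock across "identify start time"
// so the first writer wins; the batch write itself is already atomic.
class StartTimeResolver {
    private final Map<String, Long> startTimes = new HashMap<>();
    private final Object writeLock = new Object();

    long resolveStartTime(String entityId, long proposedStartTime) {
        synchronized (writeLock) {
            Long existing = startTimes.get(entityId);
            if (existing == null) {
                // First writer for this entity fixes the start time.
                startTimes.put(entityId, proposedStartTime);
                return proposedStartTime;
            }
            // Later writers reuse the already-recorded start time.
            return existing;
        }
    }
}
```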





[jira] [Commented] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901137#comment-13901137
 ] 

Hadoop QA commented on YARN-1515:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628941/YARN-1515.v03.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3101//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3101//console

This message is automatically generated.

 Ability to dump the container threads and stop the containers in a single RPC
 -

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.





[jira] [Updated] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-13 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1666:


Attachment: YARN-1666.5.patch

 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch








[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover

2014-02-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901142#comment-13901142
 ] 

Xuan Gong commented on YARN-1666:
-

bq. It doesn't seem to apply against latest trunk anymore. Please update.

updated

bq. Fix the file-name constants in YarnConfiguration to be consistent.

Changed YARN_SITE_XML_FILE to YARN_SITE_CONFIGURATION_FILE, and 
YARN_DEFAULT_XML_FILE to YARN_DEFAULT_CONFIGURATION_FILE.

bq. FileSystemBasedConfigurationProvider.getConfiguration(): Let's always throw 
exceptions instead of returning nulls in some cases.

Added.

bq. LocalConfigurationProvider: We need to first find the location of the XML 
file in the classpath if it is one of RM_CONFIGURATION_FILES, right?

changed

bq. In AdminService, where you use new Configuration(), should you use new 
Configuration(false)?

It is fine, because we will load the related configuration later using 
addResource(Configuration), which reloads and overwrites all the properties.
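The layering behavior being relied on here can be illustrated with plain 
java.util.Properties standing in for Hadoop's Configuration (an analogy, not 
Hadoop's API): a later layer overrides earlier values, which is why starting 
from new Configuration() and adding the resource afterwards is harmless.

```java
import java.util.Properties;

// Analogy only: Properties defaults mimic how a later-added resource
// layer overrides values from an earlier layer.
class ConfigLayering {
    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("yarn.admin.acl", "*");      // value from the initial layer

        Properties overlay = new Properties(base);    // later resource layered on top
        overlay.setProperty("yarn.admin.acl", "admins"); // overrides the earlier value

        System.out.println(overlay.getProperty("yarn.admin.acl")); // admins
    }
}
```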

bq. I think we can simply get rid of the LocalConfigurationProvider instance 
checks everywhere now.

Yes, we can do that.

bq. inFile - includesFile and exFile - excludesFile

DONE

bq. Make both the above as class-fields and use it in refreshNodes method too.

DONE

bq. disableHostsFileReader() should also use the remote-conf provider?

Yes, changed.

bq. HostsFileReader: Change the constructor to not need both the file-names as 
well as the streams

I think we still need both. In the constructor, we call the refresh API, 
which logs which hosts are included or excluded and from which include or 
exclude file. For that, we need to give the file names.

Also, I made several other changes:
* Moved the logic that loads CapacityScheduler.xml from 
AdminService#refreshQueues to CapacityScheduler#reinitiate().
* Added empty yarn-site.xml and hadoop-policy.xml under 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources
 to make the tests pass, because LocalConfigurationProvider now loads those 
configuration files from the classpath.
* In NodeListManager, we check whether the file name is empty or null. If it 
is, we pass null as the InputStream, because both LocalConfigurationProvider 
and FSBasedConfigurationProvider throw an exception when asked to get an 
input stream for an empty or null file name. NodeListManager, however, is 
allowed to receive such a value; it simply disables the HostsFileReader. So, 
before we actually create the InputStream, we'd better do this check.
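The guard described in the last bullet can be sketched as follows (a 
hypothetical helper, not the actual NodeListManager code): an empty or null 
file name means the hosts reader is disabled, so the helper returns null 
instead of letting the configuration provider throw on a bad name.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: null/empty file name disables the hosts reader rather
// than failing inside the configuration provider.
class HostsFileGuard {
    static InputStream openHostsFile(String fileName) throws IOException {
        if (fileName == null || fileName.isEmpty()) {
            return null; // disables the HostsFileReader rather than throwing
        }
        return new FileInputStream(fileName);
    }
}
```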


 Make admin refreshNodes work across RM failover
 ---

 Key: YARN-1666
 URL: https://issues.apache.org/jira/browse/YARN-1666
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, 
 YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch








[jira] [Updated] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC

2014-02-13 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated YARN-1515:


Attachment: YARN-1515.v04.patch

Need to escape the inner class name in the test script because it contains '$'.

 Ability to dump the container threads and stop the containers in a single RPC
 -

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch, YARN-1515.v04.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.


