[jira] [Resolved] (YARN-1554) add an env variable for the YARN AM classpath
[ https://issues.apache.org/jira/browse/YARN-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-1554. -- Resolution: Duplicate add an env variable for the YARN AM classpath - Key: YARN-1554 URL: https://issues.apache.org/jira/browse/YARN-1554 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.2.0 Reporter: Steve Loughran Priority: Minor Currently YARN apps set up their classpath via the default value {{YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH}} or an overridden property {{yarn.application.classpath}}. If you don't have the classpath right, the AM won't start up. This means the client needs to be explicitly configured with the CP. If the node manager exported the classpath property via an env variable {{YARN_APPLICATION_CLASSPATH}}, then the classpath could be set up in the AM simply by referencing that property, rather than hoping its setting is in sync. -- This message was sent by Atlassian JIRA (v6.2#6252)
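A minimal sketch of how an AM could consume such a variable if the NodeManager exported it (illustrative only; the YARN_APPLICATION_CLASSPATH name comes from the issue text, while the class and method names are hypothetical):
{code:title=AmClasspathSketch.java|borderStyle=solid}
import java.io.File;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmClasspathSketch {
  /** Prefer the (proposed) NM-exported variable; fall back to client-side config. */
  public static String buildClasspath(Configuration conf, Map<String, String> env) {
    // Hypothetical variable exported by the NodeManager, as proposed above.
    String exported = env.get("YARN_APPLICATION_CLASSPATH");
    if (exported != null && !exported.isEmpty()) {
      return exported;
    }
    // Today's behaviour: rely on yarn.application.classpath being in sync on the client.
    StringBuilder cp = new StringBuilder();
    for (String entry : conf.getStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      if (cp.length() > 0) {
        cp.append(File.pathSeparator);
      }
      cp.append(entry.trim());
    }
    return cp.toString();
  }
}
{code}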
[jira] [Commented] (YARN-1907) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails
[ https://issues.apache.org/jira/browse/YARN-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965240#comment-13965240 ] Hudson commented on YARN-1907: -- FAILURE: Integrated in Hadoop-Yarn-trunk #535 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/535/]) YARN-1907. TestRMApplicationHistoryWriter#testRMWritingMassiveHistory intermittently fails. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585992) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails - Key: YARN-1907 URL: https://issues.apache.org/jira/browse/YARN-1907 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6195.patch The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1910) TestAMRMTokens fails on windows
[ https://issues.apache.org/jira/browse/YARN-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965236#comment-13965236 ] Hudson commented on YARN-1910: -- FAILURE: Integrated in Hadoop-Yarn-trunk #535 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/535/]) YARN-1910. Fixed a race condition in TestAMRMTokens that causes the test to fail more often on Windows. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586192) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java TestAMRMTokens fails on windows --- Key: YARN-1910 URL: https://issues.apache.org/jira/browse/YARN-1910 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1910.1.patch, YARN-1910.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1677) Potential bugs in exception handlers
[ https://issues.apache.org/jira/browse/YARN-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965253#comment-13965253 ] Ding Yuan commented on YARN-1677: - Ping. Is there anything else I can help with from my side? Potential bugs in exception handlers Key: YARN-1677 URL: https://issues.apache.org/jira/browse/YARN-1677 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Ding Yuan Attachments: yarn-1677.patch Hi YARN developers, We are a group of researchers working on software reliability. We recently did a study and found that the majority of the most severe failures in Hadoop are caused by bugs in exception-handling logic, so we built a simple checking tool that automatically detects some bug patterns that have caused very severe failures. I am reporting some of the results for YARN here. Any feedback is much appreciated!
== Case 1: Line: 551, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
{noformat}
switch (monitoringEvent.getType()) {
case START_MONITORING_CONTAINER:
  ..
  ..
default:
  // TODO: Wrong event.
}
{noformat}
The switch fall-through (handling any potential unexpected event) is empty. Should we at least print an error message here?
==
== Case 2: Line: 491, File: org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
{noformat}
} catch (Throwable e) {
  // TODO Better error handling. Thread can die with the rest of the
  // NM still running.
  LOG.error("Caught exception in status-updater", e);
}
{noformat}
The handler of this very general exception only logs the error. The TODO seems to indicate it is not sufficient.
==
== Case 3: Line: 861, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
for (LocalResourceStatus stat : remoteResourceStatuses) {
  LocalResource rsrc = stat.getResource();
  LocalResourceRequest req = null;
  try {
    req = new LocalResourceRequest(rsrc);
  } catch (URISyntaxException e) {
    // TODO fail? Already translated several times...
  }
The handler for URISyntaxException is empty, and the TODO seems to indicate it is not sufficient. The same code pattern can also be found at:
Line: 901, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
Line: 838, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
Line: 878, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
At line: 803, File: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java, the handler of URISyntaxException also seems insufficient:
{noformat}
try {
  shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
      shellScriptPath)));
} catch (URISyntaxException e) {
  LOG.error("Error when trying to use shell script path specified"
      + " in env, path=" + shellScriptPath);
  e.printStackTrace();
  // A failure scenario on bad input such as invalid shell script path
  // We know we cannot continue launching the container
  // so we should release it.
  // TODO
  numCompletedContainers.incrementAndGet();
  numFailedContainers.incrementAndGet();
  return;
}
{noformat}
==
== Case 4: Line: 627, File: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
{noformat}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
  LOG.error("Can't handle this event at current state", e);
  /* TODO fail the application on the failed transition */
}
{noformat}
The handler of this exception only logs the error. The TODO seems to indicate it is not sufficient. This exact same code pattern can also be found at:
Line: 573, File: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
==
== Case 5: empty handler for exception: java.lang.InterruptedException Line: 123, File: org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java
{noformat}
public void join() {
  if(proxyServer != null) {
    try
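A hedged sketch of the kind of hardening Cases 1 and 3 are asking for — log and surface the problem rather than silently dropping it. The class, enum and method names below are illustrative, not the actual NodeManager code.
{code:title=ExceptionHandlingSketch.java|borderStyle=solid}
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ExceptionHandlingSketch {
  private static final Log LOG = LogFactory.getLog(ExceptionHandlingSketch.class);

  enum MonitorEventType { START_MONITORING_CONTAINER, STOP_MONITORING_CONTAINER }

  static void handle(MonitorEventType type) {
    switch (type) {
      case START_MONITORING_CONTAINER:
        // ... existing handling ...
        break;
      default:
        // Case 1: at minimum, record the unexpected event instead of falling through silently.
        LOG.error("Unexpected ContainersMonitor event type: " + type);
        break;
    }
  }

  static URI parseOrNull(String raw) {
    try {
      return new URI(raw);
    } catch (URISyntaxException e) {
      // Case 3: log the bad input and let the caller skip the resource,
      // rather than leaving the catch block empty.
      LOG.error("Invalid resource URI: " + raw, e);
      return null;
    }
  }
}
{code}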
[jira] [Updated] (YARN-322) Add cpu information to queue metrics
[ https://issues.apache.org/jira/browse/YARN-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-322: --- Fix Version/s: (was: 2.4.0) 2.5.0 Add cpu information to queue metrics Key: YARN-322 URL: https://issues.apache.org/jira/browse/YARN-322 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.5.0 Post YARN-2 we need to add cpu information to queue metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command
[ https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1334: Fix Version/s: (was: 2.4.0) 2.5.0 YARN should give more info on errors when running failed distributed shell command -- Key: YARN-1334 URL: https://issues.apache.org/jira/browse/YARN-1334 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.5.0 Attachments: YARN-1334.1.patch Run incorrect command such as: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributedshell jar -shell_command ./test1.sh -shell_script ./ would show shell exit code exception with no useful message. It should print out sysout/syserr of containers/AM of why it is failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-650: --- Fix Version/s: (was: 2.4.0) 2.5.0 User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.5.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask back resources. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1514: Fix Version/s: (was: 2.4.0) 2.5.0 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.5.0 ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1722) AMRMProtocol should have a way of getting all the nodes in the cluster
[ https://issues.apache.org/jira/browse/YARN-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1722: Fix Version/s: (was: 2.4.0) 2.5.0 AMRMProtocol should have a way of getting all the nodes in the cluster -- Key: YARN-1722 URL: https://issues.apache.org/jira/browse/YARN-1722 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Bikas Saha Fix For: 2.5.0 There is no way for an AM to find out the names of all the nodes in the cluster via the AMRMProtocol. An AM can at best ask for containers at the * location. The only way to get that information is via the ClientRMProtocol, but that is secured by Kerberos or an RMDelegationToken, while the AM has an AMRMToken. This is a pretty important piece of missing functionality. There are other jiras open about getting cluster topology etc., but they haven't been addressed, perhaps due to the lack of a clear definition of cluster topology. Adding a means to at least get the node information would be a good first step. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-153) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-153: --- Fix Version/s: (was: 2.4.0) 2.5.0 PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS Key: YARN-153 URL: https://issues.apache.org/jira/browse/YARN-153 Project: Hadoop YARN Issue Type: New Feature Reporter: Jacob Jaigak Song Assignee: Jacob Jaigak Song Fix For: 2.5.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application is to demonstrate that YARN can be used for non-mapreduce applications. As Hadoop has already been adopted and deployed widely and its deployment in future will be highly increased, we thought that it's a good potential to be used as PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1234: Fix Version/s: (was: 2.4.0) 2.5.0 Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.5.0 When we are running ContainerLocalizer in secured cluster we potentially are not creating any log file to track log messages. This will be helpful in potentially identifying ContainerLocalization issues in secured cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-314: --- Fix Version/s: (was: 2.4.0) 2.5.0 Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like this is needed for apps currently, and it can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
[ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-113: --- Fix Version/s: (was: 2.4.0) 2.5.0 WebAppProxyServlet must use SSLFactory for the HttpClient connections - Key: YARN-113 URL: https://issues.apache.org/jira/browse/YARN-113 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.5.0 The HttpClient must be configured to use the SSLFactory when the web UIs are over HTTPS, otherwise the proxy servlet fails to connect to the AM because of unknown (self-signed) certificates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality
[ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1723: Fix Version/s: (was: 2.4.0) 2.5.0 AMRMClientAsync missing blacklist addition and removal functionality Key: YARN-1723 URL: https://issues.apache.org/jira/browse/YARN-1723 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Bikas Saha Fix For: 2.5.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1477) No Submit time on AM web pages
[ https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1477: Fix Version/s: (was: 2.4.0) 2.5.0 No Submit time on AM web pages -- Key: YARN-1477 URL: https://issues.apache.org/jira/browse/YARN-1477 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chen He Assignee: Chen He Labels: features Fix For: 2.5.0 Similar to MAPREDUCE-5052, This is a fix on AM side. Add submitTime field to the AM's web services REST API -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1147: Fix Version/s: (was: 2.4.0) 2.5.0 Add end-to-end tests for HA --- Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.5.0 While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1156: Fix Version/s: (was: 2.4.0) 2.5.0 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.5.0 Attachments: YARN-1156.1.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to containers four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use float type for these metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
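A small self-contained illustration of the truncation described above (the variable names mirror the metrics; this is not the NodeManager code itself):
{code:title=AllocatedGbSketch.java|borderStyle=solid}
public class AllocatedGbSketch {
  public static void main(String[] args) {
    int allocatedGbInt = 0;
    float allocatedGbFloat = 0f;
    for (int i = 0; i < 4; i++) {
      int allocatedMb = 500;
      allocatedGbInt += allocatedMb / 1024;      // integer division: 500/1024 == 0
      allocatedGbFloat += allocatedMb / 1024f;   // float division: ~0.488 per allocation
    }
    System.out.println("int metric:   " + allocatedGbInt + " GB");    // prints 0 GB
    System.out.println("float metric: " + allocatedGbFloat + " GB");  // prints ~1.95 GB (2000MB)
  }
}
{code}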
[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed
[ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-965: --- Fix Version/s: (was: 2.4.0) 2.5.0 NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed -- Key: YARN-965 URL: https://issues.apache.org/jira/browse/YARN-965 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha Environment: suse linux Reporter: Li Yuan Fix For: 2.5.0 When a container is successfully launched, its state goes from LOCALIZED to RUNNING and containersRunning is incremented. When the state goes from EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented. However, the EXITED_WITH_FAILURE or KILLING state could have been reached from LOCALIZING or LOCALIZED, not RUNNING, which causes containersRunning to be less than the actual number. Furthermore, the metrics then become inconsistent: containersLaunched != containersCompleted + containersFailed + containersKilled + containersRunning + containersIniting -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1142: Fix Version/s: (was: 2.4.0) 2.5.0 MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.5.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-308) Improve documentation about what asks means in AMRMProtocol
[ https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-308: --- Fix Version/s: (was: 2.4.0) 2.5.0 Improve documentation about what asks means in AMRMProtocol - Key: YARN-308 URL: https://issues.apache.org/jira/browse/YARN-308 Project: Hadoop YARN Issue Type: Sub-task Components: api, documentation, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: YARN-308.patch It's unclear to me from reading the javadoc exactly what asks means when the AM sends a heartbeat to the RM. Is the AM supposed to send a list of all resources that it is waiting for? Or just inform the RM about new ones that it wants? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-614: --- Fix Version/s: (was: 2.4.0) 2.5.0 Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Chris Riccomini Fix For: 2.5.0 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1621) Add CLI to list states of yarn container-IDs/hosts
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1621: Fix Version/s: (was: 2.4.0) 2.5.0 Add CLI to list states of yarn container-IDs/hosts -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.5.0 As more applications are moved to YARN, we need a generic CLI to list the states of yarn containers and their hosts. Today, if a YARN application running in a container hangs, there is no way to stop it other than manually killing its process. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers.
{code:title=proposed yarn cli}
$ yarn application -list-containers <appId> <status>
where status is one of running/succeeded/killed/failed/all
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9
[ https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1327: Fix Version/s: (was: 2.4.0) 2.5.0 Fix nodemgr native compilation problems on FreeBSD9 --- Key: YARN-1327 URL: https://issues.apache.org/jira/browse/YARN-1327 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Radim Kolar Assignee: Radim Kolar Fix For: 3.0.0, 2.5.0 Attachments: nodemgr-portability.txt There are several portability problems preventing the native component from compiling on FreeBSD.
1. libgen.h is not included. The correct function prototype is there, but Linux glibc has a workaround that defines it for the user if libgen.h is not directly included. Include this file directly.
2. Query the maximum size of the login name using sysconf. This follows the same code style as the rest of the code, which already uses sysconf.
3. cgroups are a Linux-only feature; compile them conditionally and return an error if mount_cgroup is attempted on a non-Linux OS.
4. Do not use the POSIX function setpgrp(), since it clashes with the function of the same name from BSD 4.2; use the equivalent call instead. After inspecting the glibc sources, it is just a shortcut for setpgid(0,0).
These changes make it compile on both Linux and FreeBSD. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-160: --- Fix Version/s: (was: 2.4.0) 2.5.0 nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Fix For: 2.5.0 As mentioned in YARN-2 *NM memory and CPU configs*, these values currently come from the NM's config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be made available as YARN resources); this would allow reserving mem/cpu for the OS and other services running outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
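For the Linux case, a hedged sketch of the kind of probing the issue suggests; the class and method names are illustrative, and a real implementation would sit behind the OS-abstraction interface described above.
{code:title=LinuxResourceProbe.java|borderStyle=solid}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LinuxResourceProbe {
  /** Total physical memory in MB, parsed from the MemTotal line of /proc/meminfo (reported in kB). */
  public static long totalMemoryMb() throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          long kb = Long.parseLong(line.replaceAll("[^0-9]", ""));
          return kb / 1024;
        }
      }
    }
    throw new IOException("MemTotal not found in /proc/meminfo");
  }

  /** Number of logical processors, counted from "processor" entries in /proc/cpuinfo. */
  public static int numProcessors() throws IOException {
    int count = 0;
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/cpuinfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("processor")) {
          count++;
        }
      }
    }
    return count;
  }
}
{code}
An offset could then be subtracted from these values before registering the node's resources, as the issue proposes.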
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-745: --- Fix Version/s: (was: 2.4.0) 2.5.0 Move UnmanagedAMLauncher to yarn client package --- Key: YARN-745 URL: https://issues.apache.org/jira/browse/YARN-745 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 2.5.0 It is currently sitting in the yarn applications project, which sounds wrong. The client project sounds better, since it contains the utilities/libraries that clients use to write and debug yarn applications. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-996) REST API support for node resource configuration
[ https://issues.apache.org/jira/browse/YARN-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965331#comment-13965331 ] Thomas Graves commented on YARN-996: Have you tested this with admin acls? Taking a quick look at the code I don't see that the updateNodeResource is being properly protected. I guess that is a separate jira though since its already in there. REST API support for node resource configuration Key: YARN-996 URL: https://issues.apache.org/jira/browse/YARN-996 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Kenji Kikushima Attachments: YARN-996-sample.patch Besides admin protocol and CLI, REST API should also be supported for node resource configuration -- This message was sent by Atlassian JIRA (v6.2#6252)
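On the ACL question, a hedged sketch of what guarding a node-resource update with the admin ACL could look like; this is illustrative and not the actual AdminService or REST code.
{code:title=AdminAclGuardSketch.java|borderStyle=solid}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AdminAclGuardSketch {
  /** Reject callers that are not covered by yarn.admin.acl. */
  public static void checkAdminAccess(Configuration conf) throws IOException {
    AccessControlList adminAcl = new AccessControlList(
        conf.get(YarnConfiguration.YARN_ADMIN_ACL,
                 YarnConfiguration.DEFAULT_YARN_ADMIN_ACL));
    UserGroupInformation caller = UserGroupInformation.getCurrentUser();
    if (!adminAcl.isUserAllowed(caller)) {
      throw new IOException("User " + caller.getShortUserName()
          + " is not authorized to update node resources");
    }
  }
}
{code}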
[jira] [Commented] (YARN-1910) TestAMRMTokens fails on windows
[ https://issues.apache.org/jira/browse/YARN-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965341#comment-13965341 ] Hudson commented on YARN-1910: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1753/]) YARN-1910. Fixed a race condition in TestAMRMTokens that causes the test to fail more often on Windows. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586192) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java TestAMRMTokens fails on windows --- Key: YARN-1910 URL: https://issues.apache.org/jira/browse/YARN-1910 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1910.1.patch, YARN-1910.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1477) No Submit time on AM web pages
[ https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965360#comment-13965360 ] Chen He commented on YARN-1477: --- I am working on it. Thank you for the reminder. No Submit time on AM web pages -- Key: YARN-1477 URL: https://issues.apache.org/jira/browse/YARN-1477 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chen He Assignee: Chen He Labels: features Fix For: 2.5.0 Similar to MAPREDUCE-5052, this is a fix on the AM side: add a submitTime field to the AM's web services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
[ https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1906: Attachment: YARN-1906.patch Attaching patch for trunk and branch-2 TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2 --- Key: YARN-1906 URL: https://issues.apache.org/jira/browse/YARN-1906 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1906.patch, YARN-1906.patch Here is the output of the format {noformat} testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 9.757 sec FAILURE! java.lang.AssertionError: expected:2 but was:1 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-1857: - Assignee: Chen He CapacityScheduler headroom doesn't account for other AM's running - Key: YARN-1857 URL: https://issues.apache.org/jira/browse/YARN-1857 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Chen He It's possible to get an application to hang forever (or for a long time) in a cluster with multiple users. The reason is that the headroom sent to the application is based on the user limit, but it doesn't account for other ApplicationMasters using space in that queue. So the headroom (user limit - user consumed) can be 0 even though the cluster is 100% full, because the remaining space is being used by ApplicationMasters from other users. For instance, take a cluster with one queue, a user limit of 100%, and multiple users submitting applications. One very large application by user 1 starts up, runs most of its maps and starts running reducers. Other users try to start applications and get their ApplicationMasters started but no tasks. The very large application then gets to the point where it has consumed the rest of the cluster resources with reduces, but it still needs to finish a few maps. The headroom being sent to this application is only based on the user limit (which is 100% of the cluster capacity): it is using, say, 95% of the cluster for reduces, and the other 5% is being used by other users' running ApplicationMasters. The MRAppMaster thinks it still has 5% headroom, so it doesn't know that it should kill a reduce in order to run a map. This can happen in other scenarios as well. Generally, in a large cluster with multiple queues this shouldn't cause a hang forever, but it could cause the application to take much longer. -- This message was sent by Atlassian JIRA (v6.2#6252)
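A rough, self-contained illustration of the arithmetic in the scenario above (the numbers come from the example; the formula is deliberately simplified and is not the actual CapacityScheduler code):
{code:title=HeadroomSketch.java|borderStyle=solid}
public class HeadroomSketch {
  public static void main(String[] args) {
    int clusterMb     = 100000; // total cluster memory
    int userLimitMb   = 100000; // user limit == 100% of the single queue
    int user1UsedMb   =  95000; // user 1's reduces
    int otherAmUsedMb =   5000; // other users' ApplicationMasters

    // What the scheduler reports today: user limit minus the user's own consumption.
    int reportedHeadroomMb = userLimitMb - user1UsedMb;               // 5000 MB

    // What is actually left on the cluster once the other AMs are counted.
    int actualHeadroomMb = clusterMb - user1UsedMb - otherAmUsedMb;   // 0 MB

    System.out.println("reported headroom = " + reportedHeadroomMb + " MB");
    System.out.println("actual headroom   = " + actualHeadroomMb + " MB");
    // The MRAppMaster sees 5000 MB of headroom, so it never preempts a reduce
    // to run the remaining maps, even though nothing can actually be allocated.
  }
}
{code}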
[jira] [Commented] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
[ https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965449#comment-13965449 ] Mit Desai commented on YARN-1906: - *Explanation of the changes made* # Two assert statements were removed from the test that were verifying the pending application count increased from 0 to 1. This is an intermediate result (which is good to test if consistent). In this case, the intermediate results are inconsistent as the application transition to pending state can be or cannot be detected when the assert is called. The aim of the test is to check the queueMetrics value before and after restart. And this is working as expected without the assert for pendingApps. I have tested the patch by running the test 25 times and it passes. # The assertQueueMetrics function was not properly implemented. assertEquals() takes in 2 parameters 1. expected value, 2. Actual value. All the asserts in the assertQueueMetrics were implemented in the opposite way leading to a wrong error message on an assert failure. TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2 --- Key: YARN-1906 URL: https://issues.apache.org/jira/browse/YARN-1906 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1906.patch, YARN-1906.patch Here is the output of the format {noformat} testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 9.757 sec FAILURE! java.lang.AssertionError: expected:2 but was:1 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
[ https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965492#comment-13965492 ] Zhijie Shen commented on YARN-1906: --- 1. I looked into exception again: {code} at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706) {code} It seems the test fails at {code} // finish the AMs finishApplicationMaster(loadedApp1, rm2, nm1, am1); assertQueueMetrics(qm2, 1, 0, 0, 1); {code} Race condition here? Should we waitForState here before assertion? 2. One suggestion on assertQueueMetrics: It would be better to add the messages for the for assertion sentences, such that when an exception happens, we can easily see which metric is wrong. TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2 --- Key: YARN-1906 URL: https://issues.apache.org/jira/browse/YARN-1906 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1906.patch, YARN-1906.patch Here is the output of the format {noformat} testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 9.757 sec FAILURE! java.lang.AssertionError: expected:2 but was:1 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
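A hedged sketch combining the two suggestions above (wait for the app's final state before asserting, and give each assertion a message naming the metric); the waitForState call assumes the MockRM-style helper used elsewhere in this test, and the getter names and value mapping are only illustrative:
{code:title=sketch of the assertion change|borderStyle=solid}
// finish the AMs, then let the RM observe the final state before checking metrics
finishApplicationMaster(loadedApp1, rm2, nm1, am1);
rm2.waitForState(loadedApp1.getApplicationId(), RMAppState.FINISHED);

assertEquals("appsSubmitted", 1, qm2.getAppsSubmitted());
assertEquals("appsPending",   0, qm2.getAppsPending());
assertEquals("appsRunning",   0, qm2.getAppsRunning());
assertEquals("appsCompleted", 1, qm2.getAppsCompleted());
{code}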
[jira] [Created] (YARN-1921) Allow to override queue prefix, where new queues created
Andrey Stepachev created YARN-1921: -- Summary: Allow to override queue prefix, where new queues created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev The Fair Scheduler has a couple of QueuePlacementRules. Those rules can create queues if they do not exist, with the hardcoded prefix root.. Consider an example: we have a placement rule which creates a user's queue if it does not exist. The current implementation creates it under the root. prefix. Suppose that this user runs a big job. In that case it will get a fair share of resources, because the queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users in the default queue, but in that case, if a user submits a big job, it will eat the resources of the whole queue, and we know that no preemption can be done within one queue (or am I wrong?). So effectively one user can usurp all of the default queue's resources. To solve that I created a patch which allows overriding the root. prefix in QueuePlacementRules. That gives us the flexibility to automatically create queues for users or groups of users under a predefined queue. So, every user will get a separate queue and will share the parent queue's resources, and can't usurp all resources, because the parent node can be configured to preempt tasks. Consider this example (a parent queue is specified for each rule):
{code:title=policy.xml|borderStyle=solid}
<queuePlacementPolicy>
  <rule name='specified' parent='granted'/>
  <rule name='user' parent='guests'/>
</queuePlacementPolicy>
{code}
With such a definition, queue requests will give us:
{code:title=Example.java|borderStyle=solid}
"root.granted.specifiedq" == policy.assignAppToQueue("specifiedq", "someuser");
"root.guests.someuser"    == policy.assignAppToQueue("default", "someuser");
"root.guests.otheruser"   == policy.assignAppToQueue("default", "otheruser");
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
[ https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965500#comment-13965500 ] Hadoop QA commented on YARN-1906: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639589/YARN-1906.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3542//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3542//console This message is automatically generated. TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2 --- Key: YARN-1906 URL: https://issues.apache.org/jira/browse/YARN-1906 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1906.patch, YARN-1906.patch Here is the output of the format {noformat} testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 9.757 sec FAILURE! java.lang.AssertionError: expected:2 but was:1 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1921) Allow to override queue prefix, where new queues created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1921: --- Attachment: YARN-1921.patch Allow to override queue prefix, where new queues created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eats resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1910) TestAMRMTokens fails on windows
[ https://issues.apache.org/jira/browse/YARN-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1910: -- Fix Version/s: (was: 2.4.0) 2.4.1 TestAMRMTokens fails on windows --- Key: YARN-1910 URL: https://issues.apache.org/jira/browse/YARN-1910 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.1 Attachments: YARN-1910.1.patch, YARN-1910.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
[ https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1906: -- Target Version/s: 2.4.1 Affects Version/s: 2.4.0 Fix Version/s: (was: 2.5.0) (was: 3.0.0) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2 --- Key: YARN-1906 URL: https://issues.apache.org/jira/browse/YARN-1906 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1906.patch, YARN-1906.patch Here is the output of the format {noformat} testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 9.757 sec FAILURE! java.lang.AssertionError: expected:2 but was:1 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965515#comment-13965515 ] Arun C Murthy commented on YARN-1769: - Sorry guys, been slammed. I'll take a look at this presently. Tx. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required, and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that reservations currently count against your queue capacity. If you have reservations, you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of these by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1920) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows
[ https://issues.apache.org/jira/browse/YARN-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965546#comment-13965546 ] Vinod Kumar Vavilapalli commented on YARN-1920: --- To give more context: the test failure is happening because the test before the failing test wasn't deleting the file successfully, due to the file-handle leak. In all the tests in this test-case we expect a new history-store file to be created for each test, and due to the leak that assumption was violated. The leak was leaving the stream open, so the file couldn't be deleted on Windows, though deletion works fine on Linux/Mac. TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows --- Key: YARN-1920 URL: https://issues.apache.org/jira/browse/YARN-1920 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1920.txt Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service. -- This message was sent by Atlassian JIRA (v6.2#6252)
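A generic sketch of the fix pattern being described — close the store's underlying stream so the file can actually be deleted on Windows; the field and class names are illustrative, not the actual FileSystemApplicationHistoryStore code.
{code:title=HistoryStoreCloseSketch.java|borderStyle=solid}
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.io.IOUtils;

public class HistoryStoreCloseSketch implements Closeable {
  private OutputStream historyStream; // hypothetical handle kept open by the store

  @Override
  public void close() throws IOException {
    // Without this, the stream stays open and Windows refuses to delete the
    // history file, so the next test no longer starts from a fresh file.
    IOUtils.cleanup(null, historyStream);
    historyStream = null;
  }
}
{code}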
[jira] [Commented] (YARN-1677) Potential bugs in exception handlers
[ https://issues.apache.org/jira/browse/YARN-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1396#comment-1396 ] Devaraj K commented on YARN-1677: - Thanks Ding for taking up these, Appreciate your work. Could you split these into multiple Jira's instead of grouping into single Jira. And also can you add the tests for the changes, please refer http://wiki.apache.org/hadoop/HowToContribute#Making_Changes. After attaching the patch for the issue, please click on the 'Submit Patch' button so that Jenkins can run the patch and also any one can review it. Potential bugs in exception handlers Key: YARN-1677 URL: https://issues.apache.org/jira/browse/YARN-1677 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Ding Yuan Attachments: yarn-1677.patch Hi Yarn developers, We are a group of researchers on software reliability, and recently we did a study and found that majority of the most severe failures in hadoop are caused by bugs in exception handling logic. Therefore we built a simple checking tool that automatically detects some bug patterns that have caused some very severe failures. I am reporting some of the results for Yarn here. Any feedback is much appreciated! == Case 1: Line: 551, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java {noformat} switch (monitoringEvent.getType()) { case START_MONITORING_CONTAINER: .. .. default: // TODO: Wrong event. } {noformat} The switch fall-through (handling any potential unexpected event) is empty. Should we at least print an error message here? == == Case 2: Line: 491, File: org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java {noformat} } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. LOG.error(Caught exception in status-updater, e); } {noformat} The handler of this very general exception only logs the error. The TODO seems to indicate it is not sufficient. == == Case 3: Line: 861, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java for (LocalResourceStatus stat : remoteResourceStatuses) { LocalResource rsrc = stat.getResource(); LocalResourceRequest req = null; try { req = new LocalResourceRequest(rsrc); } catch (URISyntaxException e) { // TODO fail? Already translated several times... } The handler for URISyntaxException is empty, and the TODO seems to indicate it is not sufficient. The same code pattern can also be found at: Line: 901, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java Line: 838, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java Line: 878, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java At line: 803, File: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java, the handler of URISyntaxException also seems not sufficient: {noformat} try { shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI( shellScriptPath))); } catch (URISyntaxException e) { LOG.error(Error when trying to use shell script path specified + in env, path= + shellScriptPath); e.printStackTrace(); // A failure scenario on bad input such as invalid shell script path // We know we cannot continue launching the container // so we should release it. 
// TODO numCompletedContainers.incrementAndGet(); numFailedContainers.incrementAndGet(); return; } {noformat} == == Case 4: Line: 627, File: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java {noformat} try { /* keep the master in sync with the state machine */ this.stateMachine.doTransition(event.getType(), event); } catch (InvalidStateTransitonException e) { LOG.error(Can't handle this event at current state, e); /* TODO fail the application on the failed transition */ } {noformat} The handler of this exception only logs the error. The TODO seems to indicate it is not sufficient. This exact same code pattern can also be found at:
[jira] [Created] (YARN-1922) Process group remains alive after container process is killed externally
Billie Rinaldi created YARN-1922: Summary: Process group remains alive after container process is killed externally Key: YARN-1922 URL: https://issues.apache.org/jira/browse/YARN-1922 Project: Hadoop YARN Issue Type: Bug Environment: CentOS 6.4 Reporter: Billie Rinaldi Assignee: Billie Rinaldi If the main container process is killed externally, ContainerLaunch does not kill the rest of the process group. Before sending the event that results in the ContainerLaunch.containerCleanup method being called, ContainerLaunch sets the completed flag to true. Then when cleaning up, it doesn't try to read the pid file if the completed flag is true. If it read the pid file, it would proceed to send the container a kill signal. In the case of the DefaultContainerExecutor, this would kill the process group. -- This message was sent by Atlassian JIRA (v6.2#6252)
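A hedged sketch of the cleanup behaviour being argued for: read the pid file even when the completed flag is already set, then signal the whole process group, which is what the DefaultContainerExecutor's session-based launch makes possible. The paths and helper names here are illustrative.
{code:title=ProcessGroupCleanupSketch.java|borderStyle=solid}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ProcessGroupCleanupSketch {
  /** Signal the container's process group based on its pid file, if one was written. */
  public static void cleanup(String pidFilePath) throws IOException, InterruptedException {
    Path pidFile = Paths.get(pidFilePath);
    if (!Files.exists(pidFile)) {
      return; // the container process was never launched
    }
    String pid = new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8).trim();
    // "kill -- -PID" sends the signal to the whole process group, so children
    // of an externally killed container process are cleaned up as well.
    Process p = new ProcessBuilder("kill", "--", "-" + pid).inheritIO().start();
    p.waitFor();
  }
}
{code}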
[jira] [Updated] (YARN-1920) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows
[ https://issues.apache.org/jira/browse/YARN-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1920: -- Attachment: YARN-1920.2.patch TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows --- Key: YARN-1920 URL: https://issues.apache.org/jira/browse/YARN-1920 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1920.2.patch, YARN-1920.txt Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1920) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows
[ https://issues.apache.org/jira/browse/YARN-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965571#comment-13965571 ] Zhijie Shen commented on YARN-1920: --- The reason of the file sounds right to me. The patch looks good as well. I made a minor change to the patch to make each LOG.error to record the exception instance. Will commit it once Jenkins +1 TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows --- Key: YARN-1920 URL: https://issues.apache.org/jira/browse/YARN-1920 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1920.2.patch, YARN-1920.txt Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1922) Process group remains alive after container process is killed externally
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1922: - Attachment: YARN-1922.1.patch Process group remains alive after container process is killed externally Key: YARN-1922 URL: https://issues.apache.org/jira/browse/YARN-1922 Project: Hadoop YARN Issue Type: Bug Environment: CentOS 6.4 Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1922.1.patch If the main container process is killed externally, ContainerLaunch does not kill the rest of the process group. Before sending the event that results in the ContainerLaunch.containerCleanup method being called, ContainerLaunch sets the completed flag to true. Then when cleaning up, it doesn't try to read the pid file if the completed flag is true. If it read the pid file, it would proceed to send the container a kill signal. In the case of the DefaultContainerExecutor, this would kill the process group. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-862) ResourceManager and NodeManager versions should match on node registration or error out
[ https://issues.apache.org/jira/browse/YARN-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965591#comment-13965591 ] Chen He commented on YARN-862: -- Since the rolling-upgrade work has been checked in to Hadoop and YARN-819 is resolved, I will close this one. ResourceManager and NodeManager versions should match on node registration or error out --- Key: YARN-862 URL: https://issues.apache.org/jira/browse/YARN-862 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 0.23.8 Reporter: Robert Parker Assignee: Robert Parker Attachments: YARN-862-b0.23-v1.patch, YARN-862-b0.23-v2.patch For branch-0.23 the versions of the node manager and the resource manager should match to complete a successful registration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-862) ResourceManager and NodeManager versions should match on node registration or error out
[ https://issues.apache.org/jira/browse/YARN-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965593#comment-13965593 ] Chen He commented on YARN-862: -- Thank you for the patch, [~reparker]. ResourceManager and NodeManager versions should match on node registration or error out --- Key: YARN-862 URL: https://issues.apache.org/jira/browse/YARN-862 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 0.23.8 Reporter: Robert Parker Assignee: Robert Parker Attachments: YARN-862-b0.23-v1.patch, YARN-862-b0.23-v2.patch For branch-0.23 the versions of the node manager and the resource manager should match to complete a successful registration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1920) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows
[ https://issues.apache.org/jira/browse/YARN-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965616#comment-13965616 ] Hadoop QA commented on YARN-1920: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639612/YARN-1920.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3543//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3543//console This message is automatically generated. TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows --- Key: YARN-1920 URL: https://issues.apache.org/jira/browse/YARN-1920 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1920.2.patch, YARN-1920.txt Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1914) Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1914: -- Attachment: apache-yarn-1914.2.patch Same patch as before, but with the code comment matching branch-1. Will check this in when Jenkins says okay. Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows Key: YARN-1914 URL: https://issues.apache.org/jira/browse/YARN-1914 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1914.0.patch, apache-yarn-1914.1.patch, apache-yarn-1914.2.patch The TestFSDownload.testDownloadPublicWithStatCache test in hadoop-yarn-common consistently fails on Windows environments. The root cause is that the test checks for execute permission for all users on every ancestor of the target directory. In Windows, by default, the group Everyone has no permissions on any directory in the install drive. It's unreasonable to expect this test to pass, and we should skip it on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1923) Make FairScheduler resource ratio calculations terminate faster
Anubhav Dhoot created YARN-1923: --- Summary: Make FairScheduler resource ratio calculations terminate faster Key: YARN-1923 URL: https://issues.apache.org/jira/browse/YARN-1923 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot In the fair scheduler, computing shares continues until all iterations are complete even when there is a perfect match between the resource shares and the total resources. This is because the binary search checks only less-than or greater-than, not equality. Add an early-termination condition for the equal case. -- This message was sent by Atlassian JIRA (v6.2#6252)
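To illustrate the idea (a sketch only, not the attached patch; the comparison helper and iteration bound are assumptions), the binary search over the resource ratio can return as soon as the computed shares compare equal to the total resources, instead of always running the full iteration budget:
{code:title=EarlyTerminationSketch.java|borderStyle=solid}
import java.util.function.DoubleToIntFunction;

public class EarlyTerminationSketch {
  /**
   * Binary-search for a ratio in [0, 1]. compareSharesToTotal returns a
   * negative value if the shares computed at that ratio are below the total
   * resources, zero on an exact match, and a positive value if above.
   */
  static double findRatio(DoubleToIntFunction compareSharesToTotal, int maxIterations) {
    double left = 0.0;
    double right = 1.0;
    for (int i = 0; i < maxIterations; i++) {
      double mid = (left + right) / 2.0;
      int cmp = compareSharesToTotal.applyAsInt(mid);
      if (cmp == 0) {
        return mid;          // early termination on a perfect match
      } else if (cmp < 0) {
        left = mid;          // shares too small, search higher ratios
      } else {
        right = mid;         // shares too large, search lower ratios
      }
    }
    return (left + right) / 2.0;
  }
}
{code}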
[jira] [Commented] (YARN-1921) Allow to override queue prefix, where new queues created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965695#comment-13965695 ] Sandy Ryza commented on YARN-1921: -- Hi [~octo47], thanks for the patch, this is a feature I think we definitely need. Similar work has already been proposed on YARN-1864. Mind chiming in on the discussion over there? Allow to override queue prefix, where new queues created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch The Fair Scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they do not exist, with the hardcoded prefix 'root.'. Consider an example: we have a placement rule which creates a user's queue if it does not exist. The current implementation creates it under the 'root.' prefix. Suppose that this user runs a big job. In that case it will get a fair share of resources, because the queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users into the default queue, but in that case if a user submits a big job it will eat the resources of the whole queue, and we know that no preemption can be done within one queue (or am I wrong?). So effectively one user can usurp all of the default queue's resources. To solve that I created a patch which allows overriding the 'root.' prefix in QueuePlacementRules. That gives us the flexibility to automatically create queues for users or groups of users under a predefined queue. So every user will get a separate queue and will share the parent queue's resources, and can't usurp all resources, because the parent queue can be configured to preempt tasks. Consider an example (parent queue specified for each rule):
{code:title=policy.xml|borderStyle=solid}
<queuePlacementPolicy>
  <rule name='specified' parent='granted'/>
  <rule name='user' parent='guests'/>
</queuePlacementPolicy>
{code}
With such a definition, queue assignment will give us:
{code:title=Example.java|borderStyle=solid}
"root.granted.specifiedq" == policy.assignAppToQueue("specifiedq", "someuser");
"root.guests.someuser"    == policy.assignAppToQueue("default", "someuser");
"root.guests.otheruser"   == policy.assignAppToQueue("default", "otheruser");
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1914) Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965697#comment-13965697 ] Hadoop QA commented on YARN-1914: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639618/apache-yarn-1914.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3544//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3544//console This message is automatically generated. Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows Key: YARN-1914 URL: https://issues.apache.org/jira/browse/YARN-1914 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1914.0.patch, apache-yarn-1914.1.patch, apache-yarn-1914.2.patch The TestFSDownload.testDownloadPublicWithStatCache test in hadoop-yarn-common consistently fails on Windows environments. The root cause is that the test checks for execute permission for all users on every ancestor of the target directory. In windows, by default, group Everyone has no permissions on any directory in the install drive. It's unreasonable to expect this test to pass and we should skip it on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1920) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows
[ https://issues.apache.org/jira/browse/YARN-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965700#comment-13965700 ] Zhijie Shen commented on YARN-1920: --- Committed to trunk, branch-2 and branch-2.4. Thanks, Vinod! TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows --- Key: YARN-1920 URL: https://issues.apache.org/jira/browse/YARN-1920 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Labels: test, windows Fix For: 2.4.1 Attachments: YARN-1920.2.patch, YARN-1920.txt Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1920) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows
[ https://issues.apache.org/jira/browse/YARN-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1920: -- Labels: test windows (was: ) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows --- Key: YARN-1920 URL: https://issues.apache.org/jira/browse/YARN-1920 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Labels: test, windows Fix For: 2.4.1 Attachments: YARN-1920.2.patch, YARN-1920.txt Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965701#comment-13965701 ] Andrey Stepachev commented on YARN-1864: I have a different solution. My patch at https://issues.apache.org/jira/browse/YARN-1921 is smaller and simpler and allows overriding the queue prefix. With it, it is possible to place users in different queues for different groups. Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Attachments: YARN-1864-v1.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue, but we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. User queues can also preempt other non-user leaf queues if they are below their fair share. 2. Allocation to user queues: we want all the (ad hoc) user queries to consume only a fraction of resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1910) TestAMRMTokens fails on windows
[ https://issues.apache.org/jira/browse/YARN-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965713#comment-13965713 ] Hudson commented on YARN-1910: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1728 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1728/]) YARN-1910. Fixed a race condition in TestAMRMTokens that causes the test to fail more often on Windows. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586192) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java TestAMRMTokens fails on windows --- Key: YARN-1910 URL: https://issues.apache.org/jira/browse/YARN-1910 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.1 Attachments: YARN-1910.1.patch, YARN-1910.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1907) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails
[ https://issues.apache.org/jira/browse/YARN-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965717#comment-13965717 ] Hudson commented on YARN-1907: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1728 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1728/]) YARN-1907. TestRMApplicationHistoryWriter#testRMWritingMassiveHistory intermittently fails. Contributed by Mit Desai. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1585992) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails - Key: YARN-1907 URL: https://issues.apache.org/jira/browse/YARN-1907 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6195.patch The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1923) Make FairScheduler resource ratio calculations terminate faster
[ https://issues.apache.org/jira/browse/YARN-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1923: Attachment: YARN-1923.patch Make FairScheduler resource ratio calculations terminate faster --- Key: YARN-1923 URL: https://issues.apache.org/jira/browse/YARN-1923 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-1923.patch In fair scheduler computing shares continues till iterations are complete even when we have a perfect match between the resource shares and total resources. This is because the binary search checks only less or greater and not equals. Add an early termination condition when its equal -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1920) TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows
[ https://issues.apache.org/jira/browse/YARN-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965731#comment-13965731 ] Hudson commented on YARN-1920: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5491 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5491/]) YARN-1920. Fixed TestFileSystemApplicationHistoryStore failure on windows. Contributed by Vinod Kumar Vavilapalli. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586414) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/FileSystemApplicationHistoryStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestFileSystemApplicationHistoryStore.java TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows --- Key: YARN-1920 URL: https://issues.apache.org/jira/browse/YARN-1920 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Labels: test, windows Fix For: 2.4.1 Attachments: YARN-1920.2.patch, YARN-1920.txt Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1921) Allow to override queue prefix, where new queues created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965732#comment-13965732 ] Andrey Stepachev commented on YARN-1921: Thanks for pointing on similar solution, but solution with overriding prefix looks a bit cleaner and local (i.e. we can comprehend what is going on right from the policy definition). And not intrusive at all (interfaces and methods signatures are the same) Allow to override queue prefix, where new queues created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eats resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1921) Allow to override queue prefix, where new queues created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1921: --- Description: Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eat resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} was: Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eats resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. 
Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} Allow to override queue prefix, where new queues created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eat resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to
[jira] [Updated] (YARN-1921) Allow to override queue prefix, where new queues will be created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1921: --- Summary: Allow to override queue prefix, where new queues will be created (was: Allow to override queue prefix, where new queues created) Allow to override queue prefix, where new queues will be created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eat resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1853) Allow containers to be ran under real user even in insecure mode
[ https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965739#comment-13965739 ] Hadoop QA commented on YARN-1853: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637429/YARN-1853.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3545//console This message is automatically generated. Allow containers to be ran under real user even in insecure mode Key: YARN-1853 URL: https://issues.apache.org/jira/browse/YARN-1853 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.2.0 Reporter: Andrey Stepachev Attachments: YARN-1853.patch, YARN-1853.patch Currently an insecure cluster runs all containers under one user (typically nobody). That is not appropriate, because YARN applications don't play well with HDFS when permissions are enabled. YARN applications try to write data (as expected) into /user/nobody regardless of the user who launched the application. Another side effect is that it is not possible to configure cgroups for particular users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT
[ https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965753#comment-13965753 ] Chen He commented on YARN-126: -- The patch is out of date. Hi, SAISSY, can you update your patch? yarn rmadmin help message contains reference to hadoop cli and JT - Key: YARN-126 URL: https://issues.apache.org/jira/browse/YARN-126 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Rémy SAISSY Labels: usability Attachments: YARN-126.patch has option to specify a job tracker and the last line for general command line syntax had bin/hadoop command [genericOptions] [commandOptions] ran yarn rmadmin to get usage: RMAdmin Usage: java RMAdmin [-refreshQueues] [-refreshNodes] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshAdminAcls] [-refreshServiceAcl] [-help [cmd]] Generic options supported are -conf configuration file specify an application configuration file -D property=valueuse value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:portspecify a job tracker -files comma separated list of filesspecify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jarsspecify comma separated jar files to include in the classpath. -archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1914) Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965763#comment-13965763 ] Hudson commented on YARN-1914: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5492 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5492/]) YARN-1914. Fixed resource-download on NodeManagers to skip permission verification of public cache files in Windows+local file-system environment. Contribued by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586434) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows Key: YARN-1914 URL: https://issues.apache.org/jira/browse/YARN-1914 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.4.1 Attachments: apache-yarn-1914.0.patch, apache-yarn-1914.1.patch, apache-yarn-1914.2.patch The TestFSDownload.testDownloadPublicWithStatCache test in hadoop-yarn-common consistently fails on Windows environments. The root cause is that the test checks for execute permission for all users on every ancestor of the target directory. In windows, by default, group Everyone has no permissions on any directory in the install drive. It's unreasonable to expect this test to pass and we should skip it on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965777#comment-13965777 ] Chen He commented on YARN-1106: --- Thank you for the patch [~tgraves]. I apply the patch to trunk and get same test failure that [~jeagles] mentioned. Here is the failure message. Tests run: 62, Failures: 2, Errors: 0, Skipped: 1, Time elapsed: 5.285 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions testNoTrackingUrlSetRMAppPageComplete[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions) Time elapsed: 0.052 sec FAILURE! org.junit.ComparisonFailure: expected:[proxy:8088/cluster/app/application_1397159363386_0031] but was:[N/A] at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions.testNoTrackingUrlSetRMAppPageComplete(TestRMAppAttemptTransitions.java:1252) testNoTrackingUrlSetRMAppPageComplete[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions) Time elapsed: 0.043 sec FAILURE! org.junit.ComparisonFailure: expected:[proxy:8088/cluster/app/application_1397159363386_0062] but was:[N/A] at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions.testNoTrackingUrlSetRMAppPageComplete(TestRMAppAttemptTransitions.java:1252) The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1106.patch, YARN-1106.patch It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1921) Allow to override queue prefix, where new queues will be created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965787#comment-13965787 ] Hadoop QA commented on YARN-1921: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639595/YARN-1921.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3546//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3546//console This message is automatically generated. Allow to override queue prefix, where new queues will be created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eat resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1923) Make FairScheduler resource ratio calculations terminate faster
[ https://issues.apache.org/jira/browse/YARN-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965788#comment-13965788 ] Hadoop QA commented on YARN-1923: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639632/YARN-1923.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3547//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3547//console This message is automatically generated. Make FairScheduler resource ratio calculations terminate faster --- Key: YARN-1923 URL: https://issues.apache.org/jira/browse/YARN-1923 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-1923.patch In fair scheduler computing shares continues till iterations are complete even when we have a perfect match between the resource shares and total resources. This is because the binary search checks only less or greater and not equals. Add an early termination condition when its equal -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965792#comment-13965792 ] Sandy Ryza commented on YARN-1864: -- [~octo47], a design goal is to be able to have multiple queues that have user-queues underneath. E.g. an administrator might want to be able to configure marketing and finance queues, and have queues based off of the submitter's username within each of those queues. If I understand correctly, your solution wouldn't accommodate this. Am I missing anything? Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Attachments: YARN-1864-v1.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For eg. Say user1 submits a job to a parent queue called root.allUserQueues, we want be able to create a new queue called root.allUserQueues.user1 and run user1's job in it.Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler which creates user queues under root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this ? 1. Preemption : these dynamically created user queues can preempt each other if its fair share is not met. So there is fairness among users. User queues can also preempt other non-user leaf queue as well if below fair share. 2. Allocation to user queues : we want all the user queries(adhoc) to consume only a fraction of resources in the shared cluster. By creating this feature,we could do that by giving a fair share to the parent user queue which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965799#comment-13965799 ] Chen He commented on YARN-1582: --- +1, patch looks good to me. Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1582-branch-0.23.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. Note that we could have this based on application type, but that might not be totally accurate either since for example you might want to allow certain users on MapReduce to use larger containers, while limiting other users of MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
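For context, the cluster-wide ceiling today is yarn.scheduler.maximum-allocation-mb in yarn-site.xml; a per-queue override would presumably sit next to the other per-queue settings in capacity-scheduler.xml, roughly like the sketch below (the property name and queue path are assumptions for illustration, not necessarily what the attached patch uses):
{code:title=capacity-scheduler.xml (sketch)|borderStyle=solid}
<!-- Hypothetical per-queue override: cap containers submitted to root.adhoc
     at 2 GB while other queues keep the cluster-wide maximum. -->
<property>
  <name>yarn.scheduler.capacity.root.adhoc.maximum-allocation-mb</name>
  <value>2048</value>
</property>
{code}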
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965819#comment-13965819 ] Andrey Stepachev commented on YARN-1864: BTW, the PrimaryGroup and SecondaryGroup rules can be modified with a useUserName attribute; if that attribute is true, the rules will create queues based on submitter names, not group names. That is backward compatible too, so such a change will not hurt. Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Attachments: YARN-1864-v1.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue, but we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. User queues can also preempt other non-user leaf queues if they are below their fair share. 2. Allocation to user queues: we want all the (ad hoc) user queries to consume only a fraction of resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
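A minimal sketch of what such a rule definition might look like (both the useUserName and parent attributes are only the proposals discussed in these JIRAs, not existing Fair Scheduler options):
{code:title=fair-scheduler.xml (sketch)|borderStyle=solid}
<queuePlacementPolicy>
  <!-- Hypothetical: place the app under a fixed parent queue, but name the
       dynamically created leaf queue after the submitting user rather than
       the primary group. -->
  <rule name="primaryGroup" useUserName="true" parent="root.groups"/>
  <rule name="default"/>
</queuePlacementPolicy>
{code}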
[jira] [Commented] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set
[ https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965833#comment-13965833 ] Varun Vasudev commented on YARN-1903: - +1, patch looks good. Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set --- Key: YARN-1903 URL: https://issues.apache.org/jira/browse/YARN-1903 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1903.1.patch The container status after stopping container is not expected. {code} java.lang.AssertionError: 4: at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1921) Allow to override queue prefix, where new queues will be created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1921: --- Attachment: YARN-1921.patch Allow to override queue prefix, where new queues will be created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch, YARN-1921.patch Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eat resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965855#comment-13965855 ] Andrey Stepachev commented on YARN-1864: I've updated the patch with three additional rules, 'userMatch', 'primaryGroupMatch', and 'secondaryGroupMatch'. Now a test case like this works:
{code}
@Test
public void testSpecifiedUserPolicyWithPrefix() throws Exception {
  StringBuffer sb = new StringBuffer();
  sb.append("<queuePlacementPolicy>");
  sb.append("  <rule name='specified' parent='granted'/>");
  sb.append("  <rule name='userMatch' parent='admin' pattern='admin1|admin4'/>");
  sb.append("  <rule name='primaryGroupMatch' parent='admin.primg' pattern='admin2group.*'/>");
  sb.append("  <rule name='secondaryGroupMatch' parent='admin.secg' pattern='admin3subgroup1'/>");
  sb.append("  <rule name='user' parent='guests'/>");
  sb.append("</queuePlacementPolicy>");
  QueuePlacementPolicy policy = parse(sb.toString());
  assertEquals("root.granted.specifiedq", policy.assignAppToQueue("specifiedq", "someuser"));
  assertEquals("root.admin.admin1", policy.assignAppToQueue("default", "admin1"));
  assertEquals("root.admin.primg.admin2", policy.assignAppToQueue("default", "admin2"));
  assertEquals("root.admin.secg.admin3", policy.assignAppToQueue("default", "admin3"));
  assertEquals("root.guests.someuser", policy.assignAppToQueue("default", "someuser"));
  assertEquals("root.guests.otheruser", policy.assignAppToQueue("default", "otheruser"));
}
{code}
Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Attachments: YARN-1864-v1.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue, but we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. User queues can also preempt other non-user leaf queues if they are below their fair share. 2. Allocation to user queues: we want all the (ad hoc) user queries to consume only a fraction of resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1921) Allow to override queue prefix, where new queues will be created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1921: --- Attachment: (was: YARN-1921.patch) Allow to override queue prefix, where new queues will be created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch, YARN-1921.patch Fair scheduler has a couple of QueuePlacementRules. Those rules can create queues, if they not exists with hardcoded prefix root.. Consider an example: we have a placement rule, which creates user's queue if it not exists. Current implementation creates it at root. prefix Suppose that this user runs a big job. In that case it will get a fair share of resources because queue will be created at 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users to default queue, but in that case if user submits a big queue it will eat resources of whole queue, and we know that no preemption can be done within one queue (Or i'm wrong?). So effectively one user can usurp all default queue resources. To solve that I created a patch, which allows to override root. prefix in QueuePlacementRules. Thats gives us flexibility to automatically create queues for users or group of users under predefined queue. So, every user will get a separate queue and will share parent queue resources and can't usurp all resources, because parent node can be configured to preempt tasks. Consider example (parent queue specified for each rule): {code:title=policy.xml|borderStyle=solid} queuePlacementPolicy rule name='specified' parent='granted'/ rule name='user' parent='guests'/ /queuePlacementPolicy {code} With such definition queue requirements will give us: {code:title=Example.java|borderStyle=solid} root.granted.specifiedq == policy.assignAppToQueue(specifiedq, someuser); root.guests.someuser == policy.assignAppToQueue(default, someuser); root.guests.otheruser == policy.assignAppToQueue(default, otheruser); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1921) Allow to override queue prefix, where new queues will be created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated YARN-1921: --- Attachment: YARN-1921.patch Allow to override queue prefix, where new queues will be created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch, YARN-1921.patch The Fair Scheduler has a couple of QueuePlacementRules. Those rules can create queues if they do not exist, with the hardcoded prefix 'root.'. Consider an example: we have a placement rule which creates a user's queue if it does not exist. The current implementation creates it under the 'root.' prefix. Suppose that this user runs a big job. In that case it will get a fair share of resources, because the queue is created under 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users into the default queue, but in that case if a user submits a big job it will eat the resources of the whole queue, and we know that no preemption can be done within one queue (or am I wrong?). So effectively one user can usurp all of the default queue's resources. To solve that I created a patch which allows overriding the 'root.' prefix in QueuePlacementRules. That gives us the flexibility to automatically create queues for users or groups of users under a predefined queue. So every user will get a separate queue, will share the parent queue's resources, and can't usurp all resources, because the parent queue can be configured to preempt tasks. Consider an example (parent queue specified for each rule):
{code:title=policy.xml|borderStyle=solid}
<queuePlacementPolicy>
  <rule name='specified' parent='granted'/>
  <rule name='user' parent='guests'/>
</queuePlacementPolicy>
{code}
With such a definition, queue assignment gives us:
{code:title=Example.java|borderStyle=solid}
"root.granted.specifiedq" == policy.assignAppToQueue("specifiedq", "someuser");
"root.guests.someuser" == policy.assignAppToQueue("default", "someuser");
"root.guests.otheruser" == policy.assignAppToQueue("default", "otheruser");
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT
[ https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965877#comment-13965877 ] Hadoop QA commented on YARN-126: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580129/YARN-126.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3549//console This message is automatically generated. yarn rmadmin help message contains reference to hadoop cli and JT - Key: YARN-126 URL: https://issues.apache.org/jira/browse/YARN-126 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Rémy SAISSY Labels: usability Attachments: YARN-126.patch has option to specify a job tracker and the last line for general command line syntax had bin/hadoop command [genericOptions] [commandOptions] ran yarn rmadmin to get usage:
RMAdmin Usage: java RMAdmin [-refreshQueues] [-refreshNodes] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshAdminAcls] [-refreshServiceAcl] [-help [cmd]]
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is bin/hadoop command [genericOptions] [commandOptions]
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965886#comment-13965886 ] Arpit Gupta commented on YARN-1924: --- Here is the stack trace. {code} cheduler from user hrt_qa in queue default 2014-04-10 09:19:35,907 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(659)) - appattempt_1397121188061_0004_02 State change from SUBMITTED to SCHEDULED 2014-04-10 09:19:36,095 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(639)) - application_1397121188061_0004 State change from ACCEPTED to KILLING 2014-04-10 09:19:36,096 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(986)) - Updating application attempt appattempt_1397121188061_0004_02 with final state: KILLED 2014-04-10 09:19:36,096 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(659)) - appattempt_1397121188061_0004_02 State change from SCHEDULED to FINAL_SAVING 2014-04-10 09:19:36,103 ERROR recovery.RMStateStore (RMStateStore.java:handleStoreEvent(681)) - Error storing appAttempt: appattempt_1397121188061_0004_02 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:834) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:831) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:930) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:949) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:831) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:845) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:862) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:604) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-04-10 09:19:36,107 FATAL resourcemanager.ResourceManager (ResourceManager.java:handle(657)) - Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. 
Cause: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:834) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:831) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:930) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:949) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:831) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:845) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:862) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:604) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at
[jira] [Created] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
Arpit Gupta created YARN-1924: - Summary: RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Noticed on a HA cluster Both RM shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1925) TestSpeculativeExecutionWithMRApp fails
[ https://issues.apache.org/jira/browse/YARN-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1925: -- Labels: test (was: ) TestSpeculativeExecutionWithMRApp fails --- Key: YARN-1925 URL: https://issues.apache.org/jira/browse/YARN-1925 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: test {code} junit.framework.AssertionFailedError: Couldn't speculate successfully at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.assertTrue(Assert.java:20) at org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithoutUpdateEvents(TestSpeculativeExecutionWithMRApp.java:122 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1925) TestSpeculativeExecutionWithMRApp fails
Zhijie Shen created YARN-1925: - Summary: TestSpeculativeExecutionWithMRApp fails Key: YARN-1925 URL: https://issues.apache.org/jira/browse/YARN-1925 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen {code} junit.framework.AssertionFailedError: Couldn't speculate successfully at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.assertTrue(Assert.java:20) at org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithoutUpdateEvents(TestSpeculativeExecutionWithMRApp.java:122 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965897#comment-13965897 ] Jian He commented on YARN-1924: --- Thanks Arpit for reporting this issue. The problem is that if we kill the application while it is in the SUBMITTED state, the app will try to save its final state before the initial state has been saved, which causes the no-node-exists exception. I changed the ZK updateState API to check whether the node exists: if it exists, do a set operation; otherwise, do a create operation. RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1924.1.patch Noticed on an HA cluster. Both RMs shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
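As a minimal standalone sketch of the check-then-set-or-create idea described in the comment above, written against the plain ZooKeeper client API rather than the actual ZKRMStateStore code; the path and data here are placeholders, and the real patch also has to handle retries and fencing.
{code:title=UpdateOrCreateZNode.java|borderStyle=solid}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Sketch of "set if the node exists, otherwise create it", using the plain ZooKeeper API.
public class UpdateOrCreateZNode {
  /** Writes data to the znode, creating it first if it does not exist yet. */
  static void updateOrCreate(ZooKeeper zk, String path, byte[] data)
      throws KeeperException, InterruptedException {
    Stat stat = zk.exists(path, false);
    if (stat != null) {
      // Node already exists: overwrite its data (version -1 matches any version).
      zk.setData(path, data, -1);
    } else {
      // Node missing (e.g. final state saved before the initial state): create it instead.
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
  }
}
{code}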
[jira] [Updated] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1924: -- Attachment: YARN-1924.1.patch RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1924.1.patch Noticed on a HA cluster Both RM shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1921) Allow to override queue prefix, where new queues will be created
[ https://issues.apache.org/jira/browse/YARN-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965913#comment-13965913 ] Hadoop QA commented on YARN-1921: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639659/YARN-1921.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3548//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3548//console This message is automatically generated. Allow to override queue prefix, where new queues will be created Key: YARN-1921 URL: https://issues.apache.org/jira/browse/YARN-1921 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.3.0 Environment: Yarn 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1921.patch, YARN-1921.patch The Fair Scheduler has a couple of QueuePlacementRules. Those rules can create queues if they do not exist, with the hardcoded prefix 'root.'. Consider an example: we have a placement rule which creates a user's queue if it does not exist. The current implementation creates it under the 'root.' prefix. Suppose that this user runs a big job. In that case it will get a fair share of resources, because the queue is created under 'root.' with default settings, and that affects all other users of the cluster. Of course, FairScheduler can place such users into the default queue, but in that case if a user submits a big job it will eat the resources of the whole queue, and we know that no preemption can be done within one queue (or am I wrong?). So effectively one user can usurp all of the default queue's resources. To solve that I created a patch which allows overriding the 'root.' prefix in QueuePlacementRules. That gives us the flexibility to automatically create queues for users or groups of users under a predefined queue. So every user will get a separate queue, will share the parent queue's resources, and can't usurp all resources, because the parent queue can be configured to preempt tasks. Consider an example (parent queue specified for each rule):
{code:title=policy.xml|borderStyle=solid}
<queuePlacementPolicy>
  <rule name='specified' parent='granted'/>
  <rule name='user' parent='guests'/>
</queuePlacementPolicy>
{code}
With such a definition, queue assignment gives us:
{code:title=Example.java|borderStyle=solid}
"root.granted.specifiedq" == policy.assignAppToQueue("specifiedq", "someuser");
"root.guests.someuser" == policy.assignAppToQueue("default", "someuser");
"root.guests.otheruser" == policy.assignAppToQueue("default", "otheruser");
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1923) Make FairScheduler resource ratio calculations terminate faster
[ https://issues.apache.org/jira/browse/YARN-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965919#comment-13965919 ] Sandy Ryza commented on YARN-1923: -- A couple of nits; otherwise, LGTM. There's a spurious whitespace change. Also, in
{code}
+    }
+    else if (resourceRatio
{code}
the else should be on the same line as the closing curly brace. resourceRatio isn't really an accurate name for the variable; resourceUsed might make more sense. Make FairScheduler resource ratio calculations terminate faster --- Key: YARN-1923 URL: https://issues.apache.org/jira/browse/YARN-1923 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-1923.patch In the Fair Scheduler, computing shares continues until all iterations are complete even when we have a perfect match between the resource shares and the total resources. This is because the binary search checks only less-than or greater-than, and not equality. Add an early termination condition for the equal case. -- This message was sent by Atlassian JIRA (v6.2#6252)
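For readers unfamiliar with the computation being discussed, here is a simplified, self-contained sketch of a binary search with the early-exit-on-equality idea; the resourceUsedAtRatio function, iteration count, and toy numbers are stand-ins and do not reflect the real ComputeFairShares code.
{code:title=FairShareBinarySearchSketch.java|borderStyle=solid}
import java.util.function.DoubleUnaryOperator;

// Simplified sketch of the early-termination idea: stop the binary search as soon as
// the resources consumed at the candidate ratio exactly match the total resources,
// instead of always running the fixed number of iterations.
public class FairShareBinarySearchSketch {
  static final int ITERATIONS = 25;

  static double findRatio(double totalResources, DoubleUnaryOperator resourceUsedAtRatio) {
    double left = 0.0;
    double right = 1.0;
    // Grow the right bound until it covers the total resources.
    while (resourceUsedAtRatio.applyAsDouble(right) < totalResources) {
      right *= 2.0;
    }
    for (int i = 0; i < ITERATIONS; i++) {
      double mid = (left + right) / 2.0;
      double used = resourceUsedAtRatio.applyAsDouble(mid);
      if (used < totalResources) {
        left = mid;
      } else if (used > totalResources) {
        right = mid;
      } else {
        return mid; // exact match: terminate early rather than finishing all iterations
      }
    }
    return (left + right) / 2.0;
  }

  public static void main(String[] args) {
    // Toy example: usage grows linearly with the ratio, total = 8 units.
    double ratio = findRatio(8.0, r -> 16.0 * r);
    System.out.println(ratio); // 0.5, found on the first probe
  }
}
{code}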
[jira] [Commented] (YARN-1304) Error starting AM: org.apache.hadoop.security.token.TokenIdentifier: Error reading configuration file
[ https://issues.apache.org/jira/browse/YARN-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965966#comment-13965966 ] Josh Elser commented on YARN-1304: -- Just ran into this one myself. In my case, I had inadvertently updated some hadoop jars after the NM was already running. When an MR job went to look at the classpath, some of the jars weren't on the local filesystem anymore, and it bailed out. Restarting the NM and verifying that all hadoop jars were present and readable was sufficient for me to work around this. Probably worthwhile to close this out without any more info. Error starting AM: org.apache.hadoop.security.token.TokenIdentifier: Error reading configuration file - Key: YARN-1304 URL: https://issues.apache.org/jira/browse/YARN-1304 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Bikas Saha -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965974#comment-13965974 ] Karthik Kambatla commented on YARN-1924: Thanks Jian. I, myself, ran into this once before when the HA work wasn't as stable. Let me take a look. RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1924.1.patch Noticed on a HA cluster Both RM shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1612) Change Fair Scheduler to not disable delay scheduling by default
[ https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1612: -- Target Version/s: 3.0.0 (was: 3.0.0, 0.23.10) Change Fair Scheduler to not disable delay scheduling by default Key: YARN-1612 URL: https://issues.apache.org/jira/browse/YARN-1612 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Chen He Attachments: YARN-1612.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1923) Make FairScheduler resource ratio calculations terminate faster
[ https://issues.apache.org/jira/browse/YARN-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965988#comment-13965988 ] Anubhav Dhoot commented on YARN-1923: - Would plannedResourceUsed or estimatedResourceUsed be better? Make FairScheduler resource ratio calculations terminate faster --- Key: YARN-1923 URL: https://issues.apache.org/jira/browse/YARN-1923 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-1923.patch In fair scheduler computing shares continues till iterations are complete even when we have a perfect match between the resource shares and total resources. This is because the binary search checks only less or greater and not equals. Add an early termination condition when its equal -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set
[ https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965992#comment-13965992 ] Jian He commented on YARN-1903: --- Looks good. One more suggestion: since the process is never started, we can add one more diagnostic message to clarify that the container process was never started. Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set --- Key: YARN-1903 URL: https://issues.apache.org/jira/browse/YARN-1903 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1903.1.patch The container status after stopping the container is not as expected. {code} java.lang.AssertionError: 4: at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966011#comment-13966011 ] Hadoop QA commented on YARN-1924: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639665/YARN-1924.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3550//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3550//console This message is automatically generated. RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1924.1.patch Noticed on a HA cluster Both RM shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966013#comment-13966013 ] Jian He commented on YARN-1879: --- sorry for the late response, was caught up with other things, will take a look in the next couple of days. Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-1879.1.patch, YARN-1879.1.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966024#comment-13966024 ] Ashwin Shankar commented on YARN-1864: -- Hi [~octo47], We want to be able to support user queues for any rule (now and in the future) without needing to add code to that rule. I had a discussion with Sandy last week and we felt that implementing hierarchicalUserQueues using nested rules would make it a) extensible and b) clean. For example, if we want user queues for the primary group, all we have to do is: <rule name='hierarchicalUserQueue'> <rule name='primaryGroup'/> </rule> The nested rule would be applied first, and based on what it returns we can decide at the HUQ level whether to put the app in a user queue underneath or skip to the next rule. This way we can also have a 'create' flag at both the hierarchicalUserQueue level and the nested rule level, which gives the admin granular control over creating new queues. If someone writes any other rule in the future and wants user queue support, they just have to nest it within the HUQ rule and things will just work. No extra attributes are needed in the XML for the new rule. As part of this patch, another thing I'm writing is the ability to mention parent queues without leaf queues in the alloc xml, which can then be used as user queues. I'm done with the code; I will write tests and post a patch with all these features this week. Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Attachments: YARN-1864-v1.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For example, say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler, which creates user queues under the root queue, but we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. User queues can also preempt other non-user leaf queues if they are below their fair share. 2. Allocation to user queues: we want all the user queries (ad hoc) to consume only a fraction of the resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
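To illustrate the delegation described in this comment (the inner rule runs first, and the outer hierarchicalUserQueue rule then decides whether to nest a user queue under the result or fall through), here is a small standalone sketch built on an assumed Rule interface; none of these names are the actual Fair Scheduler classes, and the real feature involves more (create flags, policy parsing, etc.).
{code:title=HierarchicalUserQueueSketch.java|borderStyle=solid}
// Standalone sketch of the nested-rule idea. The Rule interface, class names, and the
// "nest a user queue under the inner rule's result" behaviour are illustrative assumptions.
public class HierarchicalUserQueueSketch {

  interface Rule {
    /** Returns a queue name, or null to fall through to the next rule in the policy. */
    String assignQueue(String requestedQueue, String user, String primaryGroup);
  }

  /** Inner rule: place apps under a queue named after the user's primary group. */
  static class PrimaryGroupRule implements Rule {
    public String assignQueue(String requestedQueue, String user, String primaryGroup) {
      return "root." + primaryGroup;
    }
  }

  /** Outer rule: run the nested rule first, then nest a per-user queue under its result. */
  static class HierarchicalUserQueueRule implements Rule {
    private final Rule nested;
    HierarchicalUserQueueRule(Rule nested) { this.nested = nested; }

    public String assignQueue(String requestedQueue, String user, String primaryGroup) {
      String parent = nested.assignQueue(requestedQueue, user, primaryGroup);
      if (parent == null) {
        return null; // nested rule did not match: skip to the next rule in the policy
      }
      return parent + "." + user; // e.g. root.engineering.user1
    }
  }

  public static void main(String[] args) {
    Rule rule = new HierarchicalUserQueueRule(new PrimaryGroupRule());
    System.out.println(rule.assignQueue("default", "user1", "engineering")); // root.engineering.user1
  }
}
{code}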
[jira] [Updated] (YARN-1923) Make FairScheduler resource ratio calculations terminate faster
[ https://issues.apache.org/jira/browse/YARN-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1923: Attachment: YARN-1923.002.patch Addressed feedback Make FairScheduler resource ratio calculations terminate faster --- Key: YARN-1923 URL: https://issues.apache.org/jira/browse/YARN-1923 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-1923.002.patch, YARN-1923.patch In fair scheduler computing shares continues till iterations are complete even when we have a perfect match between the resource shares and total resources. This is because the binary search checks only less or greater and not equals. Add an early termination condition when its equal -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966029#comment-13966029 ] Hadoop QA commented on YARN-1924: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639665/YARN-1924.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3551//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3551//console This message is automatically generated. RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1924.1.patch Noticed on a HA cluster Both RM shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set
[ https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1903: -- Attachment: YARN-1903.2.patch Thanks for review! Patch is updated. Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set --- Key: YARN-1903 URL: https://issues.apache.org/jira/browse/YARN-1903 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1903.1.patch, YARN-1903.2.patch The container status after stopping container is not expected. {code} java.lang.AssertionError: 4: at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966055#comment-13966055 ] Zhijie Shen commented on YARN-1924: --- The change should fix the bug, and the patch looks almost good to me. Two nits: 1. As the log message has been changed in RMStateStore, it would be good to say storing or updating in the corresponding methods in the FS and Memory impls. 2. One typo, and should it be an error-level log?
{code}
+    LOG.info("Error while doing ZK operaion.", ke);
{code}
RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1924.1.patch Noticed on an HA cluster. Both RMs shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
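Read literally, that nit points at something like the following corrected statement (error level, typo fixed), shown here in a self-contained, purely hypothetical snippet that assumes a commons-logging style Log; it is not the actual patched code.
{code:title=ZkErrorLoggingSketch.java|borderStyle=solid}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical corrected form of the quoted statement, per the review nit above.
public class ZkErrorLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ZkErrorLoggingSketch.class);

  void onZkFailure(Exception ke) {
    // error level instead of info, and the "operaion" typo fixed
    LOG.error("Error while doing ZK operation.", ke);
  }
}
{code}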
[jira] [Updated] (YARN-1924) RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1924: -- Attachment: YARN-1924.2.patch Updated the patch accordingly. Thanks for Zhijie and Karthik for taking a look. RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED Key: YARN-1924 URL: https://issues.apache.org/jira/browse/YARN-1924 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1924.1.patch, YARN-1924.2.patch Noticed on a HA cluster Both RM shut down with this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set
[ https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966081#comment-13966081 ] Jian He commented on YARN-1903: --- +1, will commit once Jenkins returns. Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set --- Key: YARN-1903 URL: https://issues.apache.org/jira/browse/YARN-1903 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1903.1.patch, YARN-1903.2.patch The container status after stopping container is not expected. {code} java.lang.AssertionError: 4: at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1903) Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set
[ https://issues.apache.org/jira/browse/YARN-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966083#comment-13966083 ] Hadoop QA commented on YARN-1903: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639694/YARN-1903.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3553//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3553//console This message is automatically generated. Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set --- Key: YARN-1903 URL: https://issues.apache.org/jira/browse/YARN-1903 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1903.1.patch, YARN-1903.2.patch The container status after stopping container is not expected. {code} java.lang.AssertionError: 4: at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1923) Make FairScheduler resource ratio calculations terminate faster
[ https://issues.apache.org/jira/browse/YARN-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966088#comment-13966088 ] Hadoop QA commented on YARN-1923: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639691/YARN-1923.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3552//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3552//console This message is automatically generated. Make FairScheduler resource ratio calculations terminate faster --- Key: YARN-1923 URL: https://issues.apache.org/jira/browse/YARN-1923 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-1923.002.patch, YARN-1923.patch In fair scheduler computing shares continues till iterations are complete even when we have a perfect match between the resource shares and total resources. This is because the binary search checks only less or greater and not equals. Add an early termination condition when its equal -- This message was sent by Atlassian JIRA (v6.2#6252)