[jira] [Updated] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue without cluster labels

2015-05-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2918:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2492

 RM startup fails if accessible-node-labels are configured for a queue without
 cluster labels
 ---

 Key: YARN-2918
 URL: https://issues.apache.org/jira/browse/YARN-2918
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith

 I configured accessible-node-labels for a queue, but RM startup fails with the
 exception below. Currently, node labels must first be added via rmadmin and
 only then configured for queues. It would be good if configuring cluster and
 queue node labels were consistent.
 {noformat}
 2014-12-03 20:11:50,126 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
 NodeLabelManager doesn't include label = x, please check.
   at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203)
 Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, 
 please check.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 {noformat}
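 A minimal sketch of the kind of validation that fails here (illustrative class and
 method names, not the actual SchedulerUtils/AbstractCSQueue code): every label a
 queue lists in accessible-node-labels must already be known to the cluster-level
 node-label manager, which is why labels have to be added via rmadmin before they
 are referenced in the queue configuration.
 {code:title=QueueLabelValidator.java|borderStyle=solid}
 import java.io.IOException;
 import java.util.Set;

 // Illustrative only: mimics the subset check that throws during queue init.
 public class QueueLabelValidator {

   /**
    * Fails fast if a queue references a label that the cluster-level
    * node-label manager does not know about yet.
    */
   public static void validate(String queueName,
       Set<String> queueAccessibleLabels,
       Set<String> clusterNodeLabels) throws IOException {
     for (String label : queueAccessibleLabels) {
       if ("*".equals(label)) {
         continue; // wildcard means "any cluster label", always valid
       }
       if (!clusterNodeLabels.contains(label)) {
         throw new IOException("NodeLabelManager doesn't include label = "
             + label + ", please check queue " + queueName);
       }
     }
   }
 }
 {code}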



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-05-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2410:
-
Summary: Nodemanager ShuffleHandler can possible exhaust file descriptors  
(was: Nodemanager ShuffleHandler can easily exhaust file descriptors)

 Nodemanager ShuffleHandler can possible exhaust file descriptors
 

 Key: YARN-2410
 URL: https://issues.apache.org/jira/browse/YARN-2410
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Nathan Roberts
Assignee: Chen He

 The async nature of the shufflehandler can cause it to open a huge number of
 file descriptors; when it runs out, it crashes.
 Scenario:
 Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
 Let's say all 6K reduces hit a node at about the same time asking for their
 outputs. Each reducer will ask for all 40 map outputs over a single socket in a
 single request (not necessarily all 40 at once, but with coalescing it is
 likely to be a large number).
 sendMapOutput() will open the file for random reading and then perform an
 async transfer of the particular portion of this file. This will theoretically
 happen 6000*40 = 240,000 times, which will run the NM out of file descriptors
 and cause it to crash.
 The algorithm should be refactored a little to not open the fds until they're
 actually needed.
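 A hedged sketch of the proposed refactoring (illustrative names, not the actual
 Netty-based ShuffleHandler code): keep only the file path, offset and length for
 each queued map output, and open the file descriptor lazily, only for the
 duration of the transfer.
 {code:title=LazyMapOutput.java|borderStyle=solid}
 import java.io.File;
 import java.io.IOException;
 import java.io.RandomAccessFile;
 import java.nio.channels.FileChannel;
 import java.nio.channels.WritableByteChannel;

 // Illustrative only: defers the open() until the output is actually sent, so a
 // node handling thousands of queued reducer requests does not hold one fd per
 // pending map output.
 class LazyMapOutput {
   private final File file;
   private final long offset;
   private final long length;

   LazyMapOutput(File file, long offset, long length) {
     this.file = file;
     this.offset = offset;
     this.length = length;
   }

   /** Opens the fd only while this particular portion is being transferred. */
   void sendTo(WritableByteChannel target) throws IOException {
     try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
       FileChannel ch = raf.getChannel();
       long sent = 0;
       while (sent < length) {
         sent += ch.transferTo(offset + sent, length - sent, target);
       }
     } // fd released here, before the next queued output is served
   }
 }
 {code}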



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2015-05-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524153#comment-14524153
 ] 

Zhijie Shen commented on YARN-2266:
---

Are we still interested in this enhancement? Otherwise, we can close this jira 
as won't fix.

 Add an application timeout service in RM to kill applications which are not 
 getting resources
 -

 Key: YARN-2266
 URL: https://issues.apache.org/jira/browse/YARN-2266
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Ashutosh Jindal

 Currently, if an application is submitted to the RM, the app keeps waiting
 until resources are allocated for the AM. Such an application may be stuck
 until a resource is allocated for the AM, for example because of
 over-utilization of the queue or user limits. In a production cluster, some
 periodically running applications may have a smaller cluster share. So if
 resources are still not available after waiting for some time, such
 applications could be marked as failed.
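 A rough sketch of such a timeout service, using hypothetical names and a
 hypothetical AppKiller hook into the RM; a real implementation would hang off
 the RM dispatcher and application state machine instead.
 {code:title=AppAllocationTimeoutMonitor.java|borderStyle=solid}
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;

 // Illustrative only: tracks submission time per application and fails
 // applications whose AM has not been allocated within a timeout.
 public class AppAllocationTimeoutMonitor {

   public interface AppKiller {          // hypothetical hook into the RM
     void failApplication(String appId, String diagnostics);
   }

   private final Map<String, Long> pendingSince = new ConcurrentHashMap<>();
   private final long timeoutMs;
   private final AppKiller killer;
   private final ScheduledExecutorService scheduler =
       Executors.newSingleThreadScheduledExecutor();

   public AppAllocationTimeoutMonitor(long timeoutMs, AppKiller killer) {
     this.timeoutMs = timeoutMs;
     this.killer = killer;
   }

   public void start(long checkIntervalMs) {
     scheduler.scheduleWithFixedDelay(this::check,
         checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
   }

   public void appSubmitted(String appId) {
     pendingSince.put(appId, System.currentTimeMillis());
   }

   public void amAllocated(String appId) {
     pendingSince.remove(appId);
   }

   private void check() {
     long now = System.currentTimeMillis();
     for (Map.Entry<String, Long> e : pendingSince.entrySet()) {
       if (now - e.getValue() > timeoutMs) {
         killer.failApplication(e.getKey(),
             "AM not allocated within " + timeoutMs + " ms");
         pendingSince.remove(e.getKey());
       }
     }
   }
 }
 {code}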



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-886) make APPLICATION_STOP consistent with APPLICATION_INIT

2015-05-01 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524210#comment-14524210
 ] 

Siddharth Seth commented on YARN-886:
-

[~djp] - this looks like it's still valid. START is sent to the service that 
the app specified. STOP is sent to all AuxServices.

 make APPLICATION_STOP consistent with APPLICATION_INIT
 --

 Key: YARN-886
 URL: https://issues.apache.org/jira/browse/YARN-886
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 Currently, there is an inconsistency between the start/stop behaviour.
 See Siddharth's comment in MAPREDUCE-5329: the start/stop behaviour should
 be consistent. We shouldn't send the stop to all services.
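 A hedged sketch of the consistent behaviour being asked for (simplified,
 hypothetical types rather than the real AuxServices code): remember which aux
 services each application initialized and route the stop event only to those.
 {code:title=AuxServiceRouter.java|borderStyle=solid}
 import java.util.Map;
 import java.util.Set;
 import java.util.concurrent.ConcurrentHashMap;

 // Illustrative only: APPLICATION_STOP goes to the same set of services that
 // received APPLICATION_INIT for this application, not to every aux service.
 public class AuxServiceRouter {

   public interface AuxService {        // simplified stand-in
     void initApplication(String appId);
     void stopApplication(String appId);
   }

   private final Map<String, AuxService> services = new ConcurrentHashMap<>();
   private final Map<String, Set<String>> servicesPerApp =
       new ConcurrentHashMap<>();

   public void register(String name, AuxService service) {
     services.put(name, service);
   }

   public void applicationInit(String appId, Set<String> requestedServices) {
     servicesPerApp.put(appId, requestedServices);
     for (String name : requestedServices) {
       AuxService s = services.get(name);
       if (s != null) {
         s.initApplication(appId);
       }
     }
   }

   public void applicationStop(String appId) {
     Set<String> requested = servicesPerApp.remove(appId);
     if (requested == null) {
       return; // app never initialized any aux service here
     }
     for (String name : requested) {
       AuxService s = services.get(name);
       if (s != null) {
         s.stopApplication(appId);
       }
     }
   }
 }
 {code}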



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2419) RM applications page doesn't sort application id properly

2015-05-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2419:
-
Target Version/s:   (was: 2.6.0)

 RM applications page doesn't sort application id properly
 -

 Key: YARN-2419
 URL: https://issues.apache.org/jira/browse/YARN-2419
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Thomas Graves

 The ResourceManager apps page doesn't sort the application ids properly when
 the application id sequence number rolls over into an extra digit (e.g. from
 9999 to 10000).
 When it rolls over, the higher application ids end up many pages down, below
 the lower-numbered ids.
 I assume we currently just sort alphabetically, so we would need a special
 sorter that knows about application ids.
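 A possible shape for such a sorter, written as a plain Java comparator over ids
 of the form application_<clusterTimestamp>_<sequenceNumber>; this is
 illustrative only, and the real fix would more likely live in the web UI's
 table-sorting code than server-side.
 {code:title=AppIdComparator.java|borderStyle=solid}
 import java.util.Comparator;

 // Illustrative only: compares ids such as "application_1408141399400_0001"
 // by their numeric parts instead of as plain strings, so the sequence number
 // keeps sorting correctly when it gains an extra digit.
 public class AppIdComparator implements Comparator<String> {
   @Override
   public int compare(String a, String b) {
     String[] pa = a.split("_");
     String[] pb = b.split("_");
     // parts: [ "application", clusterTimestamp, sequenceNumber ]
     int byCluster = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
     if (byCluster != 0) {
       return byCluster;
     }
     return Long.compare(Long.parseLong(pa[2]), Long.parseLong(pb[2]));
   }
 }
 {code}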



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2415) Expose MiniYARNCluster for use outside of YARN

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524255#comment-14524255
 ] 

Junping Du commented on YARN-2415:
--

Hi [~ka...@cloudera.com] and [~ywskycn], do we have a plan for it?

 Expose MiniYARNCluster for use outside of YARN
 --

 Key: YARN-2415
 URL: https://issues.apache.org/jira/browse/YARN-2415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 2.5.0
Reporter: Hari Shreedharan
Assignee: Wei Yan

 The MR/HDFS equivalents are available for applications to use in tests, but 
 the YARN Mini cluster is not. It would be really useful to test applications 
 that are written to run on YARN (like Spark)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2318) hadoop configuration checker

2015-05-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524286#comment-14524286
 ] 

Zhijie Shen commented on YARN-2318:
---

Do we still need this feature? Otherwise, we can close the jira as Won't Fix.

 hadoop configuration checker
 ---

 Key: YARN-2318
 URL: https://issues.apache.org/jira/browse/YARN-2318
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: tangjunjie

 Hadoop has a lot of config properties, and people make mistakes when modifying
 configuration files. So Hadoop could provide a config check tool. This tool can
 find mistakes such as the following, where the config contains
 <property>
   <name>mapreduce.tasktracker.reduce.tasks.maximu</name>
   <value>9</value>
   <description>The maximum number of reduce tasks that will be run
   simultaneously by a task tracker.
   </description>
 </property>
 but the name should be mapreduce.tasktracker.reduce.tasks.maximum.
 The tool could also warn about use of deprecated property names and suggest the
 correct ones.
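 A minimal sketch of such a checker, assuming the caller supplies the set of
 known property names (for example collected from the *-default.xml files); the
 class and helper names are hypothetical.
 {code:title=ConfigNameChecker.java|borderStyle=solid}
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;

 // Illustrative only: flags property names that are not in the known-key set
 // and suggests the closest known key, catching typos like
 // "mapreduce.tasktracker.reduce.tasks.maximu".
 public class ConfigNameChecker {

   public static List<String> check(Map<String, String> userConf,
       Set<String> knownKeys) {
     List<String> warnings = new ArrayList<>();
     for (String key : userConf.keySet()) {
       if (!knownKeys.contains(key)) {
         warnings.add("Unknown property '" + key + "', did you mean '"
             + closest(key, knownKeys) + "'?");
       }
     }
     return warnings;
   }

   private static String closest(String key, Set<String> knownKeys) {
     String best = null;
     int bestDist = Integer.MAX_VALUE;
     for (String candidate : knownKeys) {
       int d = editDistance(key, candidate);
       if (d < bestDist) {
         bestDist = d;
         best = candidate;
       }
     }
     return best;
   }

   // Standard Levenshtein distance.
   private static int editDistance(String a, String b) {
     int[][] dp = new int[a.length() + 1][b.length() + 1];
     for (int i = 0; i <= a.length(); i++) {
       for (int j = 0; j <= b.length(); j++) {
         if (i == 0) {
           dp[i][j] = j;
         } else if (j == 0) {
           dp[i][j] = i;
         } else {
           int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
           dp[i][j] = Math.min(dp[i - 1][j - 1] + cost,
               Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1));
         }
       }
     }
     return dp[a.length()][b.length()];
   }
 }
 {code}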



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner

2015-05-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3375:
-
Priority: Critical  (was: Major)

 NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting 
 NodeHealthScriptRunner
 --

 Key: YARN-3375
 URL: https://issues.apache.org/jira/browse/YARN-3375
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Devaraj K
Assignee: Devaraj K
Priority: Critical
 Attachments: YARN-3375.patch


 1. NodeHealthScriptRunner.shouldRun() check is happening 3 times for starting 
 the NodeHealthScriptRunner.
 {code:title=NodeManager.java|borderStyle=solid}
 if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
   LOG.info("Abey khali");
   return null;
 }
 {code}
 {code:title=NodeHealthCheckerService.java|borderStyle=solid}
 if (NodeHealthScriptRunner.shouldRun(
 conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
   addService(nodeHealthScriptRunner);
 }
 {code}
 {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
 if (!shouldRun(nodeHealthScript)) {
   LOG.info("Not starting node health monitor");
   return;
 }
 {code}
 2. If we don't configure a node health script, or the configured health script
 doesn't have execute permission, the NM logs the below message.
 {code:xml}
 2015-03-19 19:55:45,713 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
 {code}
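 A hedged sketch of the consolidation (not the attached patch): evaluate the
 precondition once, with a meaningful log message, and let the callers reuse the
 result instead of invoking shouldRun() in three places.
 {code:title=HealthScriptPrecondition.java|borderStyle=solid}
 import java.io.File;

 // Illustrative only: a single shouldRun() evaluation whose result is reused by
 // NodeManager, NodeHealthCheckerService and NodeHealthScriptRunner.
 public class HealthScriptPrecondition {

   /** True only if a script is configured, exists and is executable. */
   public static boolean shouldRun(String scriptPath) {
     if (scriptPath == null || scriptPath.trim().isEmpty()) {
       System.out.println("Node health script is not configured;"
           + " not starting node health monitor.");
       return false;
     }
     File script = new File(scriptPath);
     if (!script.exists() || !script.canExecute()) {
       System.out.println("Node health script " + scriptPath
           + " is missing or not executable; not starting node health monitor.");
       return false;
     }
     return true;
   }
 }
 {code}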



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner

2015-05-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3375:
-
Target Version/s: 2.8.0  (was: 3.0.0)

 NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting 
 NodeHealthScriptRunner
 --

 Key: YARN-3375
 URL: https://issues.apache.org/jira/browse/YARN-3375
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: YARN-3375.patch


 1. NodeHealthScriptRunner.shouldRun() check is happening 3 times for starting 
 the NodeHealthScriptRunner.
 {code:title=NodeManager.java|borderStyle=solid}
 if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
   LOG.info("Abey khali");
   return null;
 }
 {code}
 {code:title=NodeHealthCheckerService.java|borderStyle=solid}
 if (NodeHealthScriptRunner.shouldRun(
 conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
   addService(nodeHealthScriptRunner);
 }
 {code}
 {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
 if (!shouldRun(nodeHealthScript)) {
   LOG.info("Not starting node health monitor");
   return;
 }
 {code}
 2. If we don't configure a node health script, or the configured health script
 doesn't have execute permission, the NM logs the below message.
 {code:xml}
 2015-03-19 19:55:45,713 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner

2015-05-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524289#comment-14524289
 ] 

Wangda Tan commented on YARN-3375:
--

+1 also, rekicked Jenkins.

 NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting 
 NodeHealthScriptRunner
 --

 Key: YARN-3375
 URL: https://issues.apache.org/jira/browse/YARN-3375
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Devaraj K
Assignee: Devaraj K
Priority: Critical
 Attachments: YARN-3375.patch


 1. NodeHealthScriptRunner.shouldRun() check is happening 3 times for starting 
 the NodeHealthScriptRunner.
 {code:title=NodeManager.java|borderStyle=solid}
 if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
   LOG.info("Abey khali");
   return null;
 }
 {code}
 {code:title=NodeHealthCheckerService.java|borderStyle=solid}
 if (NodeHealthScriptRunner.shouldRun(
 conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
   addService(nodeHealthScriptRunner);
 }
 {code}
 {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
 if (!shouldRun(nodeHealthScript)) {
   LOG.info("Not starting node health monitor");
   return;
 }
 {code}
 2. If we don't configure a node health script, or the configured health script
 doesn't have execute permission, the NM logs the below message.
 {code:xml}
 2015-03-19 19:55:45,713 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2289) ApplicationHistoryStore should be versioned

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2289.
---
Resolution: Won't Fix

We won't be making further improvements to GHS.

 ApplicationHistoryStore should be versioned
 ---

 Key: YARN-2289
 URL: https://issues.apache.org/jira/browse/YARN-2289
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: applications
Reporter: Junping Du
Assignee: Junping Du





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2454) The function compareTo of variable UNBOUNDED in org.apache.hadoop.yarn.util.resource.Resources is defined incorrectly.

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524488#comment-14524488
 ] 

Hadoop QA commented on YARN-2454:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-common. |
| | |  38m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12664364/YARN-2454%20-v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7592/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7592/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7592/console |


This message was automatically generated.

 The function compareTo of variable UNBOUNDED in
 org.apache.hadoop.yarn.util.resource.Resources is defined incorrectly.
 --

 Key: YARN-2454
 URL: https://issues.apache.org/jira/browse/YARN-2454
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.4.1
Reporter: Xu Yang
Assignee: Xu Yang
 Attachments: YARN-2454 -v2.patch, YARN-2454-patch.diff, 
 YARN-2454.patch


 The variable UNBOUNDED implements the abstract class Resources and overrides
 the function compareTo, but there is something wrong in this function. We
 should not compare resources against zero the same way the variable NONE does;
 we should change 0 to Integer.MAX_VALUE.
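 A small illustrative model of the intended behaviour (not the actual Resources
 code): NONE reports 0 for every dimension, while UNBOUNDED should report
 Integer.MAX_VALUE so that it compares as larger than any real resource.
 {code:title=ResourceBounds.java|borderStyle=solid}
 // Illustrative only: models resources as {memory, vcores} pairs to show why
 // UNBOUNDED must not use 0 like NONE does.
 public class ResourceBounds {

   static final int[] NONE = {0, 0};
   static final int[] UNBOUNDED = {Integer.MAX_VALUE, Integer.MAX_VALUE};

   static int compare(int[] a, int[] b) {
     int byMemory = Integer.compare(a[0], b[0]);
     return byMemory != 0 ? byMemory : Integer.compare(a[1], b[1]);
   }

   public static void main(String[] args) {
     int[] someResource = {4096, 4};
     // With the fix: UNBOUNDED > someResource > NONE.
     System.out.println(compare(UNBOUNDED, someResource) > 0);  // true
     System.out.println(compare(NONE, someResource) < 0);       // true
   }
 }
 {code}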



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream

2015-05-01 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524486#comment-14524486
 ] 

zhihai xu commented on YARN-2893:
-

Thanks [~adhoot] for the review, and thanks [~jira.shegalov] for the review and
for committing the patch! Greatly appreciated.

 AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Fix For: 2.8.0

 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, 
 YARN-2893.005.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1772) Fair Scheduler documentation should indicate that admin ACLs also give submit permissions

2015-05-01 Thread Naren Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524523#comment-14524523
 ] 

Naren Koneru commented on YARN-1772:


Hi Jian, I won't be able to.. Feel free to take it..

 Fair Scheduler documentation should indicate that admin ACLs also give submit 
 permissions
 -

 Key: YARN-1772
 URL: https://issues.apache.org/jira/browse/YARN-1772
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Naren Koneru

 I can submit to a Fair Scheduler queue if I'm in the submit ACL OR if I'm in 
 the administer ACL.  The Fair Scheduler docs seem to leave out the second 
 part. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2151) FairScheduler option for global preemption within hierarchical queues

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524855#comment-14524855
 ] 

Hadoop QA commented on YARN-2151:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  1s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12649887/YARN-2151.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7614/console |


This message was automatically generated.

 FairScheduler option for global preemption within hierarchical queues
 -

 Key: YARN-2151
 URL: https://issues.apache.org/jira/browse/YARN-2151
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Andrey Stepachev
 Attachments: YARN-2151.patch


 FairScheduler has hierarchical queues, but fair share calculation and
 preemption still work within a limited range and are effectively still
 non-hierarchical.
 This patch addresses this incompleteness in two aspects:
 1. Currently MinShare is not propagated to the parent queue, which means the
 fair share calculation ignores all min shares in deeper queues.
 Let's take an example
 (implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues):
 {code}
 <?xml version="1.0"?>
 <allocations>
   <queue name="queue1">
     <maxResources>10240mb, 10vcores</maxResources>
     <queue name="big"/>
     <queue name="sub1">
       <schedulingPolicy>fair</schedulingPolicy>
       <queue name="sub11">
         <minResources>6192mb, 6vcores</minResources>
       </queue>
     </queue>
     <queue name="sub2">
     </queue>
   </queue>
 </allocations>
 {code}
 Then bigApp is started within queue1.big with 10x1GB containers.
 That effectively eats all of the maximum allowed resources for queue1.
 Subsequent requests for app1 (queue1.sub1.sub11) and
 app2 (queue1.sub2) (5x1GB each) will wait for free resources.
 Note that sub11 has a min share requirement of 6x1GB.
 Without the patch, fair share is calculated with no knowledge of min share
 requirements, and app1 and app2 get an equal number of containers.
 With the patch, resources are split according to min share (in the test it is
 5 for app1 and 1 for app2).
 That behaviour is controlled by the same 'globalPreemption' parameter, but
 that can be changed easily.
 The implementation is a bit awkward, but it seems the method for min share
 recalculation could be exposed as a public or protected API, and the
 constructor in FSQueue could call it before using the minShare getter. But
 right now the current implementation with nulls should work too.
 2. Preemption doesn't work between queues at different levels of the queue
 hierarchy. Moreover, it is not possible to override various parameters for
 child queues.
 This patch adds a 'globalPreemption' parameter, which enables the global
 preemption algorithm modifications.
 In a nutshell, the patch adds a function shouldAttemptPreemption(queue),
 which can calculate usage for nested queues; if a queue with usage above the
 specified threshold is found, preemption can be triggered.
 The aggregated minShare does the rest of the work, and preemption will work
 as expected within a hierarchy of queues with different MinShare/MaxShare
 specifications on different levels.
 Test case TestFairScheduler#testGlobalPreemption shows how it works:
 one big app gets resources above its fair share and app1 has a declared
 min share. On submission, the code detects that starvation and preempts
 enough containers to make room for app1.
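 A simplified model of the shouldAttemptPreemption(queue) walk described above
 (illustrative types, not the actual FairScheduler patch): recurse over the queue
 tree and trigger preemption if any queue exceeds a threshold of its max share.
 {code:title=GlobalPreemptionCheck.java|borderStyle=solid}
 import java.util.ArrayList;
 import java.util.List;

 // Illustrative only: a toy queue tree used to show the recursive usage check.
 class QueueNode {
   final String name;
   long usedMb;
   long maxShareMb;
   final List<QueueNode> children = new ArrayList<>();

   QueueNode(String name, long usedMb, long maxShareMb) {
     this.name = name;
     this.usedMb = usedMb;
     this.maxShareMb = maxShareMb;
   }
 }

 public class GlobalPreemptionCheck {

   /**
    * Returns true if this queue, or any queue nested under it, is using more
    * than the given fraction (e.g. 0.8) of its max share.
    */
   public static boolean shouldAttemptPreemption(QueueNode queue,
       double threshold) {
     if (queue.maxShareMb > 0 && queue.usedMb > threshold * queue.maxShareMb) {
       return true;
     }
     for (QueueNode child : queue.children) {
       if (shouldAttemptPreemption(child, threshold)) {
         return true;
       }
     }
     return false;
   }
 }
 {code}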



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524861#comment-14524861
 ] 

Hadoop QA commented on YARN-2142:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12654924/final.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7616/console |


This message was automatically generated.

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
 Only in branch-2.2.0.
Reporter: anders
Priority: Minor
  Labels: features
 Attachments: final.patch, trust.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST
 status in the cluster (we can get the TRUST status from the API of the OAT
 server), so I added this feature into Hadoop's scheduling.
 With the TRUST check service, a node can get its own TRUST status and then,
 through the heartbeat, send the TRUST status to the resource manager for
 scheduling.
 In the scheduling step, if the node's TRUST status is 'false', it will be
 abandoned until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.
 ***Only in branch-2.2.0, not in trunk***
 OAT wiki link:
 https://github.com/OpenAttestation/OpenAttestation/wiki



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2325) need check whether node is null in nodeUpdate for FairScheduler

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524869#comment-14524869
 ] 

Hadoop QA commented on YARN-2325:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12656795/YARN-2325.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7618/console |


This message was automatically generated.

 need check whether node is null in nodeUpdate for FairScheduler 
 

 Key: YARN-2325
 URL: https://issues.apache.org/jira/browse/YARN-2325
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Attachments: YARN-2325.000.patch


 We need to check whether the node is null in nodeUpdate for FairScheduler.
 If nodeUpdate is called after removeNode, getFSSchedulerNode will return
 null. If the node is null, we should return with an error message.
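 A minimal sketch of the guard being proposed (illustrative names, not the exact
 patch): if the node was removed before the update event is processed, log and
 return instead of dereferencing null.
 {code:title=NodeUpdateGuard.java|borderStyle=solid}
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;

 // Illustrative only: the shape of the null check, with a stand-in for
 // FSSchedulerNode.
 public class NodeUpdateGuard {

   static class SchedulerNode { /* stand-in for FSSchedulerNode */ }

   private final Map<String, SchedulerNode> nodes = new ConcurrentHashMap<>();

   public void nodeUpdate(String nodeId) {
     SchedulerNode node = nodes.get(nodeId);
     if (node == null) {
       System.err.println("Node " + nodeId
           + " not found; it was likely removed before this update. Skipping.");
       return;
     }
     // ... continue processing the heartbeat for 'node' ...
   }
 }
 {code}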



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1515) Provide ContainerManagementProtocol#signalContainer processing a batch of signals

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524856#comment-14524856
 ] 

Hadoop QA commented on YARN-1515:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12645519/YARN-1515.v08.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7615/console |


This message was automatically generated.

 Provide ContainerManagementProtocol#signalContainer processing a batch of 
 signals 
 --

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch, 
 YARN-1515.v06.patch, YARN-1515.v07.patch, YARN-1515.v08.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-641) Make AMLauncher in RM Use NMClient

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524867#comment-14524867
 ] 

Hadoop QA commented on YARN-641:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12587395/YARN-641.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7617/console |


This message was automatically generated.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-641.1.patch, YARN-641.2.patch, YARN-641.3.patch


 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524847#comment-14524847
 ] 

Hadoop QA commented on YARN-126:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12580129/YARN-126.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7613/console |


This message was automatically generated.

 yarn rmadmin help message contains reference to hadoop cli and JT
 -

 Key: YARN-126
 URL: https://issues.apache.org/jira/browse/YARN-126
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Rémy SAISSY
  Labels: usability
 Attachments: YARN-126.patch


 has option to specify a job tracker and the last line for general command 
 line syntax had bin/hadoop command [genericOptions] [commandOptions]
 ran yarn rmadmin to get usage:
 RMAdmin
 Usage: java RMAdmin
[-refreshQueues]
[-refreshNodes]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshAdminAcls]
[-refreshServiceAcl]
[-help [cmd]]
 Generic options supported are
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -files <comma separated list of files>    specify comma separated files to be
 copied to the map reduce cluster
 -libjars <comma separated list of jars>    specify comma separated jar files
 to include in the classpath.
 -archives <comma separated list of archives>    specify comma separated
 archives to be unarchived on the compute machines.
 The general command line syntax is
 bin/hadoop command [genericOptions] [commandOptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1287) Consolidate MockClocks

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524835#comment-14524835
 ] 

Hadoop QA commented on YARN-1287:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  1s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12621781/YARN-1287-3.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7611/console |


This message was automatically generated.

 Consolidate MockClocks
 --

 Key: YARN-1287
 URL: https://issues.apache.org/jira/browse/YARN-1287
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sebastian Wong
  Labels: newbie
 Attachments: YARN-1287-3.patch


 A bunch of different tests have near-identical implementations of MockClock.  
 TestFairScheduler, TestFSSchedulerApp, and TestCgroupsLCEResourcesHandler for 
 example.  They should be consolidated into a single MockClock.
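 A minimal sketch of what the consolidated clock could look like, assuming the
 org.apache.hadoop.yarn.util.Clock interface with its single getTime() method:
 {code:title=MockClock.java|borderStyle=solid}
 import org.apache.hadoop.yarn.util.Clock;

 // Sketch of a single shared test clock: tests set or advance the time
 // explicitly instead of each test class keeping its own copy.
 public class MockClock implements Clock {

   private long time;

   public MockClock() {
     this(0L);
   }

   public MockClock(long initialTimeMs) {
     this.time = initialTimeMs;
   }

   /** Advances the reported time by the given number of milliseconds. */
   public synchronized void tick(long ms) {
     time += ms;
   }

   @Override
   public synchronized long getTime() {
     return time;
   }
 }
 {code}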



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-670) Add an Exception to indicate 'Maintenance' for NMs and add this to the JavaDoc for appropriate protocols

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524141#comment-14524141
 ] 

Junping Du commented on YARN-670:
-

Resolving this as Won't Fix: rolling upgrade won't need a Maintenance model, as
containers can keep running when the NM goes down. For graceful decommission, we
have YARN-3212 to make sure no new containers get assigned to a decommissioning
node.

 Add an Exception to indicate 'Maintenance' for NMs and add this to the 
 JavaDoc for appropriate protocols
 

 Key: YARN-670
 URL: https://issues.apache.org/jira/browse/YARN-670
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-1688) Rethinking about POJO Classes

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reopened YARN-1688:
---

 Rethinking about POJO Classes
 -

 Key: YARN-1688
 URL: https://issues.apache.org/jira/browse/YARN-1688
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 We need to think about how the POJO classes evolve. Should we back them up
 with proto and others?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1688.
---
Resolution: Fixed

YARN-3539 will declare the timeline v1 APIs stable. We won't change the v1 POJO classes.

 Rethinking about POJO Classes
 -

 Key: YARN-1688
 URL: https://issues.apache.org/jira/browse/YARN-1688
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 We need to think about how the POJO classes evolve. Should we back them up
 with proto and others?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1688.
---
Resolution: Won't Fix

 Rethinking about POJO Classes
 -

 Key: YARN-1688
 URL: https://issues.apache.org/jira/browse/YARN-1688
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 We need to think about how the POJO classes evolve. Should we back them up
 with proto and others?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1733) Intermittent failed for TestRMWebServicesApps

2015-05-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-1733.
---
Resolution: Cannot Reproduce

 Intermittent failed for TestRMWebServicesApps
 -

 Key: YARN-1733
 URL: https://issues.apache.org/jira/browse/YARN-1733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Junping Du

 In some Jenkins runs (e.g. for YARN-1506 and YARN-1641),
 TestRMWebServicesApps fails with a log like:
 java.lang.AssertionError: incorrect number of elements expected:<20> but
 was:<18>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.verifyAppInfo(TestRMWebServicesApps.java:1321)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleAppsHelper(TestRMWebServicesApps.java:1261)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleApp(TestRMWebServicesApps.java:1153)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1971) WindowsLocalWrapperScriptBuilder does not check for errors in generated script

2015-05-01 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524190#comment-14524190
 ] 

Xuan Gong commented on YARN-1971:
-

[~rusanu] 
bq. These can fail due to access permissions, disc out of space, bad hardware, 
cosmic rays etc etc. There should be proper error checking to ease 
troubleshooting.

I agree that the script can fail due to those issues. But some of them, for
example disk out of space or bad hardware, are NM-level issues, and we already
handle them on the NM side. So do we really need a pre-check for those issues?
It might not be easy.

 WindowsLocalWrapperScriptBuilder does not check for errors in generated script
 --

 Key: YARN-1971
 URL: https://issues.apache.org/jira/browse/YARN-1971
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Minor

 Similar to YARN-1865. The 
 DefaultContainerExecutor.WindowsLocalWrapperScriptBuilder builds a shell 
 script that contains commands that potentially may fail:
 {code}
 pout.println("@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
 pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile);
 {code}
 These can fail due to access permissions, disc out of space, bad hardware, 
 cosmic rays etc etc. There should be proper error checking to ease 
 troubleshooting.
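 A hedged sketch of the kind of error checking being asked for (an illustrative
 helper, not the actual DefaultContainerExecutor code; the errorlevel idiom is a
 cmd.exe assumption): abort the generated wrapper script with a non-zero exit
 code if the echo or move fails, so the failure surfaces in the container
 diagnostics.
 {code:title=CheckedWindowsScriptWriter.java|borderStyle=solid}
 import java.io.PrintStream;

 // Illustrative only: emit an errorlevel check after each generated command.
 public class CheckedWindowsScriptWriter {

   static void writePidCommands(PrintStream pout, String containerIdStr,
       String normalizedPidFile) {
     pout.println("@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
     pout.println("@if %errorlevel% neq 0 exit /b %errorlevel%");
     pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile);
     pout.println("@if %errorlevel% neq 0 exit /b %errorlevel%");
   }
 }
 {code}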



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3518) default rm/am expire interval should not be less than default resourcemanager connect wait time

2015-05-01 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3518:

Labels: configuration newbie  (was: newbie)

 default rm/am expire interval should not be less than default resourcemanager
 connect wait time
 

 Key: YARN-3518
 URL: https://issues.apache.org/jira/browse/YARN-3518
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Reporter: sandflee
  Labels: configuration, newbie
 Attachments: YARN-3518.001.patch


 Take the AM for example: if the AM can't connect to the RM, then after the AM
 expires (600s) the RM relaunches the AM, and there will be two AMs at the same
 time until the resourcemanager connect max wait time (900s) has passed.
 DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS = 15 * 60 * 1000;
 DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 600000;
 DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 600000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3518) default rm/am expire interval should not be less than default resourcemanager connect wait time

2015-05-01 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3518:

Labels: newbie  (was: )

 default rm/am expire interval should not be less than default resourcemanager
 connect wait time
 

 Key: YARN-3518
 URL: https://issues.apache.org/jira/browse/YARN-3518
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Reporter: sandflee
  Labels: configuration, newbie
 Attachments: YARN-3518.001.patch


 Take the AM for example: if the AM can't connect to the RM, then after the AM
 expires (600s) the RM relaunches the AM, and there will be two AMs at the same
 time until the resourcemanager connect max wait time (900s) has passed.
 DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS = 15 * 60 * 1000;
 DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 600000;
 DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 600000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2415) Expose MiniYARNCluster for use outside of YARN

2015-05-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2415:
-
Target Version/s:   (was: 2.6.0)

 Expose MiniYARNCluster for use outside of YARN
 --

 Key: YARN-2415
 URL: https://issues.apache.org/jira/browse/YARN-2415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 2.5.0
Reporter: Hari Shreedharan
Assignee: Wei Yan

 The MR/HDFS equivalents are available for applications to use in tests, but 
 the YARN Mini cluster is not. It would be really useful to test applications 
 that are written to run on YARN (like Spark)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2425) When an application is submitted via the Yarn RM WS, log aggregation does not happen

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524268#comment-14524268
 ] 

Junping Du commented on YARN-2425:
--

Is this still an issue?

 When an application is submitted via the Yarn RM WS, log aggregation does not happen
 --

 Key: YARN-2425
 URL: https://issues.apache.org/jira/browse/YARN-2425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.5.0, 2.6.0
 Environment: Secure (Kerberos enabled) hadoop cluster. With SPNEGO 
 for Yarn RM enabled
Reporter: Karam Singh
Assignee: Varun Vasudev

 When submitting an app to the YARN RM using the web service, we need to pass
 credentials/tokens in the JSON/XML object sent to the web service.
 The HDFS namenode does not provide any delegation token over WS (base64
 encoded) the way webhdfs/timeline server does (the HDFS fetchdt command
 fetches a Java Writable object and writes it to a target file, so we cannot
 forward it via the application submission WS objects).
 It looks like there is no way to pass the HDFS token to the NodeManager.
 While starting the application, the container also tries to create the
 application log aggregation dir and fails with the following type of exception:
 {code}
 java.io.IOException: Failed on local exception: java.io.IOException: 
 org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
 via:[TOKEN, KERBEROS]; Host Details : local host is: hostname/ip; 
 destination host is: NameNodeHost:FSPort;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1415)
 at org.apache.hadoop.ipc.Client.call(Client.java:1364)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:725)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1781)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1069)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1065)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1065)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:240)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:64)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:253)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:344)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:310)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:421)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:64)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: 
 org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
 via:[TOKEN, KERBEROS]

[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner

2015-05-01 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524267#comment-14524267
 ] 

Sidharta Seethana commented on YARN-3375:
-

+1 to the patch - the changes seem good to me.


 NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting 
 NodeHealthScriptRunner
 --

 Key: YARN-3375
 URL: https://issues.apache.org/jira/browse/YARN-3375
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: YARN-3375.patch


 1. NodeHealthScriptRunner.shouldRun() check is happening 3 times for starting 
 the NodeHealthScriptRunner.
 {code:title=NodeManager.java|borderStyle=solid}
 if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
   LOG.info("Abey khali");
   return null;
 }
 {code}
 {code:title=NodeHealthCheckerService.java|borderStyle=solid}
 if (NodeHealthScriptRunner.shouldRun(
 conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
   addService(nodeHealthScriptRunner);
 }
 {code}
 {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
 if (!shouldRun(nodeHealthScript)) {
   LOG.info("Not starting node health monitor");
   return;
 }
 {code}
 2. If we don't configure a node health script, or the configured health script
 doesn't have execute permission, the NM logs the below message.
 {code:xml}
 2015-03-19 19:55:45,713 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524356#comment-14524356
 ] 

Junping Du commented on YARN-2470:
--

Agree with [~chris.douglas]. This shouldn't be a problem, as this is the
expected common behavior for other int values. Closing it as Won't Fix.

 A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager 
 to crash. Slider needs this value to be high. Setting a very high value 
 throws an exception and nodemanager does not start
 --

 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524357#comment-14524357
 ] 

Hadoop QA commented on YARN-2892:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 11s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 51s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 15s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  52m  6s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 26s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
|   | hadoop.yarn.server.resourcemanager.TestAppManager |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12684732/YARN-2892.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d3d019c |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7584/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7584/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7584/console |


This message was automatically generated.

 Unable to get AMRMToken in unmanaged AM when using a secure cluster
 ---

 Key: YARN-2892
 URL: https://issues.apache.org/jira/browse/YARN-2892
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sevada Abraamyan
Assignee: Sevada Abraamyan
 Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch


 An AMRMToken is retrieved from the ApplicationReport by the YarnClient.
 When the RM creates the ApplicationReport and sends it back to the client, it
 makes a simple security check on whether it should include the AMRMToken in
 the report (see createAndGetApplicationReport in RMAppImpl). This security
 check verifies that the user who submitted the original application is the
 same user who is requesting the ApplicationReport. If they are indeed the same
 user then it includes the AMRMToken, otherwise it does not include it.
 The problem arises from the fact that when an application is submitted, the
 RM saves the short username of the user who created the application (see
 submitApplication in ClientRmService). Afterwards, when the ApplicationReport
 is requested, the system tries to match the full username of the requester
 against the previously stored short username.
 In a secure cluster using Kerberos this check fails because the principal is
 stripped from the username when we request a short username. So, for example,
 the short username might be Foo whereas the full username is
 f...@company.com
 Note: A very similar problem has been previously reported 
 ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])
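 A small sketch of the mismatch (the class name is illustrative;
 UserGroupInformation is the real API): comparing the stored short name against
 the caller's full Kerberos principal never matches, while comparing short name
 against short name does.
 {code:title=AmrmTokenVisibilityCheck.java|borderStyle=solid}
 import org.apache.hadoop.security.UserGroupInformation;

 // Illustrative only: compare like with like when deciding whether the caller
 // is the application's submitter.
 public class AmrmTokenVisibilityCheck {

   static boolean callerIsSubmitter(String storedShortUserName,
       UserGroupInformation caller) {
     // caller.getUserName() may be "foo@COMPANY.COM" on a Kerberos cluster,
     // which never equals the stored short name "foo"; compare short names.
     return storedShortUserName.equals(caller.getShortUserName());
   }
 }
 {code}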



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception an

2015-05-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-2470.
--
Resolution: Won't Fix

 A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager 
 to crash. Slider needs this value to be high. Setting a very high value 
 throws an exception and nodemanager does not start
 --

 Key: YARN-2470
 URL: https://issues.apache.org/jira/browse/YARN-2470
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Shivaji Dutta
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3534) Collect node resource utilization

2015-05-01 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3534:
--
Attachment: YARN-3534-8.patch

Fixing code style issues (now the output of the checker is meaningful, so life 
is much easier).

The broken tests weren't related to my changes.

Any proposals for unit tests? The ones in ContainersMonitorImpl don't really 
apply.

 Collect node resource utilization
 -

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, 
 YARN-3534-7.patch, YARN-3534-8.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. To support this, this task will implement the NodeResourceMonitor 
 and send this information to the ResourceManager in the heartbeat.
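 As a hedged sketch of the kind of data such a monitor could sample (this is 
 not the patch's NodeResourceMonitor; it only uses standard JDK MXBeans to show 
 the inputs that would be shipped in the heartbeat):
 {code}
 import java.lang.management.ManagementFactory;

 public class NodeUtilizationSample {
   public static void main(String[] args) {
     com.sun.management.OperatingSystemMXBean os =
         (com.sun.management.OperatingSystemMXBean)
             ManagementFactory.getOperatingSystemMXBean();
     long usedBytes = os.getTotalPhysicalMemorySize() - os.getFreePhysicalMemorySize();
     double cpu = os.getSystemCpuLoad(); // 0.0-1.0, or negative if unavailable
     System.out.printf("mem used: %d MB, cpu: %.0f%%%n",
         usedBytes / (1024 * 1024), cpu * 100);
   }
 }
 {code}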



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524493#comment-14524493
 ] 

Hadoop QA commented on YARN-1917:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 58s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 44s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 29s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | mapreduce tests |  73m 26s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| {color:red}-1{color} | yarn tests |   6m 57s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   6m  8s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | | 125m 51s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.mapred.TestMapRed |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
| Timed out tests | org.apache.hadoop.mapred.TestMiniMRClasspath |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729871/YARN-1917.20150501.1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6f541ed |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7590/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7590/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7590/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7590/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7590/console |


This message was automatically generated.

 Add waitForApplicationState interface to YarnClient
 -

 Key: YARN-1917
 URL: https://issues.apache.org/jira/browse/YARN-1917
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-1917.20150501.1.patch, YARN-1917.patch, 
 YARN-1917.patch, YARN-1917.patch


 Currently, YARN doesn't have this method. Users need to write 
 implementations like UnmanagedAMLauncher.monitorApplication or 
 mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful 
 to end users.
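 A hedged sketch of what such a helper could look like when built on the 
 existing YarnClient API (the method name and signature here are illustrative, 
 not the final interface):
 {code}
 import java.io.IOException;
 import java.util.EnumSet;
 import org.apache.hadoop.yarn.api.records.ApplicationId;
 import org.apache.hadoop.yarn.api.records.YarnApplicationState;
 import org.apache.hadoop.yarn.client.api.YarnClient;
 import org.apache.hadoop.yarn.exceptions.YarnException;

 public class WaitForAppState {
   // Polls getApplicationReport until the app reaches one of the desired states.
   static YarnApplicationState waitFor(YarnClient client, ApplicationId appId,
       EnumSet<YarnApplicationState> desired, long pollIntervalMs)
       throws IOException, YarnException, InterruptedException {
     while (true) {
       YarnApplicationState state =
           client.getApplicationReport(appId).getYarnApplicationState();
       if (desired.contains(state)) {
         return state;
       }
       Thread.sleep(pollIntervalMs);
     }
   }
 }
 {code}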



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1638) Add an integration test validating post, storage and retrieval of entities+events

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1638.
---
Resolution: Fixed

We already have integration tests of this in some form, such as in TestDistributedShell.

 Add an integration test validating post, storage and retrieval of 
 entities+events
 ---

 Key: YARN-1638
 URL: https://issues.apache.org/jira/browse/YARN-1638
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-05-01 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3552:

Labels: newbie  (was: )

 RM Web UI shows -1 running containers for completed apps
 

 Key: YARN-3552
 URL: https://issues.apache.org/jira/browse/YARN-3552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
  Labels: newbie
 Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 
 0001-YARN-3552.patch, yarn-3352.PNG


 In RMServerUtils, the default values are negative numbers, which results in 
 the RM web UI also displaying negative numbers. 
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
   Resources.createResource(-1, -1), 0, 0);
 {code}
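 One possible direction for a fix (a sketch only, not necessarily the attached 
 patch) is to build the dummy report with zeros so that completed apps show 0 
 running containers instead of -1:
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(0, 0,
   Resources.createResource(0, 0), Resources.createResource(0, 0),
   Resources.createResource(0, 0), 0, 0);
 {code}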



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3513) Remove unused variables in ContainersMonitorImpl

2015-05-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524215#comment-14524215
 ] 

Li Lu commented on YARN-3513:
-

Hi [~Naganarasimha], thanks for the patch. +1 for removing {{vmemStillInUsage}} 
and {{pmemStillInUsage}}. However, I noticed that we're using {{context}} in 
the YARN-2928 branch. Since we're planning for a branch merge, potentially 
soon, maybe it's fine to leave it there? 

 Remove unused variables in ContainersMonitorImpl
 

 Key: YARN-3513
 URL: https://issues.apache.org/jira/browse/YARN-3513
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Trivial
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3513.20150421-1.patch


 The class member {{private final Context context;}}
 and some local variables in MonitoringThread.run(), {{vmemStillInUsage and 
 pmemStillInUsage}}, are never read, only updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1530.
---
Resolution: Fixed

Timeline service v1 is almost done. Most functionality has been committed 
across multiple releases, mostly before 2.6. There are still a few outstanding 
issues, which are kept open for further discussion.

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner, with plugin points for frameworks 
 to do their own thing w.r.t. interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException

2015-05-01 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524219#comment-14524219
 ] 

Sidharta Seethana commented on YARN-3381:
-

Patch seems to apply. I'll re-submit to Jenkins.

 A typographical error in InvalidStateTransitonException
 -

 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Xiaoshuang LU
Assignee: Brahma Reddy Battula
 Attachments: YARN-3381-002.patch, YARN-3381.patch


 Appears that InvalidStateTransitonException should be 
 InvalidStateTransitionException.  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-1935) Security for timeline server

2015-05-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523964#comment-14523964
 ] 

Zhijie Shen edited comment on YARN-1935 at 5/1/15 10:53 PM:


Closing the umbrella jira, as the security work was largely completed during 
2.5 and 2.6. The only remaining issue is putting generic history data in a 
non-default domain in the secure scenario. Since we are not going to develop 
new features for ATS v1, we can leave that jira (YARN-2622) open and see if the 
supporting requirement for it comes up.


was (Author: zjshen):
Close the umbrella jira. The only left issue is to put generic history data in 
a non-default domain in secure scenario. Since we don't go on to develop new 
feature for ATS v1, we can leave that jira  (YARN-2622) open and see if we have 
the supporting requirement for it.

 Security for timeline server
 

 Key: YARN-1935
 URL: https://issues.apache.org/jira/browse/YARN-1935
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Zhijie Shen
 Attachments: Timeline Security Diagram.pdf, 
 Timeline_Kerberos_DT_ACLs.2.patch, Timeline_Kerberos_DT_ACLs.patch


 Jira to track work to secure the ATS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3509) CollectorNodemanagerProtocol's authorization doesn't work

2015-05-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524247#comment-14524247
 ] 

Li Lu commented on YARN-3509:
-

Hi [~zjshen], thanks for working on this. I'm wondering if this problem will 
block any testing work for the YARN-2928 branch. If so, we may want to have a 
quick fix now; otherwise, I agree with [~djp] that we can wait a bit until the 
security design is ready. 

 CollectorNodemanagerProtocol's authorization doesn't work
 -

 Key: YARN-3509
 URL: https://issues.apache.org/jira/browse/YARN-3509
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, security, timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3509.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524246#comment-14524246
 ] 

Junping Du commented on YARN-2416:
--

Thanks for identifying and reporting the issue, [~john.jian.fang]! We should 
add a state transition from ALLOCATED to RUNNING. Mind delivering a fix for it?
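A toy sketch of the suggested arc, using YARN's StateMachineFactory with 
made-up enums (not the actual RMAppAttemptImpl transition table):
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class AttemptStateDemo {
  enum State { ALLOCATED, LAUNCHED, RUNNING }
  enum Event { LAUNCHED, REGISTERED }

  private static final StateMachineFactory<AttemptStateDemo, State, Event, Void>
      FACTORY =
          new StateMachineFactory<AttemptStateDemo, State, Event, Void>(State.ALLOCATED)
              .addTransition(State.ALLOCATED, State.LAUNCHED, Event.LAUNCHED)
              .addTransition(State.LAUNCHED, State.RUNNING, Event.REGISTERED)
              // The suggested extra arc: tolerate a REGISTERED event that races
              // ahead of the LAUNCHED notification.
              .addTransition(State.ALLOCATED, State.RUNNING, Event.REGISTERED)
              .installTopology();

  public static void main(String[] args) throws Exception {
    StateMachine<State, Event, Void> sm = FACTORY.make(new AttemptStateDemo());
    sm.doTransition(Event.REGISTERED, null);   // no longer invalid at ALLOCATED
    System.out.println(sm.getCurrentState());  // RUNNING
  }
}
{code}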

 InvalidStateTransitonException in ResourceManager if AMLauncher does not 
 receive response for startContainers() call in time
 

 Key: YARN-2416
 URL: https://issues.apache.org/jira/browse/YARN-2416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang

 AMLauncher calls startContainers(allRequests) to launch a container for the 
 application master. Normally, the call comes back immediately so that the 
 RMAppAttempt changes its state from ALLOCATED to LAUNCHED. 
 However, we have observed that in some cases the RPC call came back very late 
 even though the AM container was already started. Because the RMAppAttempt was 
 stuck in the ALLOCATED state, once the resource manager received the 
 REGISTERED event from the application master, it threw 
 InvalidStateTransitonException as follows.
 2014-07-05 08:59:05,021 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 REGISTERED at ALLOCATED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, 
 resource manager kept throwing InvalidStateTransitonException.
 2014-07-05 08:59:06,152 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at ALLOCATED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 2014-07-05 08:59:07,779 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1404549222428_0001_02_02 Container Transitioned from NEW to
  ALLOCATED
 2014-07-05 08:59:07,779 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_ALLOCATED at ALLOCATED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 

[jira] [Commented] (YARN-2256) Too many nodemanager and resourcemanager audit logs are generated

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524274#comment-14524274
 ] 

Hadoop QA commented on YARN-2256:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  0s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 37s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  4s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |   5m 50s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  42m 51s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.security.TestNMTokenSecretManagerInNM |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12655753/YARN-2256.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d3d019c |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7581/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7581/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7581/console |


This message was automatically generated.

 Too many nodemanager and resourcemanager audit logs are generated
 -

 Key: YARN-2256
 URL: https://issues.apache.org/jira/browse/YARN-2256
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.4.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-2256.patch


 The following audit logs are generated too many times (due to the potentially 
 large number of containers):
 1. In NM - audit logs corresponding to starting, stopping and finishing a 
 container
 2. In RM - audit logs corresponding to the AM allocating a container and the 
 AM releasing a container
 We can have different log levels for the NM and RM audit logs and move these 
 successful-container logs to DEBUG.
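 A hedged sketch of the proposal (the logger and message layout are 
 illustrative, not the actual NM/RM audit logger code): keep the audit entries 
 but emit the per-container success records only when DEBUG is enabled.
 {code}
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;

 public class AuditLogLevelSketch {
   private static final Log LOG = LogFactory.getLog(AuditLogLevelSketch.class);

   static void logContainerSuccess(String user, String operation, String containerId) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("USER=" + user + " OPERATION=" + operation
           + " RESULT=SUCCESS CONTAINERID=" + containerId);
     }
   }
 }
 {code}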



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2307) Capacity scheduler user only ADMINISTER_QUEUE also can submit app

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2307.
---
Resolution: Invalid

You probably missed setting {{yarn.acl.enable=true}} in yarn-site.xml. Closing 
it for now. Feel free to reopen if that's not your case.
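For reference, a minimal sketch of enabling the ACL check programmatically 
(equivalent to setting the property in yarn-site.xml; the constant is the 
standard YarnConfiguration key):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EnableAcls {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Without this, queue ACLs such as SUBMIT_APPLICATIONS are not enforced.
    conf.setBoolean(YarnConfiguration.YARN_ACL_ENABLE, true); // yarn.acl.enable
    System.out.println(conf.getBoolean(YarnConfiguration.YARN_ACL_ENABLE, false));
  }
}
{code}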

 Capacity scheduler user only ADMINISTER_QUEUE also can submit app 
 --

 Key: YARN-2307
 URL: https://issues.apache.org/jira/browse/YARN-2307
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.3.0
 Environment: hadoop 2.3.0  centos6.5  jdk1.7
Reporter: tangjunjie
Priority: Minor

 Queue acls for user :  root
 Queue  Operations
 =
 root  
 default  
 china  ADMINISTER_QUEUE
 unfunded 
 User root only has ADMINISTER_QUEUE, but user root can submit an app to the
 china queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1770) Excessive logging for app and attempts on RM recovery

2015-05-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1770:
--
Summary: Excessive logging for app and attempts on RM recovery  (was: Too 
much logging for app and attempts on RM recovery)

 Excessive logging for app and attempts on RM recovery
 --

 Key: YARN-1770
 URL: https://issues.apache.org/jira/browse/YARN-1770
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Priority: Minor

 There is too much logging for apps and attempts when the RM is recovering, and 
 some of it is duplicated. We should consolidate it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1990) Track time-to-allocation for different size containers

2015-05-01 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524290#comment-14524290
 ] 

Xuan Gong commented on YARN-1990:
-

Closing this ticket based on [~curino]'s comment.

 Track time-to-allocation for different size containers 
 ---

 Key: YARN-1990
 URL: https://issues.apache.org/jira/browse/YARN-1990
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino

 Allocation of large containers is notoriously problematic, as smaller 
 containers can more easily grab resources. 
 The proposal for this JIRA is to maintain a map of container sizes and 
 time-to-allocation that can be used:
 * as general insight into cluster behavior, 
 * to inform the reservation-system, allowing us to account for delays in 
 allocation so that the user reservation is respected regardless of the size of 
 the containers requested.
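 A toy sketch of the proposed bookkeeping (all names are illustrative, not an 
 actual scheduler API): bucket requests by container memory size and keep a 
 running average of how long each bucket takes to be allocated.
 {code}
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;

 public class AllocationLatencyTracker {
   // memory-size bucket (MB) -> { mean time-to-allocation (ms), sample count }
   private final Map<Long, double[]> stats = new ConcurrentHashMap<>();

   void record(long containerMemMb, long allocationLatencyMs) {
     stats.compute(containerMemMb, (k, v) -> {
       if (v == null) {
         return new double[] { allocationLatencyMs, 1 };
       }
       v[0] = (v[0] * v[1] + allocationLatencyMs) / (v[1] + 1);  // updated mean
       v[1] += 1;                                                // updated count
       return v;
     });
   }

   double averageLatencyMs(long containerMemMb) {
     double[] v = stats.get(containerMemMb);
     return v == null ? -1 : v[0];
   }
 }
 {code}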



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1990) Track time-to-allocation for different size containers

2015-05-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-1990.
-
Resolution: Invalid

 Track time-to-allocation for different size containers 
 ---

 Key: YARN-1990
 URL: https://issues.apache.org/jira/browse/YARN-1990
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino

 Allocation of large containers is notoriously problematic, as smaller 
 containers can more easily grab resources. 
 The proposal for this JIRA is to maintain a map of container sizes and 
 time-to-allocation that can be used:
 * as general insight into cluster behavior, 
 * to inform the reservation-system, allowing us to account for delays in 
 allocation so that the user reservation is respected regardless of the size of 
 the containers requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2137) Add support for logaggregation to a path on non-default filecontext

2015-05-01 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524303#comment-14524303
 ] 

Xuan Gong commented on YARN-2137:
-

[~ksumit] Any update on this one?

 Add support for logaggregation to a path on non-default filecontext
 ---

 Key: YARN-2137
 URL: https://issues.apache.org/jira/browse/YARN-2137
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation
Affects Versions: 2.4.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
 Attachments: YARN-2137.patch


 The current log-aggregation implementation supports log aggregation to the 
 default filecontext only. This patch is to support log aggregation to any of 
 the supported filesystems within the hadoop eco-system (hdfs, s3, swiftfs, 
 etc.). So, for example, a customer could use hdfs as the default filesystem 
 but use s3 or swiftfs for log aggregation. The current implementation makes 
 mixed use of FileContext+AbstractFileSystem apis as well as FileSystem apis, 
 which is confusing.
 This patch does three things:
 # moves the log-aggregation implementation to use only FileContext apis
 # adds support for doing log aggregation on a non-default filesystem as well
 # changes TestLogAggregationService to use the local filesystem itself instead 
 of mocking the behavior
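 A minimal sketch of the FileContext-only direction (the remote log dir value 
 is made up; this is not the patch itself): resolve the FileContext from the 
 qualified log-dir URI rather than from the default filesystem.
 {code}
 import java.net.URI;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileContext;
 import org.apache.hadoop.fs.Path;

 public class RemoteLogFileContext {
   static FileContext logContext(Configuration conf, String remoteRootLogDir)
       throws Exception {
     // e.g. remoteRootLogDir = "s3://my-bucket/app-logs" while fs.defaultFS stays hdfs://
     URI remoteUri = new Path(remoteRootLogDir).toUri();
     return FileContext.getFileContext(remoteUri, conf);
   }
 }
 {code}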



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-05-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524329#comment-14524329
 ] 

Wangda Tan commented on YARN-3017:
--

I can see this behavior in the latest trunk as well:

{code}
2015-05-01 00:53:44,575 INFO  attempt.RMAppAttemptImpl 
(RMAppAttemptImpl.java:handle(793)) - appattempt_1430441527236_0001_01 
State change from SUBMITTED to SCHEDULED
2015-05-01 00:53:44,928 INFO  rmcontainer.RMContainerImpl 
(RMContainerImpl.java:handle(394)) - container_1430441527236_0001_01_01 
Container Transitioned from NEW to ALLOCATED
{code}

It would be better to make them consistent.

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: MUFEED USMAN
Priority: Minor

 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02, whereas
 the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then when using
 filtering tools to, say, grep events surrounding a specific attempt by the
 numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3401) [Security] users should not be able to create a generic TimelineEntity and associate arbitrary type

2015-05-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524328#comment-14524328
 ] 

Li Lu commented on YARN-3401:
-

I just changed the title of this JIRA to security so that we decouple this 
JIRA from data-model-related changes. This JIRA is part of the (not yet) 
proposed security design for timeline v2. I'm not sure what the role of this 
JIRA will be after we have a comprehensive design, so I'm just linking this 
JIRA to the security JIRA so that we remember this use case. 

 [Security] users should not be able to create a generic TimelineEntity and 
 associate arbitrary type
 ---

 Key: YARN-3401
 URL: https://issues.apache.org/jira/browse/YARN-3401
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R

 IIUC it is possible for users to create a generic TimelineEntity and set an 
 arbitrary entity type. For example, for a YARN app, the right entity API is 
 ApplicationEntity. However, today nothing stops users from instantiating the 
 base TimelineEntity class and setting the application type on it. This 
 presents a problem in handling these YARN system entities in the storage 
 layer, for example.
 We need to ensure that the API allows only the right class to be 
 instantiated for a given entity type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.

2015-05-01 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524352#comment-14524352
 ] 

zhihai xu commented on YARN-3385:
-

Thanks [~sidharta-s], I uploaded a new patch YARN-3385.001.patch based on the 
latest code base.

 Race condition: KeeperException$NoNodeException will cause RM shutdown during 
 ZK node deletion.
 ---

 Key: YARN-3385
 URL: https://issues.apache.org/jira/browse/YARN-3385
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3385.000.patch, YARN-3385.001.patch


 Race condition: KeeperException$NoNodeException will cause RM shutdown during 
 ZK node deletion (Op.delete).
 The race condition is similar to YARN-3023.
 Since the race condition exists for ZK node creation, it should also exist 
 for ZK node deletion.
 We see this issue with the following stack trace:
 {code}
 2015-03-17 19:18:58,958 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 {code}
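 A hedged sketch of the kind of guard that avoids the fatal exit (illustrative 
 only, not ZKRMStateStore's actual code): if the znode is already gone when the 
 delete runs, treat NoNodeException as success instead of surfacing a 
 STATE_STORE_OP_FAILED fatal event.
 {code}
 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.ZooKeeper;

 public class SafeDelete {
   static void deleteIfExists(ZooKeeper zk, String path)
       throws KeeperException, InterruptedException {
     try {
       zk.delete(path, -1);  // -1 matches any version
     } catch (KeeperException.NoNodeException e) {
       // A retried or concurrent operation already removed the node; nothing to do.
     }
   }
 }
 {code}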



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.

2015-05-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3385:
--
Target Version/s: 2.8.0

 Race condition: KeeperException$NoNodeException will cause RM shutdown during 
 ZK node deletion.
 ---

 Key: YARN-3385
 URL: https://issues.apache.org/jira/browse/YARN-3385
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3385.000.patch, YARN-3385.001.patch


 Race condition: KeeperException$NoNodeException will cause RM shutdown during 
 ZK node deletion (Op.delete).
 The race condition is similar to YARN-3023.
 Since the race condition exists for ZK node creation, it should also exist 
 for ZK node deletion.
 We see this issue with the following stack trace:
 {code}
 2015-03-17 19:18:58,958 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.

2015-05-01 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524399#comment-14524399
 ] 

zhihai xu commented on YARN-3385:
-

Agreed. If we have YARN-2716, this problem may be solved by it. Thanks 
[~jianhe]!
It may take some time to stabilize YARN-2716; in the interim, it will be useful 
to fix this issue.

 Race condition: KeeperException$NoNodeException will cause RM shutdown during 
 ZK node deletion.
 ---

 Key: YARN-3385
 URL: https://issues.apache.org/jira/browse/YARN-3385
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3385.000.patch, YARN-3385.001.patch


 Race condition: KeeperException$NoNodeException will cause RM shutdown during 
 ZK node deletion (Op.delete).
 The race condition is similar to YARN-3023.
 Since the race condition exists for ZK node creation, it should also exist 
 for ZK node deletion.
 We see this issue with the following stack trace:
 {code}
 2015-03-17 19:18:58,958 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524400#comment-14524400
 ] 

Hadoop QA commented on YARN-1743:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 53s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 49s | The applied patch generated  66  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 14s | The applied patch generated  6 
new checkstyle issues (total was , now 6). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 25s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   5m 51s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  46m 33s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12694668/YARN-1743-3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6f541ed |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/diffJavadocWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7588/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7588/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7588/console |


This message was automatically generated.

 Decorate event transitions and the event-types with their behaviour
 ---

 Key: YARN-1743
 URL: https://issues.apache.org/jira/browse/YARN-1743
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Jeff Zhang
  Labels: documentation
 Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, 
 YARN-1743-3.patch, YARN-1743.patch


 It helps to annotate the transitions with (start-state, end-state) pairs and 
 the events with (source, destination) pairs.
 Beyond readability, we may also use them to generate the event diagrams 
 across components.
 Not a blocker for 0.23, but let's see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1805) Signal container request delivery from resourcemanager to nodemanager

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524810#comment-14524810
 ] 

Hadoop QA commented on YARN-1805:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  1s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12643371/YARN-1805.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7609/console |


This message was automatically generated.

 Signal container request delivery from resourcemanager to nodemanager
 -

 Key: YARN-1805
 URL: https://issues.apache.org/jira/browse/YARN-1805
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1805.patch


 1. Update ResourceTracker's HeartbeatResponse to include the list of 
 SignalContainerRequest.
 2. Upon receiving the request, NM's NodeStatusUpdater will deliver the 
 request to ContainerManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524802#comment-14524802
 ] 

Hadoop QA commented on YARN-1427:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12646616/YARN-1427.1.patch |
| Optional Tests |  |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7607/console |


This message was automatically generated.

 yarn-env.cmd should have the analog comments that are in yarn-env.sh
 

 Key: YARN-1427
 URL: https://issues.apache.org/jira/browse/YARN-1427
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Zhijie Shen
  Labels: newbie, windows
 Attachments: YARN-1427.1.patch


 There are paragraphs about the RM/NM env vars (probably AHS as well soon) 
 in yarn-env.sh. Should the Windows version of the script provide similar 
 comments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1803) Signal container support in nodemanager

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524800#comment-14524800
 ] 

Hadoop QA commented on YARN-1803:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12643173/YARN-1803.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7606/console |


This message was automatically generated.

 Signal container support in nodemanager
 ---

 Key: YARN-1803
 URL: https://issues.apache.org/jira/browse/YARN-1803
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1803.patch


 It could include the following:
 1. ContainerManager is able to process a new event type, 
 ContainerManagerEventType.SIGNAL_CONTAINERS, coming from NodeStatusUpdater and 
 deliver the request to ContainerExecutor.
 2. Translate the platform-independent signal command to Linux-specific 
 signals. Windows support will be tracked by another task.
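 A toy sketch of item 2 (the enum here is illustrative, not the final YARN 
 API): map a platform-independent signal command onto the Linux signal number 
 the ContainerExecutor would deliver.
 {code}
 public class SignalTranslation {
   enum SignalCommand { OUTPUT_THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

   static int toLinuxSignal(SignalCommand cmd) {
     switch (cmd) {
       case OUTPUT_THREAD_DUMP: return 3;   // SIGQUIT
       case GRACEFUL_SHUTDOWN:  return 15;  // SIGTERM
       case FORCEFUL_SHUTDOWN:  return 9;   // SIGKILL
       default: throw new IllegalArgumentException("Unknown command: " + cmd);
     }
   }
 }
 {code}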



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-445) Ability to signal containers

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524805#comment-14524805
 ] 

Hadoop QA commented on YARN-445:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7608/console |


This message was automatically generated.

 Ability to signal containers
 

 Key: YARN-445
 URL: https://issues.apache.org/jira/browse/YARN-445
 Project: Hadoop YARN
  Issue Type: Task
  Components: nodemanager
Reporter: Jason Lowe
Assignee: Andrey Klochkov
 Attachments: MRJob.png, MRTasks.png, YARN-445--n2.patch, 
 YARN-445--n3.patch, YARN-445--n4.patch, 
 YARN-445-signal-container-via-rm.patch, YARN-445.patch, YARNContainers.png


 It would be nice if an ApplicationMaster could send signals such as SIGQUIT, 
 SIGUSR1, etc. to containers.
 For example, in order to replicate the jstack-on-task-timeout feature 
 implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
 interface for sending SIGQUIT to a container.  For that specific feature we 
 could implement it as an additional field in the StopContainerRequest.  
 However that would not address other potential features like the ability for 
 an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
 latter feature would be a very useful debugging tool for users who do not 
 have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524817#comment-14524817
 ] 

Hadoop QA commented on YARN-2120:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12648681/YARN-2120.v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7610/console |


This message was automatically generated.

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: 76AD6A72-9A0D-4F3A-A7B8-6EC1DCBD543A.png, 
 YARN-2120.v1.patch, YARN-2120.v2.patch


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fairShare is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fairShare.
 It would be better to show a queue running over minShare in orange, 
 so that we know the queue is running above its min share. 
 Also, we can display a queue running at maxShare in red.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1714) Per user and per queue view in YARN RM

2015-05-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524103#comment-14524103
 ] 

Jian He commented on YARN-1714:
---

[~l201514], I think the RM REST API now supports this filtering. Is this jira 
to add web UI support? 
Canceling the patch as it no longer applies.

 Per user and per queue view in YARN RM
 --

 Key: YARN-1714
 URL: https://issues.apache.org/jira/browse/YARN-1714
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-1714.v1.patch, YARN-1714.v2.patch, 
 YARN-1714.v3.patch


 The ResourceManager exposes either one job or all jobs via the web UI. It 
 would be good to have a filter for user so that users see only their own jobs.
 Provide REST-style URLs to access only the apps of a specified queue or user. 
 For instance,
 http://hadoop-example.com:50030/cluster/user/toto 
 displays apps owned by toto
 http://hadoop-example.com:50030/cluster/user/toto,glinda  
 displays apps owned by toto and glinda
 http://hadoop-example.com:50030/cluster/queue/root.queue1 
displays apps in root.queue1
 http://hadoop-example.com:50030/cluster/queue/root.queue1,root.queue2   
 displays apps in root.queue1 and  root.queue2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-01 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3554:

Labels: newbie  (was: )

 Default value for maximum nodemanager connect wait time is too high
 ---

 Key: YARN-3554
 URL: https://issues.apache.org/jira/browse/YARN-3554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R
  Labels: newbie
 Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch


 The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
 msec, or 15 minutes, which is way too high.  The default container expiry time 
 from the RM and the default task timeout in MapReduce are both only 10 
 minutes.
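 A hedged sketch of the direction of the fix: explicitly lower the client-side 
 wait so it stays under the RM container-expiry and MapReduce task-timeout 
 windows (the 3-minute value here is an example, not the default chosen by the 
 patch).
 {code}
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class NmConnectWait {
   public static void main(String[] args) {
     YarnConfiguration conf = new YarnConfiguration();
     // 3 minutes instead of the 15-minute default discussed in this issue.
     conf.setLong("yarn.client.nodemanager-connect.max-wait-ms", 3 * 60 * 1000L);
     System.out.println(conf.get("yarn.client.nodemanager-connect.max-wait-ms"));
   }
 }
 {code}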



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2015-05-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524170#comment-14524170
 ] 

Wangda Tan commented on YARN-3388:
--

[~nroberts], any updates on this?

 Allocation in LeafQueue could get stuck because DRF calculator isn't well 
 supported when computing user-limit
 -

 Key: YARN-3388
 URL: https://issues.apache.org/jira/browse/YARN-3388
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch


 When there are multiple active users in a queue, it should be possible for 
 those users to make use of capacity up to max_capacity (or close to it). The 
 resources should be fairly distributed among the active users in the queue. 
 This works pretty well when there is a single resource being scheduled.   
 However, when there are multiple resources the situation gets more complex, 
 and the current algorithm tends to get stuck at Capacity. 
 An example is illustrated in a subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1664) Add a utility to retrieve the RM Principal (renewer for tokens)

2015-05-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524166#comment-14524166
 ] 

Jian He commented on YARN-1664:
---

[~sseth], is this still needed? How is Tez doing it now?

 Add a utility to retrieve the RM Principal (renewer for tokens)
 ---

 Key: YARN-1664
 URL: https://issues.apache.org/jira/browse/YARN-1664
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth

 Currently the logic to retrieve the renewer to be used while retrieving HDFS 
 tokens resides in MapReduce. This should ideally be a utility in YARN since 
 it's likely to be required by other applications as well.
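 A hedged sketch of such a utility (the helper name is illustrative): read the 
 RM principal from configuration via YarnConfiguration.RM_PRINCIPAL 
 (yarn.resourcemanager.principal) so callers can pass it as the renewer when 
 fetching HDFS delegation tokens.
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class RmPrincipalUtil {
   static String getTokenRenewer(Configuration conf) {
     // Note: a production utility would also need to substitute _HOST in the principal.
     String principal = conf.get(YarnConfiguration.RM_PRINCIPAL);
     return principal == null ? "" : principal;
   }
 }
 {code}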



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2043) Rename internal names to being Timeline Service instead of application history

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2043.
---
Resolution: Won't Fix

We won't refactor ATS v1 anymore.

 Rename internal names to being Timeline Service instead of application history
 --

 Key: YARN-2043
 URL: https://issues.apache.org/jira/browse/YARN-2043
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Naganarasimha G R

 Like package and class names. In line with YARN-2033, YARN-1982 etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2060) Add an admin module for the timeline server

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2060.
---
Resolution: Won't Fix

We won't add new features to ATS v1.

 Add an admin module for the timeline server
 ---

 Key: YARN-2060
 URL: https://issues.apache.org/jira/browse/YARN-2060
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Like the job history server, it's good to have an admin module for the 
 timeline server to allow the admin to manage the server on the fly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-321) [Umbrella] Generic application history service

2015-05-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524113#comment-14524113
 ] 

Zhijie Shen edited comment on YARN-321 at 5/1/15 10:47 PM:
---

Closing this umbrella jira with a few sub-tasks open. The generic history service 
has been implemented and rides on the timeline server, but it is not production 
ready. The sub-tasks have gone into multiple releases, mostly before 2.6. YARN-2271 
is left open to track a possible performance issue when fetching all the 
applications stored in the timeline store.


was (Author: zjshen):
Close this umbrella jira with few sub tasks open. Generic history service has 
been implemented and rides on timeline server. YARN-2271 is left open to track 
one possible performance issue to fetch all the applications stored in the 
timeline store.

 [Umbrella] Generic application history service
 --

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is the 
 number of application types and V is the number of application versions) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2955) mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector

2015-05-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-2955.
--
Resolution: Invalid

It's only a warning, and this JIRA doesn't contain the necessary information. 
Closing as invalid; [~jyf2100], please reopen it if you have more information on this.

 mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector
 -

 Key: YARN-2955
 URL: https://issues.apache.org/jira/browse/YARN-2955
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
 Environment: cdh5.1.0
Reporter: Rocju

 2014-12-12 02:26:55,047 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 SelectChannelConnector@dcnn2:23188
 2014-12-12 02:26:55,052 WARN  mortbay.log (Slf4jLog.java:warn(89)) - 
 EXCEPTION 
 java.lang.InterruptedException
   at java.lang.Object.wait(Native Method)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:279)
   at 
 org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:545)
   at 
 org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:639)
   at 
 org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
   at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:154)
   at 
 org.mortbay.jetty.AbstractGenerator$OutputWriter.write(AbstractGenerator.java:904)
   at 
 org.mortbay.jetty.AbstractGenerator$OutputWriter.write(AbstractGenerator.java:755)
   at java.io.Writer.write(Writer.java:157)
   at java.io.PrintWriter.newLine(PrintWriter.java:480)
   at java.io.PrintWriter.println(PrintWriter.java:629)
   at 
 org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._p(HamletImpl.java:110)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$SCRIPT._(Hamlet.java:454)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlock.render(AppsBlock.java:119)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:40)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet._(Hamlet.java:30347)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlockWithMetrics.render(AppsBlockWithMetrics.java:29)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
   at 
 org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
   at 
 org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
   at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMDispatcher.service(RMDispatcher.java:77)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 

[jira] [Resolved] (YARN-1756) Capture one more timestamp for an application when ApplicationClientProtocol#getNewApplication is executed

2015-05-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-1756.
---
Resolution: Won't Fix

 Capture one more timestamp for an application when 
 ApplicationClientProtocol#getNewApplication is executed
 --

 Key: YARN-1756
 URL: https://issues.apache.org/jira/browse/YARN-1756
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma

 The application submission time ( when submitApplication is called) is 
 collected by RM and application history server. But it doesn't capture when 
 the client calls newApplication method. The delta between newApplication and 
 submitApplication could be useful if the client submits large jar files. This 
 metric will be useful for https://issues.apache.org/jira/browse/YARN-1492.
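
Even without RM support, the delta can be measured on the client side. A hedged sketch using the YarnClient API, with the submission-context setup elided:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitDelayProbe {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    long newAppTime = System.currentTimeMillis();
    YarnClientApplication app = yarnClient.createApplication(); // getNewApplication RPC
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    // ... populate ctx (queue, AM container spec, resources, upload job jars) ...

    long submitTime = System.currentTimeMillis();
    yarnClient.submitApplication(ctx);
    System.out.println("newApplication -> submitApplication delta: "
        + (submitTime - newAppTime) + " ms");
    yarnClient.stop();
  }
}
{code}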



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1756) Capture one more timestamp for an application when ApplicationClientProtocol#getNewApplication is executed

2015-05-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524239#comment-14524239
 ] 

Jian He commented on YARN-1756:
---

This sounds like a customized YARN feature request for MR only; I think MR can 
do this itself. I don't expect more progress here in reality, so I'm closing 
this. Please re-open if this requirement is still needed. 

 Capture one more timestamp for an application when 
 ApplicationClientProtocol#getNewApplication is executed
 --

 Key: YARN-1756
 URL: https://issues.apache.org/jira/browse/YARN-1756
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma

 The application submission time ( when submitApplication is called) is 
 collected by RM and application history server. But it doesn't capture when 
 the client calls newApplication method. The delta between newApplication and 
 submitApplication could be useful if the client submits large jar files. This 
 metric will be useful for https://issues.apache.org/jira/browse/YARN-1492.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2015-05-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-2305.
--
Resolution: Duplicate

This JIRA should already be resolved: CSQueueUtils now uses QueueResourceUsage 
instead of QueueMetrics, and it is updated on every container allocation/resource 
update.

 When a container is in reserved state then total cluster memory is displayed 
 wrongly.
 -

 Key: YARN-2305
 URL: https://issues.apache.org/jira/browse/YARN-2305
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina
Assignee: Sunil G
 Attachments: Capture.jpg


 ENV Details:
 =
  3 queues: a(50%), b(25%), c(25%), all with max utilization set to 100%
  2-node cluster with total memory of 16GB
 TestSteps:
 =
  Execute the following 3 jobs with different memory configurations for the
  map, reducer and AM tasks:
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 
 /dir8 /preempt_85 (application_1405414066690_0023)
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 
 /dir2 /preempt_86 (application_1405414066690_0025)
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 
 /dir2 /preempt_62
 Issue
 =
  When 2GB of memory is in the reserved state, total memory is shown as 
 15GB and used as 15GB (while total memory is 16GB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1772) Fair Scheduler documentation should indicate that admin ACLs also give submit permissions

2015-05-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524304#comment-14524304
 ] 

Jian He commented on YARN-1772:
---

[~naren.koneru], would you still like to work on this ? cc/ [~kasha] 

 Fair Scheduler documentation should indicate that admin ACLs also give submit 
 permissions
 -

 Key: YARN-1772
 URL: https://issues.apache.org/jira/browse/YARN-1772
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Naren Koneru

 I can submit to a Fair Scheduler queue if I'm in the submit ACL OR if I'm in 
 the administer ACL.  The Fair Scheduler docs seem to leave out the second 
 part. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3422) relatedentities always return empty list when primary filter is set

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3422:
--
Labels:   (was: newbie)

 relatedentities always return empty list when primary filter is set
 ---

 Key: YARN-3422
 URL: https://issues.apache.org/jira/browse/YARN-3422
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3422.1.patch


 When you curl for ats entities with a primary filter, the relatedentities 
 fields always return empty list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3407) HttpServer2 Max threads in TimelineCollectorManager should be more than 10

2015-05-01 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3407:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-2928

 HttpServer2 Max threads in TimelineCollectorManager should be more than 10
 --

 Key: YARN-3407
 URL: https://issues.apache.org/jira/browse/YARN-3407
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena

 Currently TimelineCollectorManager sets HttpServer2.HTTP_MAX_THREADS to just 
 10. This value might be too low for serving put requests. By default 
 HttpServer2 will have a max threads value of 250. We could also make it 
 configurable so that an optimum value can be chosen based on the number 
 of requests coming to the server. Thoughts?
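
A minimal sketch of the configurable variant; the property key below is hypothetical, not an existing YARN setting, and the default of 250 simply mirrors the HttpServer2 default mentioned above.

{code}
import org.apache.hadoop.conf.Configuration;

public class CollectorThreadsConfig {
  // Hypothetical property key, not an existing YARN setting.
  public static final String COLLECTOR_HTTP_MAX_THREADS =
      "yarn.timeline-service.collector.http.max-threads";
  public static final int DEFAULT_COLLECTOR_HTTP_MAX_THREADS = 250;

  // Read the thread count from configuration instead of hardcoding 10.
  public static int getMaxThreads(Configuration conf) {
    return conf.getInt(COLLECTOR_HTTP_MAX_THREADS, DEFAULT_COLLECTOR_HTTP_MAX_THREADS);
  }

  public static void main(String[] args) {
    System.out.println(getMaxThreads(new Configuration()));
  }
}
{code}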



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3422) relatedentities always return empty list when primary filter is set

2015-05-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524330#comment-14524330
 ] 

Zhijie Shen commented on YARN-3422:
---

I think it is a valid bug. I took a look at the patch, and it seems that you need 
to take care of relatedEntitiesWithoutStartTimes too. In addition, would you 
please add a test case to cover this?

/cc [~billie.rina...@gmail.com]

 relatedentities always return empty list when primary filter is set
 ---

 Key: YARN-3422
 URL: https://issues.apache.org/jira/browse/YARN-3422
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
  Labels: newbie
 Fix For: 2.6.1

 Attachments: YARN-3422.1.patch


 When you curl for ats entities with a primary filter, the relatedentities 
 fields always return empty list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3422) relatedentities always return empty list when primary filter is set

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3422:
--
 Component/s: timelineserver
Target Version/s: 2.7.1

 relatedentities always return empty list when primary filter is set
 ---

 Key: YARN-3422
 URL: https://issues.apache.org/jira/browse/YARN-3422
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3422.1.patch


 When you curl for ats entities with a primary filter, the relatedentities 
 fields always return empty list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3422) relatedentities always return empty list when primary filter is set

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3422:
--
Fix Version/s: (was: 2.6.1)

 relatedentities always return empty list when primary filter is set
 ---

 Key: YARN-3422
 URL: https://issues.apache.org/jira/browse/YARN-3422
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3422.1.patch


 When you curl for ats entities with a primary filter, the relatedentities 
 fields always return empty list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2469) Merge duplicated tests in Fifo/Capacity/Fair Scheduler into some common test

2015-05-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2469:
-
Priority: Minor  (was: Major)

 Merge duplicated tests in Fifo/Capacity/Fair Scheduler into some common test
 

 Key: YARN-2469
 URL: https://issues.apache.org/jira/browse/YARN-2469
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Reporter: Junping Du
Priority: Minor

 From discussions in YARN-1506, there are duplicated test cases like 
 testBlackListNode, testResourceOverCommit, etc. for the different schedulers. We 
 need some common test code to cover the same test cases across the different 
 schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524387#comment-14524387
 ] 

Hadoop QA commented on YARN-2921:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   5m 11s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 28s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 13s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  50m 51s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  68m  1s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12696827/YARN-2921.004.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / d3d019c |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7586/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7586/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7586/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7586/console |


This message was automatically generated.

 MockRM#waitForState methods can be too slow and flaky
 -

 Key: YARN-2921
 URL: https://issues.apache.org/jira/browse/YARN-2921
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Tsuyoshi Ozawa
 Attachments: YARN-2921.001.patch, YARN-2921.002.patch, 
 YARN-2921.003.patch, YARN-2921.004.patch


 MockRM#waitForState methods currently sleep for too long (2 seconds and 1 
 second). This leads to slow tests and sometimes failures if the 
 App/AppAttempt moves to another state. 
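
A generic illustration of the kind of change proposed: poll with a short interval under an overall timeout instead of sleeping 1-2 seconds per check. The helper below is illustrative only, not the actual MockRM code.

{code}
import java.util.concurrent.Callable;

public final class WaitUtil {
  private WaitUtil() {}

  // Poll until currentState returns the expected value or the timeout expires.
  public static <T> boolean waitFor(Callable<T> currentState, T expected,
      long timeoutMs, long pollIntervalMs) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(currentState.call())) {
        return true;
      }
      Thread.sleep(pollIntervalMs); // e.g. 100 ms instead of 1-2 seconds
    }
    return false;
  }
}
{code}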



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2482) DockerContainerExecutor configuration

2015-05-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2482:
-
Issue Type: Sub-task  (was: New Feature)
Parent: YARN-2466

 DockerContainerExecutor configuration
 -

 Key: YARN-2482
 URL: https://issues.apache.org/jira/browse/YARN-2482
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abin Shahab
  Labels: security

 Currently DockerContainerExecutor can be configured from yarn-site.xml, and 
 users can add arbitrary arguments to the container launch command. This should 
 be fixed so that the cluster and other jobs are protected from malicious 
 string injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2482) DockerContainerExecutor configuration

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524386#comment-14524386
 ] 

Junping Du commented on YARN-2482:
--

Moving it under YARN-2466.

 DockerContainerExecutor configuration
 -

 Key: YARN-2482
 URL: https://issues.apache.org/jira/browse/YARN-2482
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Abin Shahab
  Labels: security

 Currently DockerContainerExecutor can be configured from yarn-site.xml, and 
 users can add arbitrary arguments to the container launch command. This should 
 be fixed so that the cluster and other jobs are protected from malicious 
 string injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2137) Add support for logaggregation to a path on non-default filecontext

2015-05-01 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524511#comment-14524511
 ] 

Sumit Kumar commented on YARN-2137:
---

Apologies for the delay. I will rebase this patch and look into the required 
testing that [~vinodkv] recommended.

 Add support for logaggregation to a path on non-default filecontext
 ---

 Key: YARN-2137
 URL: https://issues.apache.org/jira/browse/YARN-2137
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation
Affects Versions: 2.4.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
 Attachments: YARN-2137.patch


 The current log-aggregation implementation supports log aggregation to the 
 default filecontext only. This patch adds support for log aggregation to any of 
 the supported filesystems within the hadoop eco-system (hdfs, s3, swiftfs etc), 
 so for example a customer could use hdfs as the default filesystem but use s3 or 
 swiftfs for log aggregation. The current implementation also makes mixed usage 
 of FileContext+AbstractFileSystem apis as well as FileSystem apis, which is 
 confusing.
 This patch does three things:
 # moves the log-aggregation implementation to use only FileContext apis
 # adds support for doing log aggregation on a non-default filesystem as well
 # changes TestLogAggregationService to use the local filesystem itself instead of 
 mocking the behavior
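
A hedged sketch of the core idea, binding a FileContext to a non-default filesystem; the s3a URI and paths below are hypothetical examples, not part of the patch.

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NonDefaultFsLogDirExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Bind the FileContext to a filesystem other than fs.defaultFS.
    FileContext remoteFc =
        FileContext.getFileContext(new URI("s3a://log-bucket/"), conf);
    Path remoteLogRoot = new Path("/app-logs");
    remoteFc.mkdir(remoteLogRoot, FsPermission.getDirDefault(), true);
  }
}
{code}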



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1329) yarn-config.sh overwrites YARN_CONF_DIR indiscriminately

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524730#comment-14524730
 ] 

Hadoop QA commented on YARN-1329:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12613087/YARN-1329.patch |
| Optional Tests | shellcheck |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7595/console |


This message was automatically generated.

 yarn-config.sh overwrites YARN_CONF_DIR indiscriminately 
 -

 Key: YARN-1329
 URL: https://issues.apache.org/jira/browse/YARN-1329
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Reporter: Aaron Gottlieb
Assignee: haosdent
  Labels: easyfix
 Attachments: YARN-1329.patch


 The script yarn-daemons.sh calls 
 {code}${HADOOP_LIBEXEC_DIR}/yarn-config.sh{code}
 yarn-config.sh overwrites any previously set value of environment variable 
 YARN_CONF_DIR starting at line 40:
 {code:title=yarn-config.sh|borderStyle=solid}
 #check to see if the conf dir is given as an optional argument
 if [ $# -gt 1 ]
 then
     if [ "--config" = "$1" ]
     then
         shift
         confdir=$1
         shift
         YARN_CONF_DIR=$confdir
     fi
 fi
  
 # Allow alternate conf dir location.
 export YARN_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
 {code}
 The last line should check for the existence of YARN_CONF_DIR first.
 {code}
 DEFAULT_CONF_DIR=${HADOOP_CONF_DIR:-$YARN_HOME/conf}
 export YARN_CONF_DIR=${YARN_CONF_DIR:-$DEFAULT_CONF_DIR}
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524736#comment-14524736
 ] 

Hadoop QA commented on YARN-1019:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12616524/YARN-1019.0.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7596/console |


This message was automatically generated.

 YarnConfiguration validation for local disk path and http addresses.
 

 Key: YARN-1019
 URL: https://issues.apache.org/jira/browse/YARN-1019
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Omkar Vinit Joshi
Priority: Minor
  Labels: newbie
 Attachments: YARN-1019.0.patch


 Today we are not validating certain configuration parameters set in 
 yarn-site.xml.
 1) Configurations related to paths, such as local-dirs and log-dirs: the NM 
 crashes during startup if they are set to relative paths rather than absolute 
 paths. To avoid such failures we can enforce checks (absolute paths) before we 
 actually start up, i.e. before the directory handler creates the directories.
 2) The same applies to all the parameters using hostname:port, unless we are ok 
 with the default port.
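
A minimal sketch of the path check described in 1); the class name is hypothetical and this is not existing YARN code, only an illustration of failing fast on relative paths.

{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;

public class DirConfigValidator {
  // Fail fast if any configured directory under the given key is a relative path.
  public static void validateAbsolute(Configuration conf, String key) {
    for (String dir : conf.getTrimmedStrings(key)) {
      if (!new File(dir).isAbsolute()) {
        throw new IllegalArgumentException(key + " contains a relative path: " + dir);
      }
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    validateAbsolute(conf, "yarn.nodemanager.local-dirs");
    validateAbsolute(conf, "yarn.nodemanager.log-dirs");
  }
}
{code}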



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1725) RM should provide an easier way for the app to reject a bad allocation

2015-05-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1725:
--
Component/s: resourcemanager
 api
 Issue Type: Improvement  (was: Bug)

 RM should provide an easier way for the app to reject a bad allocation
 --

 Key: YARN-1725
 URL: https://issues.apache.org/jira/browse/YARN-1725
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Reporter: Bikas Saha

 Currently, if the app gets a bad allocation then it can release the 
 container. However, the app now needs to request those resources again or 
 else the RM will not give it a new container in lieu of the one just 
 rejected. This makes the app writers life hard.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1969) Fair Scheduler: Add policy for Earliest Endtime First

2015-05-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1969:

Component/s: fairscheduler

 Fair Scheduler: Add policy for Earliest Endtime First
 -

 Key: YARN-1969
 URL: https://issues.apache.org/jira/browse/YARN-1969
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 What we are observing is that some big jobs with many allocated containers 
 are waiting for a few containers to finish. Under *fair-share scheduling*, 
 however, they have a low priority since there are other jobs (usually much 
 smaller newcomers) that are using resources way below their fair share, 
 hence newly released containers are not offered to the big, yet 
 close-to-be-finished job. Nevertheless, everybody would benefit from an 
 unfair scheduling that offers the resource to the big job, since the sooner 
 the big job finishes, the sooner it releases its many allocated resources 
 to be used by other jobs. In other words, we need a relaxed version of 
 *Earliest Endtime First scheduling* that takes into account the number of 
 already-allocated resources and the estimated time to finish.
 For example, if a job is using MEM GB of memory and is expected to finish in 
 TIME minutes, the priority in scheduling would be a function p of (MEM, 
 TIME). The expected time to finish can be estimated by the AppMaster using 
 TaskRuntimeEstimator#estimatedRuntime and be supplied to the RM in the resource 
 request messages. To be less susceptible to the issue of apps gaming the 
 system, we can have this scheduling limited to leaf queues which have 
 applications.
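
A toy illustration of one possible priority function p(MEM, TIME); the specific weighting and the AppInfo type below are made up for illustration and are not part of the proposal.

{code}
import java.util.Comparator;

public class EarliestEndtimeFirstComparator implements Comparator<EarliestEndtimeFirstComparator.AppInfo> {
  @Override
  public int compare(AppInfo a, AppInfo b) {
    return Double.compare(priority(b), priority(a)); // higher priority first
  }

  // Favor apps holding many resources that are close to finishing.
  private static double priority(AppInfo app) {
    return app.allocatedMemoryGB / Math.max(1.0, app.estimatedMinutesToFinish);
  }

  public static class AppInfo {
    final double allocatedMemoryGB;
    final double estimatedMinutesToFinish;
    public AppInfo(double allocatedMemoryGB, double estimatedMinutesToFinish) {
      this.allocatedMemoryGB = allocatedMemoryGB;
      this.estimatedMinutesToFinish = estimatedMinutesToFinish;
    }
  }
}
{code}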



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1968) YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations.

2015-05-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1968:

Component/s: resourcemanager

 YARN Admin service should have more fine-grained ACL which is based on 
 mapping of users with methods/operations.
 

 Key: YARN-1968
 URL: https://issues.apache.org/jira/browse/YARN-1968
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Junping Du

 AdminService's operations today have different dimensions of management: some 
 are about user management while others are about cluster management, etc. 
 Today, we only check whether a user belongs to some authorized group to see if 
 they can execute operations in the admin service. The result is that a user can 
 either execute all operations or none, which is a simple strategy but not very 
 precise, so we cannot separate different management roles among several admins. 
 We may need more fine-grained ACLs which can authorize a user for a subset of 
 operations in AdminService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2626) Document of timeline server needs to be updated

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2626.
---
Resolution: Duplicate

YARN-3539 is updating it. Close this one.

 Document of timeline server needs to be updated
 ---

 Key: YARN-2626
 URL: https://issues.apache.org/jira/browse/YARN-2626
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Zhijie Shen

 After YARN-2033, the document is no longer accurate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB

2015-05-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524176#comment-14524176
 ] 

Jian He commented on YARN-1735:
---

canceling the patch since it's not applying anymore. cc/ [~kasha]

 For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
 ---

 Key: YARN-1735
 URL: https://issues.apache.org/jira/browse/YARN-1735
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Reporter: Siqi Li
 Attachments: YARN-1735.v1.patch


 In monitoring graphs, the AvailableMB of each queue regularly spikes between 
 the AllocatedMB and the entire cluster capacity.
 This cannot be correct, since AvailableMB should never be more than the queue 
 max allocation. The spikes are quite confusing, since the availableMB is set 
 as the fair share of each queue and the fair share of each queue is bounded by 
 its allowed max resource.
 Other than the spiking, the availableMB is always equal to allocatedMB. I 
 think this is not very useful; availableMB for each queue should be its 
 allowed max resource minus allocatedMB.
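
A small sketch of the metric the reporter is suggesting, using the Resources helper; the numbers are hypothetical.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AvailableMetricSketch {
  public static void main(String[] args) {
    Resource queueMax  = Resource.newInstance(16 * 1024, 16); // hypothetical queue max allocation
    Resource allocated = Resource.newInstance(10 * 1024, 10); // hypothetical current allocation
    Resource available = Resources.subtract(queueMax, allocated);
    System.out.println("AvailableMB = " + available.getMemory()); // 6144
  }
}
{code}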



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524174#comment-14524174
 ] 

Hadoop QA commented on YARN-2892:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  6s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |  52m 19s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 30s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12684732/YARN-2892.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3393461 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7577/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7577/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7577/console |


This message was automatically generated.

 Unable to get AMRMToken in unmanaged AM when using a secure cluster
 ---

 Key: YARN-2892
 URL: https://issues.apache.org/jira/browse/YARN-2892
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Sevada Abraamyan
Assignee: Sevada Abraamyan
 Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch


 An AMRMToken is retrieved from the ApplicationReport by the YarnClient. 
 When the RM creates the ApplicationReport and sends it back to the client it 
 makes a simple security check on whether it should include the AMRMToken in the 
 report (see createAndGetApplicationReport in RMAppImpl). This security check 
 verifies that the user who submitted the original application is the same 
 user who is requesting the ApplicationReport. If they are indeed the same 
 user then it includes the AMRMToken, otherwise it does not include it.
 The problem arises from the fact that when an application is submitted, the 
 RM saves the short username of the user who created the application (see 
 submitApplication in ClientRmService). Afterwards, when the ApplicationReport 
 is requested, the system tries to match the full username of the requester 
 against the previously stored short username. 
 In a secure cluster using Kerberos this check fails because the realm is 
 stripped from the principal when we request the short username. So for example 
 the short username might be Foo whereas the full username is 
 f...@company.com
 Note: A very similar problem has been previously reported 
 ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])
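
The mismatch can be reproduced directly with UserGroupInformation; a minimal sketch (names in the comments are examples only):

{code}
import org.apache.hadoop.security.UserGroupInformation;

public class UserNameMismatchExample {
  public static void main(String[] args) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    String shortName = ugi.getShortUserName(); // e.g. "foo" (realm stripped)
    String fullName  = ugi.getUserName();      // e.g. "foo@COMPANY.COM" under Kerberos
    // On a secure cluster these differ, which is why the RM's equality check fails.
    System.out.println("equal? " + fullName.equals(shortName));
  }
}
{code}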



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2345) yarn rmadmin -report

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524178#comment-14524178
 ] 

Hadoop QA commented on YARN-2345:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12663619/YARN-2345.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d3d019c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7580/console |


This message was automatically generated.

 yarn rmadmin -report
 

 Key: YARN-2345
 URL: https://issues.apache.org/jira/browse/YARN-2345
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Reporter: Allen Wittenauer
Assignee: Hao Gao
  Labels: newbie
 Attachments: YARN-2345.1.patch


 It would be good to have an equivalent of hdfs dfsadmin -report in YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience

2015-05-01 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3523:

Labels: newbie  (was: )

 Cleanup ResourceManagerAdministrationProtocol interface audience
 

 Key: YARN-3523
 URL: https://issues.apache.org/jira/browse/YARN-3523
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3523.20150422-1.patch


 I noticed ResourceManagerAdministrationProtocol has @Private audience for the 
 class and @Public audience for methods. It doesn't make sense to me. We 
 should make class audience and methods audience consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1971) WindowsLocalWrapperScriptBuilder does not check for errors in generated script

2015-05-01 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1971:

Component/s: nodemanager

 WindowsLocalWrapperScriptBuilder does not check for errors in generated script
 --

 Key: YARN-1971
 URL: https://issues.apache.org/jira/browse/YARN-1971
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Minor

 Similar to YARN-1865. The 
 DefaultContainerExecutor.WindowsLocalWrapperScriptBuilder builds a shell 
 script that contains commands that may potentially fail:
 {code}
 pout.println("@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
 pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile); 
 {code}
 These can fail due to access permissions, disk out of space, bad hardware, 
 cosmic rays, etc. There should be proper error checking to ease 
 troubleshooting.
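
One possible shape of the error checking, sketched outside the real DefaultContainerExecutor code; the class and the pid-file/container-id values below are hypothetical.

{code}
import java.io.PrintWriter;
import java.io.StringWriter;

public class WindowsWrapperSketch {
  // Emit an error check after each command so a failing echo/move aborts the script.
  static void emitChecked(PrintWriter pout, String command) {
    pout.println(command);
    pout.println("@if %errorlevel% neq 0 exit /b %errorlevel%");
  }

  public static void main(String[] args) {
    StringWriter buf = new StringWriter();
    PrintWriter pout = new PrintWriter(buf);
    String containerIdStr = "container_01";        // hypothetical values
    String normalizedPidFile = "C:\\tmp\\pidfile";
    emitChecked(pout, "@echo " + containerIdStr + " > " + normalizedPidFile + ".tmp");
    emitChecked(pout, "@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile);
    pout.flush();
    System.out.print(buf);
  }
}
{code}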



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3380) Add protobuf compatibility checker to jenkins test runs

2015-05-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524232#comment-14524232
 ] 

Li Lu commented on YARN-3380:
-

Hi [~sidharta-s], thanks for poking. Right now my bandwidth is limited, so if 
anyone would like to get this done soon and happens to have bandwidth for it, 
please feel free to let me know. Thanks!

 Add protobuf compatibility checker to jenkins test runs
 ---

 Key: YARN-3380
 URL: https://issues.apache.org/jira/browse/YARN-3380
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
  Labels: jenkins, scripting

 We may want to run the protobuf compatibility checker for each incoming 
 patch, to prevent incompatible changes for rolling upgrades. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled

2015-05-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524300#comment-14524300
 ] 

Wangda Tan commented on YARN-3006:
--

+1 for latest patch, committing.

 Improve the error message when attempting manual failover with auto-failover 
 enabled
 

 Key: YARN-3006
 URL: https://issues.apache.org/jira/browse/YARN-3006
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
 Attachments: YARN-3006.001.patch


 When executing manual failover with automatic failover enabled, 
 UnsupportedOperationException is thrown.
 {code}
 # yarn rmadmin -failover rm1 rm2
 Exception in thread "main" java.lang.UnsupportedOperationException: 
 RMHAServiceTarget doesn't have a corresponding ZKFC address
   at 
 org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
   at 
 org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
   at 
 org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311)
   at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622)
 {code}
 I think the above message is confusing to users (they may wonder whether ZKFC 
 is configured correctly). The command should output an error message to stderr 
 instead of throwing an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2454) The function compareTo of variable UNBOUNDED in org.apache.hadoop.yarn.util.resource.Resources is defined incorrectly.

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524327#comment-14524327
 ] 

Junping Du commented on YARN-2454:
--

Patch LGTM. Kick off Jenkins again, +1 based on Jenkins' results.

 The function compareTo of variable UNBOUNDED in 
 org.apache.hadoop.yarn.util.resource.Resources is defined incorrectly.
 --

 Key: YARN-2454
 URL: https://issues.apache.org/jira/browse/YARN-2454
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.4.1
Reporter: Xu Yang
Assignee: Xu Yang
 Attachments: YARN-2454 -v2.patch, YARN-2454-patch.diff, 
 YARN-2454.patch


 The variable UNBOUNDED implements the abstract class Resources and overrides 
 the function compareTo, but there is something wrong in this function. We 
 should not compare resources against zero the same way the variable NONE does; we 
 should change 0 to Integer.MAX_VALUE.
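
A sketch of the comparison being described, treating UNBOUNDED as an infinitely large resource rather than comparing against zero; this is an illustration, not the actual patch.

{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class UnboundedCompareSketch {
  // Compare as if "this" were a resource with Integer.MAX_VALUE memory and vcores.
  static int compareToUnbounded(Resource other) {
    int diff = Integer.MAX_VALUE - other.getMemory();
    if (diff == 0) {
      diff = Integer.MAX_VALUE - other.getVirtualCores();
    }
    return diff;
  }

  public static void main(String[] args) {
    Resource r = Resource.newInstance(4096, 4);
    System.out.println(compareToUnbounded(r) > 0); // true: UNBOUNDED compares as larger
  }
}
{code}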



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524373#comment-14524373
 ] 

Hadoop QA commented on YARN-20:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 41s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  1s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| | |  36m 13s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12725385/YARN-20.1.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / 6f541ed |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7587/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7587/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7587/console |


This message was automatically generated.

 More information for yarn.resourcemanager.webapp.address in yarn-default.xml
 --

 Key: YARN-20
 URL: https://issues.apache.org/jira/browse/YARN-20
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Nemon Lou
Priority: Trivial
 Attachments: YARN-20.1.patch, YARN-20.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

   The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is 
 in host:port format, which is noted in the cluster set-up guide 
 (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html).
   When I read through the code, I found that the host-only format is also 
 supported; in that format the port will be random.
   So we may add more documentation to yarn-default.xml to make this easier to 
 understand. I will submit a patch if it's helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2483) TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state

2015-05-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-2483.
--
  Resolution: Duplicate
Target Version/s:   (was: 2.6.0)

Resolving this JIRA as a duplicate.

 TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to 
 incorrect AppAttempt state
 

 Key: YARN-2483
 URL: https://issues.apache.org/jira/browse/YARN-2483
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu

 From https://builds.apache.org/job/Hadoop-Yarn-trunk/665/console :
 {code}
 testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)
   Time elapsed: 49.686 sec   FAILURE!
 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:84)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:582)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:589)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForNewAMToLaunchAndRegister(MockRM.java:182)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:402)
 {code}
 TestApplicationMasterLauncher#testallocateBeforeAMRegistration fails with 
 similar cause.
 These tests failed in build #664 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

