[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308817#comment-14308817 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-trunk-Commit #7038 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7038/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
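The Mockito failure above is the classic pattern of verifying a mock before the AsyncDispatcher thread has delivered the event. Purely for illustration, a minimal fragment of what a race-free verification inside the test body could look like (the committed YARN-1537 patch may instead drain the dispatcher or take a different approach):
{code}
import static org.mockito.Mockito.isA;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.timeout;
import static org.mockito.Mockito.verify;

import org.apache.hadoop.yarn.event.EventHandler;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent;

@SuppressWarnings("unchecked")
EventHandler<ContainerEvent> eventHandler = mock(EventHandler.class);
// ... register eventHandler with the dispatcher and drive localization ...

// Wait a bounded amount of time for the LOCALIZED event instead of verifying
// immediately, which races with the AsyncDispatcher thread seen above.
verify(eventHandler, timeout(5000)).handle(
    isA(ContainerResourceLocalizedEvent.class));
{code}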
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309650#comment-14309650 ] Jian He commented on YARN-3021: --- bq. the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config) If this is the case, the assumption here is problematic, why would I request a token from B but let untrusted 3rd party A renew my token in the first place? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309752#comment-14309752 ] Hudson commented on YARN-2694: -- FAILURE: Integrated in Hadoop-trunk-Commit #7042 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7042/]) YARN-2694. Ensure only single node label specified in ResourceRequest. Contributed by Wangda Tan (jianhe: rev c1957fef29b07fea70938e971b30532a1e131fd0) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, YARN-2694-20150202-1.patch, 
YARN-2694-20150203-1.patch, YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, YARN-2694-20150205-3.patch Currently, node label expression support in the capacity scheduler is only partially complete. A node label expression specified in a ResourceRequest is only respected when it is specified at the ANY level, and a ResourceRequest/host with multiple node labels makes user-limit and similar computations more tricky. We need to temporarily disable these for now; changes include: - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
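As a rough sketch of the kind of check described above (single label per request, label expression only honored at ANY): the method name, the exception type, and the assumption that "&&" separates labels in an expression are all illustrative, not the committed SchedulerUtils change.
{code}
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative validation only; the actual patch lives in SchedulerUtils and
// related classes and may use different names and a YARN-specific exception.
static void validateNodeLabelExpression(ResourceRequest req) {
  String expr = req.getNodeLabelExpression();
  if (expr == null || expr.trim().isEmpty()) {
    return;
  }
  if (!ResourceRequest.ANY.equals(req.getResourceName())) {
    throw new IllegalArgumentException(
        "A node label expression can only be specified when resourceName=ANY");
  }
  // Assumes "&&" separates labels within an expression.
  if (expr.trim().split("&&").length > 1) {
    throw new IllegalArgumentException(
        "Only a single node label may be specified in a ResourceRequest");
  }
}
{code}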
[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2694: -- Target Version/s: 2.7.0 (was: 2.6.0) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, YARN-2694-20150205-3.patch Currently, node label expression support in the capacity scheduler is only partially complete. A node label expression specified in a ResourceRequest is only respected when it is specified at the ANY level, and a ResourceRequest/host with multiple node labels makes user-limit and similar computations more tricky. We need to temporarily disable these for now; changes include: - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309865#comment-14309865 ] Wei Yan commented on YARN-3126: --- [~Xia Hu], I checked the latest trunk version; the problem is still there. Could you rebase the patch against trunk? Normally we fix the problem in trunk rather than in a previously released version. We may also need to get YARN-2083 committed first. Hey [~kasha], do you have time to look at YARN-2083? FairScheduler: queue's usedResource is always more than the maxResource limit - Key: YARN-3126 URL: https://issues.apache.org/jira/browse/YARN-3126 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. Reporter: Xia Hu Labels: assignContainer, fairscheduler, resources Attachments: resourcelimit.patch When submitting a Spark application (in both spark-on-yarn-cluster and spark-on-yarn-client mode), the queue's usedResources assigned by the FairScheduler can exceed the queue's maxResources limit. Reading the FairScheduler code, I believe this happens because the requested resources are not checked when assigning a container. Here is the detail: 1. Choose a queue. In this step, assignContainerPreCheck checks whether the queue's usedResource is already bigger than its max. 2. Then choose an app in that queue. 3. Then choose a container. And here is the problem: there is no check whether this container would push the queue's resources over its max limit. If a queue's usedResource is 13G and the maxResource limit is 16G, a container asking for 4G may still be assigned successfully. This problem happens regularly with Spark applications, because we can ask for different container resources in different applications. By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
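A hedged sketch of the missing check described in step 3 above; this is illustrative only, not the attached resourcelimit.patch, and the method name is an assumption.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Before assigning a container, verify the queue would still fit within its
// max share afterwards. A simple component-wise comparison is used for clarity.
static boolean fitsInMaxShare(Resource used, Resource maxShare, Resource demand) {
  Resource afterAssignment = Resources.add(used, demand);
  return afterAssignment.getMemory() <= maxShare.getMemory()
      && afterAssignment.getVirtualCores() <= maxShare.getVirtualCores();
}
{code}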
[jira] [Commented] (YARN-3120) YarnException on Windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir nm-local-dir, which was marked as good.
[ https://issues.apache.org/jira/browse/YARN-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309867#comment-14309867 ] vaidhyanathan commented on YARN-3120: - Hi Varun, thanks for responding. I started running the yarn cmd files as administrator and it worked; I also opened the command prompt and ran it in administrator mode. The word count example worked fine the first time, but now I'm facing a different issue. When I run it now with the earlier setup, the job doesn't proceed past this step: '15/02/06 15:38:26 INFO mapreduce.Job: Running job: job_1423255041751_0001', and when I check the console the status is 'Accepted' and the final status is 'Undefined'. YarnException on Windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir nm-local-dir, which was marked as good. --- Key: YARN-3120 URL: https://issues.apache.org/jira/browse/YARN-3120 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: Windows 8, Hadoop 2.6.0 Reporter: vaidhyanathan Hi, I tried to follow the instructions in http://wiki.apache.org/hadoop/Hadoop2OnWindows and have set up hadoop-2.6.0.jar on my Windows system. I was able to start everything properly, but when I try to run the wordcount job as given in the above URL, the job fails with the exception below. 15/01/30 12:56:09 INFO localizer.ResourceLocalizationService: Localizer failed org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir /tmp/hadoop-haremangala/nm-local-dir, which was marked as good. at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1372) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$900(ResourceLocalizationService.java:137) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1085) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309612#comment-14309612 ] Sangjin Lee commented on YARN-3041: --- [~rkanter], [~Naganarasimha], IMO it might make sense to define all YARN system entities as explicit types. It would include flow runs, YARN apps, app attempts, and containers. They have well-defined meaning and relationship, so it seems natural to me? Thoughts? create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
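One possible shape for the "explicit types" idea above, purely as a sketch covering the system entities listed in the proposal; the names are assumptions, not the API that was eventually committed.
{code}
// Sketch only: explicit entity types for the YARN system entities mentioned
// above (cluster, user, flow, flow run, YARN app, app attempt, container).
public enum TimelineEntityType {
  YARN_CLUSTER,
  YARN_USER,
  YARN_FLOW,
  YARN_FLOW_RUN,
  YARN_APPLICATION,
  YARN_APPLICATION_ATTEMPT,
  YARN_CONTAINER
}
{code}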
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309703#comment-14309703 ] Yongjun Zhang commented on YARN-3021: - Hi [~vinodkv] and [~jianhe], Thank you so much for the review and comments! I will try to respond to part of your comments here and keep looking into the rest. {quote} RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address. This way, we don't need an explicit API in the submission context. {quote} It seems that regardless of this JIRA, we could make the above change, right? Any catch? {quote} Apologies for going back and forth on this one. {quote} I appreciate the insight you provided, and we are trying to figure out the best solution together. All the points you provided are reasonable, so absolutely no need for apologies here. {quote} Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely {quote} I did test in a secure environment and it worked. Would you please elaborate? {quote} I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right? {quote} I think that's the case. The problem is that there is no trust between A and B, so COMMON should be the one to renew the token. Thanks. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-281. - Resolution: Won't Fix Release Note: I think this may not need since we already have tests in TestSchedulerUitls, it will verify minimum/maximum resource normalization/verification. And SchedulerUtil runs before scheduler can see such resource requests. Resolved it as won't fix. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309909#comment-14309909 ] Chris Douglas commented on YARN-3100: - Agreed; definitely a separate JIRA. As state is copied from the old queues, some of the methods called in {{CSQueueUtils}} throw exceptions, similar to the case you found in {{LeafQueue}}. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
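For context, a rough sketch of what the proposed YarnAuthorizationProvider interface could look like; the method names and signatures below are assumptions for illustration only, not the patches under review here.
{code}
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

// Illustrative only; the attached YARN-3100 patches define the real API.
public interface YarnAuthorizationProvider {
  /** Initialize the provider (default implementation vs. Ranger/Sentry plug-in). */
  void init(Configuration conf);

  /** Check whether the user may perform the given access on the named entity. */
  boolean checkPermission(String accessType, String entityName,
      UserGroupInformation user);

  /** Install or update the ACLs protecting the named entity. */
  void setPermission(String entityName, Map<String, AccessControlList> acls,
      UserGroupInformation user);
}
{code}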
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309816#comment-14309816 ] Hadoop QA commented on YARN-3144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697097/YARN-3144.4.patch against trunk revision eaab959. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6537//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6537//console This message is automatically generated. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, YARN-3144.4.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309886#comment-14309886 ] Jason Lowe commented on YARN-3144: -- Committing this. The test failures appear to be unrelated, and they both pass for me locally with the patch applied. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, YARN-3144.4.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309663#comment-14309663 ] Sangjin Lee commented on YARN-1142: --- Some more info on this at https://issues.apache.org/jira/browse/YARN-3087?focusedCommentId=14307614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14307614 MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.7.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309920#comment-14309920 ] Jason Lowe commented on YARN-2809: -- +1 lgtm. Will commit this early next week if there are no objections. Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
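The workaround itself is described in the JIRA comments rather than reproduced here. Purely as a hypothetical illustration of the general "don't delete the cgroup immediately" approach, one could retry removal with a short sleep; the actual YARN-2809 patch may work quite differently.
{code}
import java.io.File;

// Hypothetical illustration only: retry removal of an (empty) cgroup directory
// a few times instead of deleting once, to ride out the kernel race above.
static boolean deleteCgroupWithRetries(File cgroupDir, int maxAttempts,
    long sleepMillis) throws InterruptedException {
  for (int attempt = 0; attempt < maxAttempts; attempt++) {
    if (cgroupDir.delete()) {   // rmdir() of an empty cgroup directory
      return true;
    }
    Thread.sleep(sleepMillis);  // give exiting tasks time to drain
  }
  return false;
}
{code}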
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309639#comment-14309639 ] Jian He commented on YARN-3021: --- bq. Explicitly have an external renewer system that has the right permissions to renew these tokens. I think this is the correct long-term solution. RM today happens to be the renewer. But we need a central renewer component so that we can do cross-cluster renewals. bq. RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
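A minimal sketch of the "inspect the renewer and skip" idea quoted above, assuming illustrative method names; this is not the attached patch.
{code}
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

// Skip renewal (but still hand the token to containers) when the renewer
// recorded inside the token is not this RM. Names are illustrative.
static boolean shouldSkipRenewal(Token<?> token, String rmPrincipal)
    throws IOException {
  TokenIdentifier id = token.decodeIdentifier();
  if (!(id instanceof AbstractDelegationTokenIdentifier)) {
    return false; // not a delegation token; leave existing handling alone
  }
  Text renewer = ((AbstractDelegationTokenIdentifier) id).getRenewer();
  return renewer == null || !renewer.toString().equals(rmPrincipal);
}
{code}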
[jira] [Created] (YARN-3153) Capacity Scheduler max AM resource percentage is mis-used as ratio
Wangda Tan created YARN-3153: Summary: Capacity Scheduler max AM resource percentage is mis-used as ratio Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309973#comment-14309973 ] Xuan Gong commented on YARN-3154: - We can add a parameter to LogAggregationContext indicating whether this app is an LRS app. Based on this flag, the NM can decide whether it needs to upload the partial logs for this app. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310023#comment-14310023 ] Jian He commented on YARN-3153: --- As the [doc|http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html] already explicitly mentions specified as float, to keep it compatible, we may choose to do 1) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3100: -- Issue Type: Improvement (was: Bug) Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310026#comment-14310026 ] Zhijie Shen commented on YARN-3041: --- bq. IMO it might make sense to define all YARN system entities as explicit types Make sense to me. create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310033#comment-14310033 ] Hadoop QA commented on YARN-1126: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697127/YARN-905-addendum.patch against trunk revision 5c79439. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6540//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6540//console This message is automatically generated. Add validation of users input nodes-states options to nodes CLI --- Key: YARN-1126 URL: https://issues.apache.org/jira/browse/YARN-1126 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-905-addendum.patch Follow the discussion in YARN-905. (1) case-insensitive checks for all. (2) validation of users input, exit with non-zero code and print all valid states when user gives an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
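As a sketch of the validation the issue describes (case-insensitive matching, non-zero exit code and a listing of all valid states on bad input); the method name is illustrative and this is not the attached addendum patch.
{code}
import java.util.Arrays;

import org.apache.hadoop.yarn.api.records.NodeState;

// Illustrative parsing for the nodes CLI state values.
static NodeState parseNodeState(String arg) {
  try {
    // Case-insensitive: "running", "Running" and "RUNNING" are all accepted.
    return NodeState.valueOf(arg.trim().toUpperCase());
  } catch (IllegalArgumentException e) {
    System.err.println("Invalid node state: " + arg
        + ". Valid states are: " + Arrays.toString(NodeState.values()));
    System.exit(-1);   // exit with a non-zero code as described above
    return null;       // unreachable
  }
}
{code}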
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310056#comment-14310056 ] Zhijie Shen commented on YARN-2928: --- bq. A single tez application can run multiple different Hive queries submitted by different users. In this use case, who is the user of the TEZ application? This may affect the data model and the parent-child relationship (cluster-user-flow-flow run-application). bq. Where does the current implementation's otherInfo and primaryFilters fit in? metadata aims to store the same thing as otherInfo, but I didn't want it to be called otherInfo because it is no longer just the info other than primaryFilters. When designing the new schema, I'm looking for a way to have the entity indexed without having to explicitly specify the primaryFilters, which previously caused trouble and bugs when updating an entity. bq. What are the main differences between meta-data and configuration? They could be combined, as both are key-value pairs, but I distinguish them explicitly for better usability. Or is there any special access pattern for config? bq. If there is a hierarchy of objects, will there be support to listen to or retrieve all events for a given tree by providing a root node? We could probably run an ad-hoc query to get the events of all applications of a workflow. bq. What use are events? Will there be a streaming API available to listen to all events based on some search criteria? bq. In certain cases, it might be required to mine a specific job's data by exporting contents out of ATS. They sound like interesting features, but we may not be able to accommodate them within the Hadoop 2.8 timeline. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309974#comment-14309974 ] Wangda Tan commented on YARN-3153: -- We have 3 options basically, 1) Keep the config name (...percentage) and continue use it as ratio, add additional checking for this to make sure it fit in range \[0,1\] 2) Keep the config name. Use it as percentage, this need update yarn-default as well. This will have some impacts on existing deployments if they upgrade. 3) Change the config name to (...ratio), this will be a in-compatible change. Thoughts? [~vinodkv], [~jianhe] Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310032#comment-14310032 ] Wangda Tan commented on YARN-3153: -- Thanks for your feedback; I agree we should do 1) first. I don't think deprecating the option and changing its name is graceful enough: users will get confused when they find one option deprecated but the system suggests a very similar one. Will upload a patch for #1 shortly. Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
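A minimal sketch of option 1) — keep the existing property name but validate the range; this is illustrative only, and the forthcoming patch may place the check elsewhere.
{code}
import org.apache.hadoop.conf.Configuration;

// Fail fast when the configured value falls outside [0, 1]; 0.1 is the
// documented default. Illustrative only.
static float getValidatedMaxAMResourcePercent(Configuration conf) {
  float value = conf.getFloat(
      "yarn.scheduler.capacity.maximum-am-resource-percent", 0.1f);
  if (value < 0.0f || value > 1.0f) {
    throw new IllegalArgumentException(
        "maximum-am-resource-percent should be in range [0, 1], but is " + value);
  }
  return value;
}
{code}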
[jira] [Created] (YARN-3155) Refactor the exception handling code for TimelineClientImpl's retryOn method
Li Lu created YARN-3155: --- Summary: Refactor the exception handling code for TimelineClientImpl's retryOn method Key: YARN-3155 URL: https://issues.apache.org/jira/browse/YARN-3155 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Minor Since we switched to Java 1.7, the exception handling code for the retryOn method can be merged into one statement block, instead of the current two, to avoid repeated code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
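For illustration, the Java 7 multi-catch pattern the refactoring refers to; doOperation and handleRetry are placeholders, and the concrete exception types inside TimelineClientImpl's retryOn method may differ.
{code}
import java.io.IOException;

// Before (Java 6): try { doOperation(); }
//                  catch (IOException e)      { handleRetry(e); }
//                  catch (RuntimeException e) { handleRetry(e); }
// After (Java 7 multi-catch): one block, no duplicated handling code.
void runWithRetryHandling() {
  try {
    doOperation();
  } catch (IOException | RuntimeException e) {
    handleRetry(e);
  }
}

void doOperation() throws IOException {
  // placeholder for the network call that retryOn wraps
}

void handleRetry(Exception e) {
  // placeholder for the shared retry/backoff handling
}
{code}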
[jira] [Created] (YARN-3156) Allow RM timeline client renewDelegation exceptions to be non-fatal
Jonathan Eagles created YARN-3156: - Summary: Allow RM timeline client renewDelegation exceptions to be non-fatal Key: YARN-3156 URL: https://issues.apache.org/jira/browse/YARN-3156 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles This is a follow-up to YARN-3144. In addition to YarnClientImpl, delegation token renewal may also fail after the client has successfully retrieved the delegation token. This JIRA is to allow the exception generated in TimelineDelegationTokenIdentifier to be non-fatal if the RM has the yarn.timeline-service.client.best-effort flag configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
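For reference, a hedged example of turning the flag named above on from client-side configuration; the exact semantics are defined by YARN-3144 and this follow-up.
{code}
import org.apache.hadoop.conf.Configuration;

// Treat timeline-service delegation token failures as non-fatal (best effort).
Configuration conf = new Configuration();
conf.setBoolean("yarn.timeline-service.client.best-effort", true);
{code}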
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.prelim.patch Preliminary patch to calculate CPU VCores used for ProcfsBasedProcessTree based on proc/pid/stat values. Remaining work is WindowsProcessTree calculation and unit tests. Tested by using the main method for ProcfsBasedProcessTree Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309986#comment-14309986 ] Jason Lowe commented on YARN-3154: -- Note that even LRS apps have issues if they don't do their own log rolling. If I remember correctly, stdout and stderr files are setup by the container executor, and we'll have partial logs uploaded then deleted from the local filesystem, losing any subsequent logs to these files or any other files that aren't explicitly log rolled and filtered via a log aggregation context. IMHO we need to make sure we do _not_ delete anything for a running app _unless_ it has a log aggregation context filter to tell us what is safe to upload and delete. Without that information, we cannot tell if a log file is live and therefore going to be deleted too early. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309949#comment-14309949 ] Jason Lowe commented on YARN-3143: -- Thanks for the review, Kihwal! Committing this. RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310042#comment-14310042 ] Vinod Kumar Vavilapalli commented on YARN-3154: --- Does having two separate notions work? - Today's LogAggregationContext's include/exclude patterns for the app to indicate which log files need to be aggregated explicitly at app finish. This works for regular apps. - A new include/exclude pattern for app to indicate which log files need to be aggregated in a rolling fashion. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
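As background for the two notions above, a hedged sketch of how an application sets today's include/exclude patterns at submission time; the two-argument newInstance shown is believed to be the 2.6 API shape, and the separate "rolling" include/exclude pattern suggested here would be a new field not shown.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LogAggregationContext;

// Today's notion: patterns applied when logs are aggregated at app finish.
static void configureLogAggregation(ApplicationSubmissionContext submissionContext) {
  LogAggregationContext logCtx =
      LogAggregationContext.newInstance("*.log", "*.tmp");  // include, exclude
  submissionContext.setLogAggregationContext(logCtx);
}
{code}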
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310049#comment-14310049 ] Wangda Tan commented on YARN-3153: -- Good suggestion. I think we can deprecate the percent one, make sure its value is within \[0, 1\], and use a ratio/factor as the new option name. Sounds good? Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-281: Release Note: (was: I think this may not need since we already have tests in TestSchedulerUitls, it will verify minimum/maximum resource normalization/verification. And SchedulerUtil runs before scheduler can see such resource requests. Resolved it as won't fix.) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310099#comment-14310099 ] Wangda Tan commented on YARN-281: - I accidentally put my comment into the release note field; I've cleaned up the release note. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310100#comment-14310100 ] Wangda Tan commented on YARN-281: - I think this may not be needed, since we already have tests in TestSchedulerUtils that verify minimum/maximum resource normalization/validation, and SchedulerUtils runs before the scheduler can see such resource requests. Resolved it as Won't Fix. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser
[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309952#comment-14309952 ] Vinod Kumar Vavilapalli commented on YARN-3089: --- bq. Currently, even we are running a MR job, it will upload the partial logs which does not sound right. And we need to fix it. Wow, this is a huge blocker. We should fix it in 2.6.1. [~xgong], can you please file a ticket and link it here? Tx. LinuxContainerExecutor does not handle file arguments to deleteAsUser - Key: YARN-3089 URL: https://issues.apache.org/jira/browse/YARN-3089 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt, YARN-3089.v3.txt YARN-2468 added the deletion of individual logs that are aggregated, but this fails to delete log files when the LCE is being used. The LCE native executable assumes the paths being passed are paths and the delete fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests
[ https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310034#comment-14310034 ] Sandy Ryza commented on YARN-2990: -- +1. Sorry for the delay in getting to this. FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests --- Key: YARN-2990 URL: https://issues.apache.org/jira/browse/YARN-2990 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2990-0.patch, yarn-2990-1.patch, yarn-2990-2.patch, yarn-2990-test.patch Looking at the FairScheduler, it appears the node/rack locality delays are used for all requests, even those that are only off-rack. More details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310055#comment-14310055 ] Hudson commented on YARN-3143: -- FAILURE: Integrated in Hadoop-trunk-Commit #7045 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7045/]) YARN-3143. RM Apps REST API can return NPE or entries missing id and other fields. Contributed by Jason Lowe (jlowe: rev da2fb2bc46bddf42d79c6d7664cbf0311973709e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310114#comment-14310114 ] Hitesh Shah commented on YARN-2928: --- bq. In this use case, who is the user of the TEZ application? This may affect the data mode and the parent-child relationship (cluster-user-flow-flow run-application). When you say user, what does it really imply? User a can submit a hive query. A tez application running as user hive can execute the query submitted by user a using a's delegation tokens. With proxy users and potential use of delegation tokens, which user should be used? bq. metadata aims to store the same thing as otherInfo, ... primaryFilters Seems like a good option. What form of search will be supported? In most cases, values will unlikely be primitive types but deep nested structures. Will you support all forms of search on all objects? bq. They sound to be interesting features, .. My point related to events was not about a new interesting feature but to generally understand what use case is meant to be solved by events and how should an application developer use events? bq. We may probably run adhoc query to get the events of all applications of a workflow. How is a workflow defined when an entity has 2 parents? Considering the tez-hive example, do you agree that both a Hive Query and a Tez application are workflows and share some entities? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310136#comment-14310136 ] Kendall Thrapp commented on YARN-3143: -- Thanks [~jlowe] for debugging and the super quick patch and thanks [~eepayne] and [~kihwal] for reviewing. RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.prelim.patch [ ]# stress -c 3 [1] 1778 [ ]# stress: info: [1778] dispatching hogs: 3 cpu, 0 io, 0 vm, 0 hdd [ ]# top -n 1 -p 1778 -p 1779 -p 1780 -p 1781 | grep stress 1779 root 20 0 6516 192 100 R 99.8 0.0 1:35.90 stress 1780 root 20 0 6516 192 100 R 99.8 0.0 1:36.04 stress 1781 root 20 0 6516 192 100 R 99.8 0.0 1:35.87 stress 1778 root 20 0 6516 556 468 S 0.0 0.0 0:00.00 stress [ ]# java org.apache.hadoop.yarn.util.ProcfsBasedProcessTree 1779 Number of processors 4 Creating ProcfsBasedProcessTree for process 1779 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 59492 7 6672384 48 stress -c 3 Get cpu usage -1.0 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 59553 7 6672384 48 stress -c 3 Get cpu usage 24.091627 [ ]# java org.apache.hadoop.yarn.util.ProcfsBasedProcessTree 1778 Number of processors 4 Creating ProcfsBasedProcessTree for process 1778 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 60692 8 6672384 48 stress -c 3 |- 1781 1778 1778 595 (stress) 60741 6 6672384 48 stress -c 3 |- 1780 1778 1778 595 (stress) 60729 5 6672384 48 stress -c 3 |- 1778 628 1778 595 (stress) 0 0 6672384 139 stress -c 3 Get cpu usage -1.0 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 60750 8 6672384 48 stress -c 3 |- 1781 1778 1778 595 (stress) 60801 6 6672384 48 stress -c 3 |- 1780 1778 1778 595 (stress) 60786 5 6672384 48 stress -c 3 |- 1778 628 1778 595 (stress) 0 0 6672384 139 stress -c 3 Get cpu usage 72.553894 Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
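For context on the sample output above: the first call prints -1.0 because CPU usage can only be computed from the delta between two snapshots of cumulative CPU time. Below is a minimal sketch of that arithmetic; it is not the ProcfsBasedProcessTree API, just an illustration that assumes usage is reported as a percentage of total machine capacity (so three busy cores on a 4-core box come out near 75, consistent with the 72.55 above).
{code}
// Illustrative sketch only (not the Hadoop ProcfsBasedProcessTree API): estimate a
// process tree's CPU usage from two snapshots of cumulative CPU time, normalized by
// the number of processors so that 100 means "the whole machine is busy".
public class CpuUsageEstimator {
  private final int numProcessors;
  private long lastCpuTimeMs = -1;     // cumulative user+system CPU time at the last sample
  private long lastSampleTimeMs = -1;  // wall-clock time of the last sample

  public CpuUsageEstimator(int numProcessors) {
    this.numProcessors = numProcessors;
  }

  /** Returns the usage percentage since the previous sample, or -1.0 on the first call. */
  public float update(long cumulativeCpuTimeMs, long nowMs) {
    float usage = -1.0f;
    if (lastCpuTimeMs >= 0 && nowMs > lastSampleTimeMs) {
      float coresUsed =
          (cumulativeCpuTimeMs - lastCpuTimeMs) / (float) (nowMs - lastSampleTimeMs);
      usage = coresUsed * 100f / numProcessors;  // e.g. ~3 busy cores of 4 -> ~75
    }
    lastCpuTimeMs = cumulativeCpuTimeMs;
    lastSampleTimeMs = nowMs;
    return usage;
  }
}
{code}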
[jira] [Commented] (YARN-2796) deprecate sbin/*.sh
[ https://issues.apache.org/jira/browse/YARN-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310246#comment-14310246 ] Hadoop QA commented on YARN-2796: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697160/YARN-2796-00.patch against trunk revision da2fb2b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6541//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6541//console This message is automatically generated. deprecate sbin/*.sh --- Key: YARN-2796 URL: https://issues.apache.org/jira/browse/YARN-2796 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Attachments: YARN-2796-00.patch We should deprecate mark all yarn sbin/*.sh commands (except for start and stop) as deprecated in trunk so that they may be removed in a future release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310265#comment-14310265 ] Hadoop QA commented on YARN-2246: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697162/YARN-2246.patch against trunk revision da2fb2b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6542//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6542//console This message is automatically generated. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API
[ https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310281#comment-14310281 ] Hadoop QA commented on YARN-1050: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656239/YARN-1050-3.patch against trunk revision da2fb2b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6543//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6543//console This message is automatically generated. Document the Fair Scheduler REST API Key: YARN-1050 URL: https://issues.apache.org/jira/browse/YARN-1050 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Sandy Ryza Assignee: Kenji Kikushima Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch The documentation should be placed here along with the Capacity Scheduler documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
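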
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310278#comment-14310278 ] Xuan Gong commented on YARN-3154: - bq. Does having two separate notions work? This should work, but it requires API changes. Should not upload partial logs for MR jobs or other 'short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. We should only upload partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310242#comment-14310242 ] Sangjin Lee commented on YARN-2928: --- [~hitesh], continuing that discussion, {quote} [~vinodkv] Should have probably added more context from the design doc: We assume that the failure semantics of the ATS writer companion is the same as the AM. If the ATS writer companion fails for any reason, we try to bring it back up up to a specified number of times. If the maximum retries are exhausted, we consider it a fatal failure, and fail the application. {quote} Yes, I definitely could add more color to that point. I'm going to update the design doc as there are a number of clarifications made. Hopefully some time next week. In the per-app timeline aggregator (a.k.a. ATS writer companion) model, it is a special container. And we need to be able to allocate both the timeline aggregator and the AM or neither. Also, we do want to be able to co-locate the AM and the aggregator on the same node. Then RM needs to negotiate that combined capacity atomically. In other words, we don't want to have a situation where we were able to allocate ATS but not AM, or vice versa. If AM needs 2 G, and the timeline aggregator needs 1 G, then this pair needs to go to a node on which 3 G can be allocated at that time. In terms of the failure scenarios, we may need to hash out some more details. Since allocation is considered as a pair, it is also natural to consider their failure semantics in the same manner. But a deeper question is, if the AM came up but the timeline aggregator didn't come up (for resource reasons or otherwise), do we consider that an acceptable situation? If the timeline aggregator for that app cannot come up, should that be considered fatal? Or, if apps are running but they're not logging critical lifecycle events, etc. because the timeline aggregator went down, do we consider that situation acceptable? The discussion was that it is probably not acceptable as if it is a common occurrence, it would leave a large hole in the collected timeline data and the overall value of the timeline data goes down significantly. That said, this point is deferred somewhat because initially we're starting out with a per-node aggregator option. The per-node aggregator option somewhat sidesteps (but not completely) this issue. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3100: -- Attachment: (was: YARN-3100.2.patch) Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310260#comment-14310260 ] Zhijie Shen commented on YARN-2246: --- bq. For running applications 'res' needs to be appended to 'trackingUri' because it is trying to load the files like I see. But do you know why we use proxyLink for running app instead of redirect the request? Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310328#comment-14310328 ] Carlo Curino commented on YARN-1039: Tossing some fire back on duration. I read your concerns of applications ability to provide good values, however, I'd rather have the app providing their best duration estimate (and the framework rounding it or bucketing it), than the app providing a coarse grained tag-based version in the first place. Changing cluster configurations and policies might turn what used to be a short task into something not that short, which we want to handle differently and so on. In a sense asking for duration prevent us to rely on what application will judge as short/long etc.. As another example, based on whatever mechanisms for log aggregation we will have in the future, we can change our mind about what are the cut-points for short/long etc.. For example, because a new technique makes it very cheap and we want to provide much more frequent feedback to users. Bottom line, I find duration a rather neutral thing to ask, vs something which is more opinion-based, and corner cases like never-ending services are easily handled with -1 or +inf values. I also agree that there are many other use cases for tags, that emerged in the discussion, which have a clear value and are by no means covered by duration. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3021: -- Summary: YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp (was: YARN's delegation-token handling disallows certain trust setups to operate properly) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310570#comment-14310570 ] Harsh J commented on YARN-3021: --- [~vinodkv], Many thanks for the response here! bq. Though the patch unblocks the jobs in the short term, it seems like long term this is still bad. I agree in that it does not resolve the problem. The goal we're seeking is also short-term, in that of bringing back a behaviour that got allowed on MR1, in MR2 - even though both end up facing the same issue. The longer term approach sounds like the most optimal thing to do for proper resolution, but given some users are getting blocked by this behaviour change I'd like to know if there'll be any objections in adding the current approach as an interim-fix (the doc for the property does/will claim it disables several necessary features of the job), and file subsequent JIRAs for implementing the standalone renewer? bq. Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely stuck. In our simple tests the app did run through successfully with such an approach, but there was multiple factors we did not test for (app recovery, task failures, etc. which could be impacted). Would it be better if we added in a morphed DelegationTokenRenewer (which does NOP as part of actual renewal logic), instead of skipping adding in the renewer completely? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
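To make the "attempt the renewal but go easy on the failure" idea above concrete, here is a rough sketch of what that could look like at submission time. All names are illustrative assumptions (the configuration key, the scheduleForPeriodicRenewal helper, and the surrounding structure are not the real DelegationTokenRenewer internals).
{code}
// Hypothetical sketch of the proposed behaviour, not the actual RM code: validate the
// token up front, but if renewal fails and a flag says "skip", log and keep going
// instead of failing the application submission. LOG and scheduleForPeriodicRenewal
// are assumed to be in scope.
void validateAndScheduleTokens(Credentials credentials, Configuration conf)
    throws IOException, InterruptedException {
  // Illustrative property name; the real key proposed in the patch may differ.
  boolean skipOnFailure = conf.getBoolean("example.rm.skip-token-renewal-failure", false);
  for (Token<?> token : credentials.getAllTokens()) {
    try {
      token.renew(conf);                   // synchronous validation, as today
      scheduleForPeriodicRenewal(token);   // hypothetical helper for automatic renewal
    } catch (IOException e) {
      if (!skipOnFailure) {
        throw e;                           // old behaviour: bubble up, fail submission
      }
      LOG.warn("Renewal failed for " + token + "; skipping periodic renewal", e);
    }
  }
}
{code}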
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310250#comment-14310250 ] Sangjin Lee commented on YARN-2928: --- [~rajesh.balamohan]: bq. In certain cases, it might be required to mine a specific job's data by exporting contents out of ATS. Would there be any support for an export tool to get data out of ATS? Other than access to the REST endpoint, one might be able to query the backing storage directly. And we're keeping that in mind. But that would depend on the backing storage's capability. For example, for HBase, we could provide phoenix schema on which one can do offline queries pretty efficiently. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310315#comment-14310315 ] Zhijie Shen commented on YARN-3100: --- Thanks for the last patch. It looks good to me. Pending the commit to give Chris some time to feedback. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
Xuan Gong created YARN-3154: --- Summary: Should not upload partial logs for MR jobs or other 'short-running' applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. We should only upload partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310004#comment-14310004 ] Jason Lowe commented on YARN-3143: -- My apologies, I also meant to thank Eric for the original review! RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310029#comment-14310029 ] Vinod Kumar Vavilapalli commented on YARN-3153: --- This is a hard one to solve. +1 for option (1) for now. In addition to that, we can choose to deprecate this configuration completely and introduce a new one with the right semantics but with a name change: say yarn.scheduler.capacity.maximum-am-resources-percentage. Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical The existing Capacity Scheduler can limit the max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used as a ratio: the implementation assumes the input will be in \[0,1\]. So a user can currently specify it as high as 100, which lets AMs use 100x of the queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
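A minimal sketch of the validate-the-input approach is below (assuming "option (1)" refers to enforcing the documented [0,1] range). The property name comes from the issue text; the 0.1 default and the validation itself are assumptions for illustration, not the committed fix.
{code}
// Illustrative validation sketch, assuming conf is an org.apache.hadoop.conf.Configuration.
static float getMaxAmResourcePercent(Configuration conf) {
  float value = conf.getFloat(
      "yarn.scheduler.capacity.maximum-am-resource-percent", 0.1f);  // default is an assumption
  if (value < 0.0f || value > 1.0f) {
    // Fail fast instead of silently treating e.g. 10 as a 10x multiplier of queue capacity.
    throw new IllegalArgumentException(
        "maximum-am-resource-percent must be in [0, 1], but was " + value);
  }
  return value;
}
{code}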
[jira] [Updated] (YARN-3155) Refactor the exception handling code for TimelineClientImpl's retryOn method
[ https://issues.apache.org/jira/browse/YARN-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3155: Attachment: YARN-3155-020615.patch In this patch I refactored the catch blocks in TimelineClientConnectionRetry's retryOn method. I used Java 1.7's multi-catch syntax to eliminate the repeated exception-handling code for the two exception types. Refactor the exception handling code for TimelineClientImpl's retryOn method Key: YARN-3155 URL: https://issues.apache.org/jira/browse/YARN-3155 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Minor Labels: refactoring Attachments: YARN-3155-020615.patch Since we switched to Java 1.7, the exception handling code for the retryOn method can be merged into one statement block, instead of the current two, to avoid repeated code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
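For readers unfamiliar with the Java 7 feature referenced above, the shape of the refactoring is roughly as follows; the exception types and the handler name are illustrative, not the exact ones handled by the real retryOn method.
{code}
// Before: two catch blocks repeating the same retry/error handling.
try {
  op.run();
} catch (java.net.ConnectException e) {
  handleRetriableException(e);
} catch (java.net.SocketTimeoutException e) {
  handleRetriableException(e);
}

// After (Java 7 multi-catch): one block, no duplicated code.
try {
  op.run();
} catch (java.net.ConnectException | java.net.SocketTimeoutException e) {
  handleRetriableException(e);
}
{code}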
[jira] [Assigned] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3152: --- Assignee: Naganarasimha G R Missing hadoop exclude file fails RMs in HA --- Key: YARN-3152 URL: https://issues.apache.org/jira/browse/YARN-3152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: Debian 7 Reporter: Neill Lima Assignee: Naganarasimha G R I have two NNs in HA, they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point as well. I applied the HA RM settings properly and when I started both RMs I started getting this exception: 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) ... 4 more Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) ... 5 more 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=1 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error) 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session The issue is descriptive enough to resolve the problem - and it has been fixed by creating the exclude file. I just think as of a improvement: - Should RMs ignore the missing file as the NNs did? - Should single RM fail even when the file is not present? Just suggesting this improvement to keep the behavior consistent when working with in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3100: -- Attachment: YARN-3100.2.patch Uploaded a new patch: - rename the DefaultYarnAuthorizer to ConfiguredYarnAuthorizer - Added private/unstable annotations to the newly added classes. - Move setPermissions on the authorizer after queue init/re-init is done. Addressed other comments from Zhijie and Chris too. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
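For readers following along, a rough sketch of what a pluggable authorizer interface along the lines described in this JIRA could look like is below. The method names and signatures are assumptions for illustration, not the committed YARN-3100 API.
{code}
// Illustrative sketch only; the real YarnAuthorizationProvider interface may differ.
public interface YarnAuthorizationProvider {

  /** Initialize the provider (e.g. load external policies for a Ranger/Sentry plug-in). */
  void init(Configuration conf);

  /** Single entry point replacing the per-ACL managers (admin/queue/app/timeline/service). */
  boolean checkPermission(AccessType accessType, PrivilegedEntity target,
      UserGroupInformation user);

  /** Used by the default, configuration-based implementation to install ACLs. */
  void setPermission(PrivilegedEntity target,
      Map<AccessType, AccessControlList> acls, UserGroupInformation user);
}
{code}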
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310286#comment-14310286 ] Hitesh Shah commented on YARN-2928: --- Also, what if an application does not want to write data to ATS or does not care if the data does not reach ATS? Will there now be more flags introduced into application submission to tell the RM whether or not the app needs the ATS service, so as to ensure that the app does not fail? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310285#comment-14310285 ] Sangjin Lee commented on YARN-2928: --- bq. When you say user, what does it really imply? User a can submit a hive query. A tez application running as user hive can execute the query submitted by user a using a's delegation tokens. With proxy users and potential use of delegation tokens, which user should be used? That's something we haven't fully considered. IMO the user is used for resource attribution (e.g. chargeback) and also for access control. We'll need to sort out this scenario (probably not for the first cut however). bq. What are the main differences between meta-data and configuration? One could argue they are not different. However, from a user's perspective (especially MR jobs), the configuration has a strong meaning. It might be good to call out configuration separately from other metadata. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310346#comment-14310346 ] Hadoop QA commented on YARN-2348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668984/YARN-2348.3.patch against trunk revision da2fb2b. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6546//console This message is automatically generated. ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Assignee: Leitao Guo Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
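As a side note on the issue description: rendering server-side time instead of UTC is essentially a formatting choice. A minimal, hedged sketch (not the actual web UI code, which has its own render helpers) is:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ServerLocalTime {
  /** Format an epoch-millisecond timestamp in the server's local time zone instead of UTC. */
  public static String format(long epochMillis) {
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    fmt.setTimeZone(TimeZone.getDefault());  // server-side zone, not TimeZone.getTimeZone("UTC")
    return fmt.format(new Date(epochMillis));
  }
}
{code}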
[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310487#comment-14310487 ] Xuan Gong commented on YARN-3152: - In the yarn-site.xml, we do set some value for the yarn.resourcemanager.nodes.exclude-path. Since the file does not exist, we should throw out the exception. When RM starts to transit to active, it automatically calls all the refresh*s. It is by design if any of them fails, we should let RM fail. Missing hadoop exclude file fails RMs in HA --- Key: YARN-3152 URL: https://issues.apache.org/jira/browse/YARN-3152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: Debian 7 Reporter: Neill Lima Assignee: Naganarasimha G R NI have two NNs in HA, they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point as well. I applied the HA RM settings properly and when I started both RMs I started getting this exception: 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) ... 4 more Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) ... 5 more 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=1 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error) 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session The issue is descriptive enough to resolve the problem - and it has been fixed by creating the exclude file. 
I just think of this as an improvement: - Should RMs ignore the missing file as the NNs do? - Should a single RM fail even when the file is not present? Just suggesting this improvement to keep the behavior consistent when working in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
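If the suggested improvement were adopted, the lenient behaviour could look roughly like the sketch below: treat a missing exclude file as an empty exclude list rather than failing the transition to active. This is an illustration, not the actual NodesListManager/AdminService code.
{code}
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;
import java.util.List;

public class ExcludeFileReader {
  /** Returns the excluded hosts, or an empty list if the configured file does not exist. */
  public static List<String> readExcludedHosts(String excludePath) throws IOException {
    File excludeFile = new File(excludePath);
    if (!excludeFile.exists()) {
      // Same leniency the NameNodes show: no file simply means no excluded hosts.
      return Collections.emptyList();
    }
    return Files.readAllLines(excludeFile.toPath(), StandardCharsets.UTF_8);
  }
}
{code}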
[jira] [Commented] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309662#comment-14309662 ] Sangjin Lee commented on YARN-3087: --- Thanks for looking into this [~devaraj.k]! Doesn't sound there is a quick resolution then. :( the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Devaraj K This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3144: -- Attachment: YARN-3144.4.patch No problem, [~jlowe]. Uploaded patch to add the exception message. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, YARN-3144.4.patch Posting events to the timeline server is best-effort. However, a failure to get delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get-delegation-token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
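The intent described above can be pictured with a small sketch: when a (hypothetical) flag is enabled, a failure to fetch the timeline delegation token is logged and ignored instead of killing the job. The property name and the helper's shape are assumptions, not the actual patch.
{code}
// Illustrative sketch; LOG is assumed to be in scope and the config key is hypothetical.
static Token<TimelineDelegationTokenIdentifier> getTimelineTokenBestEffort(
    TimelineClient timelineClient, Configuration conf, String renewer)
    throws IOException, YarnException {
  try {
    return timelineClient.getDelegationToken(renewer);
  } catch (IOException | YarnException e) {
    // Hypothetical key; the real property introduced by the patch may be named differently.
    if (conf.getBoolean("example.timeline.delegation-token.best-effort", false)) {
      LOG.warn("Could not fetch timeline delegation token; continuing without it", e);
      return null;
    }
    throw e;
  }
}
{code}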
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309757#comment-14309757 ] Jian He commented on YARN-3100: --- bq. AbstractCSQueue and CSQueueUtils Maybe I missed something, I think these two are mostly fine. As we create the new queue hierarchy first and then update the old queues. If certain methods fail in these two classes, the new queue creation will fail upfront and so will not update the old queue. Anyway, we can address this separately. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
[ https://issues.apache.org/jira/browse/YARN-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3151: Assignee: Rohith On Failover tracking url wrong in application cli for KILLED application Key: YARN-3151 URL: https://issues.apache.org/jira/browse/YARN-3151 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.6.0 Environment: 2 RM HA Reporter: Bibin A Chundatt Assignee: Rohith Priority: Minor Run an application and kill the same after starting Check {color:red} ./yarn application -list -appStates KILLED {color} (empty line) {quote} Application-Id Tracking-URL application_1423219262738_0001 http://IP:PORT/cluster/app/application_1423219262738_0001 {quote} Shutdown the active RM1 Check the same command {color:red} ./yarn application -list -appStates KILLED {color} after RM2 is active {quote} Application-Id Tracking-URL application_1423219262738_0001 null {quote} Tracking url for application is shown as null Expected : Same url before failover should be shown ApplicationReport.getOriginalTrackingUrl() is null after failover org.apache.hadoop.yarn.client.cli.ApplicationCLI listApplications(Set<String> appTypes, EnumSet<YarnApplicationState> appStates) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308947#comment-14308947 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308949#comment-14308949 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308953#comment-14308953 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
Bibin A Chundatt created YARN-3151: -- Summary: On Failover tracking url wrong in application cli for KILLED application Key: YARN-3151 URL: https://issues.apache.org/jira/browse/YARN-3151 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.6.0 Environment: 2 RM HA Reporter: Bibin A Chundatt Priority: Minor Run an application and kill the same after starting Check {color:red} ./yarn application -list -appStates KILLED {color} (empty line) {quote} Application-Id Tracking-URL application_1423219262738_0001 http://IP:PORT/cluster/app/application_1423219262738_0001 {quote} Shutdown the active RM1 Check the same command {color:red} ./yarn application -list -appStates KILLED {color} after RM2 is active {quote} Application-Id Tracking-URL application_1423219262738_0001 null {quote} Tracking url for application is shown as null Expected : Same url before failover should be shown ApplicationReport.getOriginalTrackingUrl() is null after failover org.apache.hadoop.yarn.client.cli.ApplicationCLI listApplications(Set<String> appTypes, EnumSet<YarnApplicationState> appStates) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308952#comment-14308952 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
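The stack trace above is the classic fail-fast iterator failure: one thread iterates the child-queue TreeMap while another mutates it. The sketch below shows one common remedy, iterating over a snapshot taken under a lock; it is illustrative only, with made-up names, and is not the YARN-3145 patch.
{code:java}
// Minimal sketch of iterating a snapshot instead of the live map; not the
// actual YARN-3145 patch. A TreeMap's iterators are fail-fast, so walking the
// shared map while another thread mutates it throws
// ConcurrentModificationException.
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public final class ChildQueueSnapshot {

  private final TreeMap<String, List<String>> childQueueAcls = new TreeMap<>();

  public synchronized void setAcls(String queueName, List<String> acls) {
    childQueueAcls.put(queueName, new ArrayList<>(acls));
  }

  /** Copies the values under the lock, then iterates the copy lock-free, so
   *  concurrent setAcls() calls cannot invalidate a live iterator. */
  public List<String> getQueueUserAclInfo() {
    final List<List<String>> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(childQueueAcls.values()); // snapshot taken safely
    }
    List<String> result = new ArrayList<>();
    for (List<String> acls : snapshot) {
      result.addAll(acls);
    }
    return result;
  }
}
{code}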
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309013#comment-14309013 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309018#comment-14309018 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308987#comment-14308987 ] Chris Douglas commented on YARN-3100: - Looking through {{AbstractCSQueue}} and {{CSQueueUtils}}, it looks like there are many misconfigurations that leave queues in an inconsistent state... Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so that other authorization tools, such as Apache Ranger and Sentry, can be integrated. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will serve as the default implementation. A Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger and Sentry to perform authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
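To make the proposal concrete, here is a hedged sketch of the shape such a plug-in point could take. The interface and method names below are hypothetical illustrations of the idea, not the actual YarnAuthorizationProvider API.
{code:java}
// Hypothetical sketch of a pluggable YARN authorizer; names are illustrative
// and do not reflect the actual YarnAuthorizationProvider interface.
import java.security.Principal;
import java.util.Map;

public interface AuthorizationPlugin {

  /** Called at RM start-up so the plug-in (e.g. a Ranger or Sentry adapter)
   *  can load its own policy store instead of YARN's built-in ACLs. */
  void init(Map<String, String> configuration);

  /** Decide whether the user may perform an action (e.g. submit application,
   *  administer queue) on an entity (queue, application, admin service,
   *  timeline domain). */
  boolean checkPermission(Principal user, String action, String entity);

  /** Push ACL updates (e.g. on queue refresh) down to the plug-in. */
  void setPermissions(String entity, Map<String, String> acls);
}
{code}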
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309014#comment-14309014 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309007#comment-14309007 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
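As a rough illustration of the precedence such a setting implies (a queue-level limit where one is configured, the cluster-wide maximum otherwise), here is a hedged sketch. The class and method names are made up and this is not the YARN-1582 implementation.
{code:java}
// Illustrative sketch, not the YARN-1582 patch: resolve the effective maximum
// container size for a queue and validate a request against it. Names are
// hypothetical stand-ins for the Capacity Scheduler's own configuration code.
public final class MaxAllocationResolver {

  private final long clusterMaxAllocationMb;

  public MaxAllocationResolver(long clusterMaxAllocationMb) {
    this.clusterMaxAllocationMb = clusterMaxAllocationMb;
  }

  /**
   * Returns the effective maximum container size for a queue: the queue-level
   * limit when one is configured (never above the cluster maximum), otherwise
   * the cluster-wide default.
   */
  public long effectiveMaxAllocationMb(Long queueMaxAllocationMb) {
    if (queueMaxAllocationMb != null && queueMaxAllocationMb > 0) {
      return Math.min(queueMaxAllocationMb, clusterMaxAllocationMb);
    }
    return clusterMaxAllocationMb;
  }

  /** A request above the effective maximum for its queue would be rejected. */
  public boolean isRequestAllowed(long requestedMb, Long queueMaxAllocationMb) {
    return requestedMb <= effectiveMaxAllocationMb(queueMaxAllocationMb);
  }
}
{code}
This is what lets one queue accept large containers while another stays capped at a smaller size.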
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309195#comment-14309195 ] Hudson commented on YARN-1582: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309206#comment-14309206 ] Hudson commented on YARN-3149: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309180#comment-14309180 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * hadoop-yarn-project/CHANGES.txt Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309170#comment-14309170 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309173#comment-14309173 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309177#comment-14309177 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309171#comment-14309171 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309203#comment-14309203 ] Hudson commented on YARN-1904: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309196#comment-14309196 ] Hudson commented on YARN-1537: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309199#comment-14309199 ] Hudson commented on YARN-3101: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309344#comment-14309344 ] Jason Lowe commented on YARN-3144: -- Thanks for updating the patch. Comments: * The added test now no longer mocks the TimelineClient as it did before? The test requires the timeline client to throw to work properly, and we could accidentally connect to a timeline server. * Nit: Does timelineServicesBestEffort need to be visible anymore? * Nit: Reading the doc string for the property in yarn-default.xml implies it should be true to make timeline operations fatal. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
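To illustrate the behaviour under discussion, here is a minimal sketch, assuming a boolean best-effort flag: a token-fetch failure is logged and swallowed when the flag is on and rethrown when it is off. The class, flag and method names are hypothetical and this is not the YARN-3144 patch.
{code:java}
// Hedged sketch of "best effort" delegation-token fetching; not the YARN-3144
// patch. The flag and the Callable stand in for the real configuration key and
// timeline client call.
import java.io.IOException;
import java.util.concurrent.Callable;

public final class BestEffortTokenFetcher {

  private final boolean timelineServiceBestEffort;

  public BestEffortTokenFetcher(boolean timelineServiceBestEffort) {
    this.timelineServiceBestEffort = timelineServiceBestEffort;
  }

  /** Runs the token fetch; failures are fatal only when best-effort is off. */
  public <T> T fetch(Callable<T> getDelegationToken) throws IOException {
    try {
      return getDelegationToken.call();
    } catch (Exception e) {
      if (timelineServiceBestEffort) {
        System.err.println("Failed to get timeline delegation token, "
            + "continuing without it: " + e);
        return null; // job submission proceeds without the token
      }
      throw new IOException("Failed to get timeline delegation token", e);
    }
  }
}
{code}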
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309343#comment-14309343 ] Hadoop QA commented on YARN-2809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697032/YARN-2809-v2.patch against trunk revision 1425e3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6535//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6535//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6535//console This message is automatically generated. Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2809-v2.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-2809: - Attachment: YARN-2809-v2.patch upmerge to latest trunk Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2809-v2.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
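For readers unfamiliar with the kind of workaround being discussed, the sketch below shows the general pattern of retrying the cgroup directory removal for a bounded time rather than deleting it once. It is illustrative only, with made-up names and timeouts, and is not the YARN-2809 patch.
{code:java}
// Hedged sketch of a bounded-retry cgroup removal; not the actual YARN-2809
// patch. Paths and timeouts are illustrative.
import java.io.File;

public final class CgroupDeleter {

  /** Tries to remove the (empty) cgroup directory, retrying for up to
   *  timeoutMs, since rmdir can transiently fail (or, on buggy kernels,
   *  race with task exit) right after the last task leaves the cgroup. */
  public static boolean deleteCgroup(String cgroupPath, long timeoutMs)
      throws InterruptedException {
    File dir = new File(cgroupPath);
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (!dir.exists() || dir.delete()) { // delete() removes only empty dirs
        return true;
      }
      Thread.sleep(20);                    // brief pause before retrying
    }
    return false;                          // caller logs the failure and moves on
  }
}
{code}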
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309262#comment-14309262 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309258#comment-14309258 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309390#comment-14309390 ] Devaraj K commented on YARN-2246: - [~jlowe], [~zjshen] Thanks for your inputs. [~jlowe], I have started working on this, will provide patch today. Thanks Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 3.0.0, 0.23.11, 2.5.0 Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
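Assuming the doubled path comes from naively concatenating a base URL with a job path the URL already ends with (an assumption about the failure mode, not a statement of the actual root cause or fix), a small guard like the sketch below avoids the duplication. The class and method names are hypothetical.
{code:java}
// Hypothetical guard against appending a job-history path that the base URL
// already ends with. This is an assumption about the failure mode, not the
// actual YARN-2246 fix.
public final class HistoryUrlJoiner {

  /** Joins base and jobPath, skipping the append when base already ends with it. */
  public static String join(String base, String jobPath) {
    String trimmed = base.endsWith("/")
        ? base.substring(0, base.length() - 1)
        : base;
    return trimmed.endsWith(jobPath) ? trimmed : trimmed + jobPath;
  }
}
{code}
For example, joining a base URL that already ends in /jobhistory/job/job_1332435449546_0001 with that same path would return the base unchanged instead of the doubled URL shown above.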
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309454#comment-14309454 ] Hadoop QA commented on YARN-3144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697051/YARN-3144.3.patch against trunk revision 1425e3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6536//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6536//console This message is automatically generated. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309304#comment-14309304 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309301#comment-14309301 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309311#comment-14309311 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309307#comment-14309307 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309302#comment-14309302 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309259#comment-14309259 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309255#comment-14309255 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)