[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519027#comment-14519027
 ] 

Steve Loughran commented on YARN-3539:
--

bq.  we need to update all the API classes to remark them stable.

Good point. My next patch will tag the relevant classes as @Evolving. 

> Compatibility doc to state that ATS v1 is a stable REST API
> ---
>
> Key: YARN-3539
> URL: https://issues.apache.org/jira/browse/YARN-3539
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
> YARN-3539-003.patch, YARN-3539-004.patch
>
>
> The ATS v2 discussion and YARN-2423 have raised the question: "how stable are 
> the ATSv1 APIs"?
> The existing compatibility document actually states that the History Server 
> is [a stable REST 
> API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
>  which effectively means that ATSv1 has already been declared as a stable API.
> Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: YARN-2893.004.patch

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
> YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518761#comment-14518761
 ] 

Sangjin Lee commented on YARN-3044:
---

[~zjshen], sorry I missed your comment earlier...

bq. Say we have a big cluster that can afford 5,000 concurrent containers...

I follow your logic there, but I meant 5,000 containers allocated *per second*, 
not 5,000 concurrent containers. In a large cluster, it is entirely possible 
for containers to be allocated and released on the order of thousands per 
second. It follows that we are already talking about 2 * 5,000 events per 
second in such a situation, and if we add more event types it is reasonable to 
expect each of them to happen as fast as 5,000 events per second.
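
As a back-of-the-envelope check on those numbers, here is a minimal sketch; the 
5,000/sec allocation rate and the two lifecycle events per container come from 
the comment above, while the count of extra event types is purely hypothetical:

{code}
// Rough arithmetic for the ATS write load implied above (illustrative only).
public class EventRateEstimate {
  public static void main(String[] args) {
    long containersPerSecond = 5000L;     // assumed allocation rate, not concurrency
    int lifecycleEventsPerContainer = 2;  // e.g. a start and a finish event
    int extraEventTypes = 3;              // hypothetical additional event types

    long baseEventsPerSecond = containersPerSecond * lifecycleEventsPerContainer;
    long withExtraTypes = baseEventsPerSecond + extraEventTypes * containersPerSecond;

    System.out.println("base events/sec: " + baseEventsPerSecond);   // 10000
    System.out.println("with extra types: " + withExtraTypes);       // 25000
  }
}
{code}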

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps

2015-04-29 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518781#comment-14518781
 ] 

Xianyin Xin commented on YARN-2176:
---

Sorry [~jlowe], I made a mistake. What I was thinking of was the FairScheduler, 
where we re-sort all the apps on every scheduling pass. When the number of 
running apps is in the thousands, the time consumed by re-sorting is hundreds 
of milliseconds. You're right that the overhead in CS is low.
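
To make the cost pattern concrete, here is a toy, self-contained sketch of the 
re-sort being described; class and method names are approximations, and the toy 
comparator is far cheaper than the real policy comparator (which does resource 
and fair-share arithmetic per comparison), so real timings are much larger:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy sketch of the FairScheduler cost pattern: the runnable apps are re-sorted
// with the policy comparator on every scheduling pass (per node heartbeat), so
// with thousands of apps the comparator runs O(n log n) times per heartbeat.
public class FairSortSketch {
  static class App {
    double fairShareDeficit;   // stand-in for the real fair-share math
  }

  static final Comparator<App> POLICY_COMPARATOR = new Comparator<App>() {
    @Override
    public int compare(App a, App b) {
      return Double.compare(b.fairShareDeficit, a.fairShareDeficit);
    }
  };

  // Analogue of one scheduling pass, invoked once per node heartbeat.
  static void attemptScheduling(List<App> runnableApps) {
    Collections.sort(runnableApps, POLICY_COMPARATOR);
    // ...then walk the sorted list and try to place one container...
  }

  public static void main(String[] args) {
    List<App> apps = new ArrayList<App>();
    for (int i = 0; i < 5000; i++) {
      App a = new App();
      a.fairShareDeficit = Math.random();
      apps.add(a);
    }
    long start = System.nanoTime();
    attemptScheduling(apps);
    System.out.println("one re-sort of 5000 toy apps took "
        + (System.nanoTime() - start) / 1000000.0 + " ms");
  }
}
{code}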

> CapacityScheduler loops over all running applications rather than actively 
> requesting apps
> --
>
> Key: YARN-2176
> URL: https://issues.apache.org/jira/browse/YARN-2176
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>
> The capacity scheduler performance is primarily dominated by 
> LeafQueue.assignContainers, and that currently loops over all applications 
> that are running in the queue.  It would be more efficient if we looped over 
> just the applications that are actively asking for resources rather than all 
> applications, as there could be thousands of applications running but only a 
> few hundred that are currently asking for resources.
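
For illustration, the direction suggested here could look roughly like the 
following self-contained sketch; the names and structure are hypothetical and 
do not reflect the actual CapacityScheduler/LeafQueue code:

{code}
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch: track which applications currently have outstanding
// resource requests and iterate only over those during container assignment,
// instead of looping over every running application in the queue.
public class ActiveAppsSketch {
  static class App {
    final String id;
    int pendingRequests;
    App(String id) { this.id = id; }
  }

  // Apps with at least one outstanding request, maintained incrementally.
  private final Set<App> activeApps = new LinkedHashSet<App>();

  void onResourceRequestAdded(App app) {
    app.pendingRequests++;
    activeApps.add(app);
  }

  void onResourceRequestSatisfied(App app) {
    if (--app.pendingRequests <= 0) {
      activeApps.remove(app);   // stop considering apps with nothing to ask for
    }
  }

  // Analogue of LeafQueue.assignContainers: visit only the asking apps.
  void assignContainers() {
    for (App app : activeApps) {
      // ...try to allocate on the current node for this app...
    }
  }
}
{code}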



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler

2015-04-29 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3558:
--

 Summary: Additional containers getting reserved from RM in case of 
Fair scheduler
 Key: YARN-3558
 URL: https://issues.apache.org/jira/browse/YARN-3558
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.7.0
 Environment: OS: SUSE 11 SP3
Setup: 2 RM, 2 NM
Scheduler: Fair scheduler

Reporter: Bibin A Chundatt


Submit a PI job with 16 maps.
Total containers expected: 16 maps + 1 reduce + 1 AM = 18.
Total containers reserved by the RM is 21.

The following containers are reserved but never used for execution:

container_1430213948957_0001_01_20
container_1430213948957_0001_01_19


RM container reservations and state transitions:
{code}
 Processing container_1430213948957_0001_01_01 of type START
 Processing container_1430213948957_0001_01_01 of type ACQUIRED
 Processing container_1430213948957_0001_01_01 of type LAUNCHED
 Processing container_1430213948957_0001_01_02 of type START
 Processing container_1430213948957_0001_01_03 of type START
 Processing container_1430213948957_0001_01_02 of type ACQUIRED
 Processing container_1430213948957_0001_01_03 of type ACQUIRED
 Processing container_1430213948957_0001_01_04 of type START
 Processing container_1430213948957_0001_01_05 of type START
 Processing container_1430213948957_0001_01_04 of type ACQUIRED
 Processing container_1430213948957_0001_01_05 of type ACQUIRED
 Processing container_1430213948957_0001_01_02 of type LAUNCHED
 Processing container_1430213948957_0001_01_04 of type LAUNCHED
 Processing container_1430213948957_0001_01_06 of type RESERVED
 Processing container_1430213948957_0001_01_03 of type LAUNCHED
 Processing container_1430213948957_0001_01_05 of type LAUNCHED
 Processing container_1430213948957_0001_01_07 of type START
 Processing container_1430213948957_0001_01_07 of type ACQUIRED
 Processing container_1430213948957_0001_01_07 of type LAUNCHED
 Processing container_1430213948957_0001_01_08 of type RESERVED
 Processing container_1430213948957_0001_01_02 of type FINISHED
 Processing container_1430213948957_0001_01_06 of type START
 Processing container_1430213948957_0001_01_06 of type ACQUIRED
 Processing container_1430213948957_0001_01_06 of type LAUNCHED
 Processing container_1430213948957_0001_01_04 of type FINISHED
 Processing container_1430213948957_0001_01_09 of type START
 Processing container_1430213948957_0001_01_09 of type ACQUIRED
 Processing container_1430213948957_0001_01_09 of type LAUNCHED
 Processing container_1430213948957_0001_01_10 of type RESERVED
 Processing container_1430213948957_0001_01_03 of type FINISHED
 Processing container_1430213948957_0001_01_08 of type START
 Processing container_1430213948957_0001_01_08 of type ACQUIRED
 Processing container_1430213948957_0001_01_08 of type LAUNCHED
 Processing container_1430213948957_0001_01_05 of type FINISHED
 Processing container_1430213948957_0001_01_11 of type START
 Processing container_1430213948957_0001_01_11 of type ACQUIRED
 Processing container_1430213948957_0001_01_11 of type LAUNCHED
 Processing container_1430213948957_0001_01_07 of type FINISHED
 Processing container_1430213948957_0001_01_12 of type START
 Processing container_1430213948957_0001_01_12 of type ACQUIRED
 Processing container_1430213948957_0001_01_12 of type LAUNCHED
 Processing container_1430213948957_0001_01_13 of type RESERVED
 Processing container_1430213948957_0001_01_06 of type FINISHED
 Processing container_1430213948957_0001_01_10 of type START
 Processing container_1430213948957_0001_01_10 of type ACQUIRED
 Processing container_1430213948957_0001_01_10 of type LAUNCHED
 Processing container_1430213948957_0001_01_09 of type FINISHED
 Processing container_1430213948957_0001_01_14 of type START
 Processing container_1430213948957_0001_01_14 of type ACQUIRED
 Processing container_1430213948957_0001_01_14 of type LAUNCHED
 Processing container_1430213948957_0001_01_15 of type RESERVED
 Processing container_1430213948957_0001_01_08 of type FINISHED
 Processing container_1430213948957_0001_01_13 of type START
 Processing container_1430213948957_0001_01_16 of type RESERVED
 Processing container_1430213948957_0001_01_13 of type ACQUIRED
 Processing container_1430213948957_0001_01_13 of type LAUNCHED
 Processing container_1430213948957_0001_01_11 of type FINISHED
 Processing container_1430213948957_0001_01_16 of type START
 Processing container_1430213948957_0001_01_10 of type FINISHED
 Processing container_1430213948957_0001_01_15 of type START
 Processing container_1430213948957_0001_01_16 of type ACQUIRED
 Processing container_1430213948957_0001_01_15 of typ

[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: (was: YARN-2893.004.patch)

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
> YARN-2893.002.patch, YARN-2893.003.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518917#comment-14518917
 ] 

zhihai xu commented on YARN-2893:
-

The TestAMRestart failure is not related to my change; YARN-2483 tracks this 
test failure.

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
> YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518760#comment-14518760
 ] 

Sangjin Lee commented on YARN-3044:
---

It looks like some of the issues reported by the Jenkins build might be related 
to the patch. It would be great if you could look into them and see if we can 
resolve them.

Some additional comments:

(RMContainerEntity.java)
- l.28-29: NM -> RM

(TimelineServiceV2Publisher.java)
- l.141: I would prefer an explicit entity.setQueue() over setting the info 
directly. Although the two are currently equivalent, we should stick with the 
high-level methods we introduced; that stays robust even if we change how the 
queue is set (see the sketch after these comments).
- l.147: how about using a simple for loop?
- l.179: curious, we could add them to the entity as metrics, right?
- l.300: unnecessary line?
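
To illustrate the l.141 point, here is a toy sketch of why the high-level 
setter is preferable; the classes below are stand-ins, with only setQueue() 
taken from the comment itself:

{code}
import java.util.HashMap;
import java.util.Map;

// Toy sketch (not the real timeline classes): expose a high-level setter such
// as setQueue() and have callers use it, rather than having every caller write
// the raw info map with a hard-coded key.
public class QueueSetterSketch {
  static class Entity {
    private final Map<String, Object> info = new HashMap<String, Object>();

    // Preferred: one place decides how the queue is represented internally.
    void setQueue(String queue) {
      info.put("QUEUE", queue);   // internal detail, free to change later
    }

    Map<String, Object> getInfo() {
      return info;
    }
  }

  public static void main(String[] args) {
    Entity entity = new Entity();
    entity.setQueue("default");                 // robust to internal changes
    // entity.getInfo().put("QUEUE", "default"); // discouraged: leaks the key
    System.out.println(entity.getInfo());
  }
}
{code}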


> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3552) RMServerUtils#DUMMY_APPLICATION_RESOURCE_USAGE_REPORT has negative numbers

2015-04-29 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519096#comment-14519096
 ] 

Rohith commented on YARN-3552:
--

I think displaying 'N/A' in the UI is reasonable, and for REST we should keep 
the existing behavior since changing it would affect compatibility.
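
A small illustration of that split, using placeholder code rather than the 
actual web UI or REST layer:

{code}
// Toy sketch of the proposal: the REST layer keeps returning the raw value
// (-1 for "unknown") for compatibility, while the web UI renders it as "N/A".
public class UsageDisplaySketch {
  static String renderForUi(long value) {
    return value < 0 ? "N/A" : String.valueOf(value);
  }

  public static void main(String[] args) {
    long usedMemoryMB = -1;                                     // dummy report value
    System.out.println("REST: " + usedMemoryMB);                // unchanged: -1
    System.out.println("UI:   " + renderForUi(usedMemoryMB));   // shown as N/A
  }
}
{code}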

> RMServerUtils#DUMMY_APPLICATION_RESOURCE_USAGE_REPORT  has negative numbers
> ---
>
> Key: YARN-3552
> URL: https://issues.apache.org/jira/browse/YARN-3552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Rohith
>Assignee: Rohith
>Priority: Trivial
> Attachments: 0001-YARN-3552.patch, yarn-3352.PNG
>
>
> In RMServerUtils, the default values are negative numbers, which results in 
> the RM web UI also displaying negative numbers. 
> {code}
>   public static final ApplicationResourceUsageReport
> DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
>   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
>   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
>   Resources.createResource(-1, -1), 0, 0);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.1.patch

Attaching the patch
kindly review

> FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
> to TestAppRunnability
> ---
>
> Key: YARN-3271
> URL: https://issues.apache.org/jira/browse/YARN-3271
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: nijel
> Attachments: YARN-3271.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3559) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public

2015-04-29 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-3559:


 Summary: Mark org.apache.hadoop.security.token.Token as 
@InterfaceAudience.Public
 Key: YARN-3559
 URL: https://issues.apache.org/jira/browse/YARN-3559
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Steve Loughran


{{org.apache.hadoop.security.token.Token}} is tagged 
{{@InterfaceAudience.LimitedPrivate}} for "HDFS" and "MapReduce".

However, it is used throughout YARN apps, where both the clients and the AM 
need to work with tokens. This class and its related classes all need to be 
declared public. 
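
The requested change amounts to swapping the audience annotation, roughly as 
below; the class is a dummy stand-in for Token, and the stability tag shown is 
illustrative only:

{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Minimal illustration of the annotation change being asked for: today Token
// carries @InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"}); the request
// is to declare it @InterfaceAudience.Public instead. The class below is a
// dummy stand-in, not the real Token.
@InterfaceAudience.Public
@InterfaceStability.Evolving   // stability level here is illustrative only
public class PublicTokenLikeClass {
  // ... the real Token implementation would be unchanged ...
}
{code}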



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3559) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public

2015-04-29 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519137#comment-14519137
 ] 

J.Andreina commented on YARN-3559:
--

[~ste...@apache.org], I would like to work on this issue. If you have not 
already started working on it, shall I take it over?

> Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public
> 
>
> Key: YARN-3559
> URL: https://issues.apache.org/jira/browse/YARN-3559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>
> {{org.apache.hadoop.security.token.Token}} is tagged 
> {{@InterfaceAudience.LimitedPrivate}} for "HDFS" and "MapReduce".
> However, it is used throughout YARN apps, where both the clients and the AM 
> need to work with tokens. This class and its related classes all need to be 
> declared public. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-04-29 Thread Peng Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peng Zhang updated YARN-3535:
-
Attachment: YARN-3535-002.patch

# Remove the call to recoverResourceRequestForContainer from the preemption 
path to avoid recovering the ResourceRequest twice.
# Fix broken tests.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
> Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NM, starting a container on the NM from the AM failed, 
> and then the job hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519196#comment-14519196
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java


> FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
> FairShare policies
> 
>
> Key: YARN-3485
> URL: https://issues.apache.org/jira/browse/YARN-3485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
> yarn-3485-prelim.patch
>
>
> FairScheduler's headroom calculations consider the fairshare and 
> cluster-available-resources, and the fairshare has maxResources. However, for 
> Fifo and Fairshare policies, the fairshare is used only for memory and not 
> cpu. So, the scheduler ends up showing a higher headroom than is actually 
> available. This could lead to applications waiting for resources far longer 
> than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519222#comment-14519222
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


> FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
> FairShare policies
> 
>
> Key: YARN-3485
> URL: https://issues.apache.org/jira/browse/YARN-3485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
> yarn-3485-prelim.patch
>
>
> FairScheduler's headroom calculations consider the fairshare and 
> cluster-available-resources, and the fairshare has maxResources. However, for 
> Fifo and Fairshare policies, the fairshare is used only for memory and not 
> cpu. So, the scheduler ends up showing a higher headroom than is actually 
> available. This could lead to applications waiting for resources far longer 
> than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519247#comment-14519247
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


> FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
> FairShare policies
> 
>
> Key: YARN-3485
> URL: https://issues.apache.org/jira/browse/YARN-3485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
> yarn-3485-prelim.patch
>
>
> FairScheduler's headroom calculations consider the fairshare and 
> cluster-available-resources, and the fairshare has maxResources. However, for 
> Fifo and Fairshare policies, the fairshare is used only for memory and not 
> cpu. So, the scheduler ends up showing a higher headroom than is actually 
> available. This could lead to applications waiting for resources far longer 
> than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3560) Not able to navigate to the cluster from tracking url (proxy) generated after submission of job

2015-04-29 Thread Anushri (JIRA)
Anushri created YARN-3560:
-

 Summary: Not able to navigate to the cluster from tracking url 
(proxy) generated after submission of job
 Key: YARN-3560
 URL: https://issues.apache.org/jira/browse/YARN-3560
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anushri
Priority: Minor


A standalone web proxy server is enabled in the cluster.
When a job is submitted, the generated tracking URL goes through the proxy.
When following this URL, trying to navigate from the web page to the cluster 
links (About, Applications, or Scheduler) redirects to some default port 
instead of the actual RM web port configured, and the browser reports 
"webpage not available".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519278#comment-14519278
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/912/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


> FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
> FairShare policies
> 
>
> Key: YARN-3485
> URL: https://issues.apache.org/jira/browse/YARN-3485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
> yarn-3485-prelim.patch
>
>
> FairScheduler's headroom calculations consider the fairshare and 
> cluster-available-resources, and the fairshare has maxResources. However, for 
> Fifo and Fairshare policies, the fairshare is used only for memory and not 
> cpu. So, the scheduler ends up showing a higher headroom than is actually 
> available. This could lead to applications waiting for resources far longer 
> than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3517:

Attachment: YARN-3517.006.patch

{quote}
in RMWebServices.java we don't need the isSecurityEnabled check. Just remove 
the entire check. My reasoning is that logLevel app does not do those checks, 
it simply makes sure you are an admin.

+ if (UserGroupInformation.isSecurityEnabled() && callerUGI == null) {
+   String msg = "Unable to obtain user name, user not authenticated";
+   throw new AuthorizationException(msg);
+ }
{quote}

Removed the check.

{quote}
in the test TestRMWebServices.java. We aren't actually asserting anything. we 
should assert that the expected files exist. Personally I would also like to 
see an assert that the expected exception occurred.
{quote}

Added explicit check for the exception being thrown as well as a check for the 
log files existing.
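
A rough, generic sketch of the two checks described; the method, exception 
type, and file name below are placeholders rather than the actual 
TestRMWebServices code:

{code}
import java.io.File;

// Placeholder illustration of the pattern: (1) assert that the forbidden call
// throws, and (2) assert that the dumped log file exists afterwards.
public class DumpAssertionSketch {

  static void dumpSchedulerLogs(boolean admin) {
    if (!admin) {
      throw new SecurityException("only admins may dump scheduler logs");
    }
    // an admin call would write the scheduler debug log here
  }

  public static void main(String[] args) {
    // (1) expected exception for a non-admin caller
    try {
      dumpSchedulerLogs(false);
      throw new AssertionError("non-admin dump should have been rejected");
    } catch (SecurityException expected) {
      // expected path: the call was rejected
    }

    // (2) after an admin dump, the expected file should exist (name assumed)
    dumpSchedulerLogs(true);
    File logFile = new File("yarn-scheduler-debug.log");
    System.out.println("dump file present: " + logFile.exists());
  }
}
{code}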

> RM web ui for dumping scheduler logs should be for admins only
> --
>
> Key: YARN-3517
> URL: https://issues.apache.org/jira/browse/YARN-3517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: Varun Vasudev
>Assignee: Thomas Graves
>Priority: Blocker
>  Labels: security
> Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
> YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
> YARN-3517.006.patch
>
>
> YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
> for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519326#comment-14519326
 ] 

Hadoop QA commented on YARN-3271:
-

(!) The patch artifact directory on has been removed! 
This is a fatal error for test-patch.sh.  Aborting. 
Jenkins (node H4) information at 
https://builds.apache.org/job/PreCommit-YARN-Build/7537/ may provide some hints.

> FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
> to TestAppRunnability
> ---
>
> Key: YARN-3271
> URL: https://issues.apache.org/jira/browse/YARN-3271
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: nijel
> Attachments: YARN-3271.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2619) NodeManager: Add cgroups support for disk I/O isolation

2015-04-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519342#comment-14519342
 ] 

Varun Vasudev commented on YARN-2619:
-

The sharing is just an equal split of disk operations. The number 500 is 
arbitrary; all that matters is that all containers are assigned the same 
weight, which ensures they get an equal share of disk operations. Once we have 
scheduling support, the weight will be determined dynamically based on the 
allocated resources.
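
Concretely, the equal split described above amounts to writing the same weight 
into each container's blkio cgroup, along the lines of this sketch; the cgroup 
mount path and layout are assumptions, not the handler's actual code:

{code}
import java.io.FileWriter;
import java.io.IOException;

// Sketch of the "equal split": every container's blkio cgroup gets the same
// weight (500 here), so the I/O scheduler gives each an equal share of disk
// operations. The path below is an assumption for illustration.
public class DiskWeightSketch {
  private static final String BLKIO_ROOT = "/sys/fs/cgroup/blkio/hadoop-yarn"; // assumed
  private static final int DEFAULT_DISK_WEIGHT = 500;   // arbitrary; equal for everyone

  static void setEqualDiskWeight(String containerId) throws IOException {
    String weightFile = BLKIO_ROOT + "/" + containerId + "/blkio.weight";
    FileWriter writer = new FileWriter(weightFile);
    try {
      writer.write(String.valueOf(DEFAULT_DISK_WEIGHT));
    } finally {
      writer.close();
    }
  }
}
{code}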

> NodeManager: Add cgroups support for disk I/O isolation
> ---
>
> Key: YARN-2619
> URL: https://issues.apache.org/jira/browse/YARN-2619
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2619-1.patch, YARN-2619.002.patch, 
> YARN-2619.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2619) NodeManager: Add cgroups support for disk I/O isolation

2015-04-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2619:

Attachment: YARN-2619.004.patch

bq. there is an unused import 
org.apache.hadoop.yarn.server.nodemanager.util.TestCgroupsLCEResourcesHandler 
in TestCGroupsHandlerImpl 

Fixed.

bq. And the defaults (what does a weight of 500 mean?)

Added a comment in the implementation that it's just arbitrary.

bq. Should we deprecate LCEResourcesHandler hierarchy so that future work 
doesn't go there?

We should do it once YARN-3542 gets committed.

bq. Add Override annotations for methods that are overridden so it is clear 
what behavior is dictated by the base interface

Fixed.

bq. Print a warning if PARTITIONS_FILE cannot be read?

Fixed.

I've also refactored part of CGroupsHandlerImpl to make testing cleaner 
(removing the need to read the controllerPaths map).

[~sidharta-s] - can you please review to make sure the refactoring is ok?

> NodeManager: Add cgroups support for disk I/O isolation
> ---
>
> Key: YARN-2619
> URL: https://issues.apache.org/jira/browse/YARN-2619
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2619-1.patch, YARN-2619.002.patch, 
> YARN-2619.003.patch, YARN-2619.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519360#comment-14519360
 ] 

Jason Lowe commented on YARN-3554:
--

I think 10 minutes is still too high.  We didn't even have this functionality 
until 2.6 because of rolling upgrades, and NMs don't take that long to recover 
in a rolling upgrade.  They recover in tens of seconds rather than tens of 
minutes.  Therefore I don't think it makes much sense to spend a lot of time 
trying to connect to an NM beyond a few minutes.  The chances of successfully 
connecting after a few minutes of trying are going to be very low, and NMs fail 
all the time anyway.  So if we spend all that extra time trying for essentially 
no benefit, all we have done is prolong the application recovery time for no 
good reason.
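
For example, an operator who agrees with this reasoning could shorten the 
window along these lines; the 3-minute value is purely illustrative and not a 
recommendation from this thread:

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative override of the client->NM connect retry window discussed above;
// 180000 ms (3 minutes) is an example value, not a settled default.
public class NmConnectWaitExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong("yarn.client.nodemanager-connect.max-wait-ms", 180000L);
    System.out.println(conf.get("yarn.client.nodemanager-connect.max-wait-ms"));
  }
}
{code}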

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
> Attachments: YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900,000 
> msec (15 minutes), which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519361#comment-14519361
 ] 

Hadoop QA commented on YARN-3535:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 55s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   5m 22s | The applied patch generated  7 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 18s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  53m 45s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m 32s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729146/YARN-3535-002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f82970 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7538/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7538/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7538/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7538/console |


This message was automatically generated.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
> Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NM, starting a container on the NM from the AM failed, 
> and then the job hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519367#comment-14519367
 ] 

Hadoop QA commented on YARN-3445:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 15  line(s) that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   5m 20s | The applied patch generated  4 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 59s | The patch appears to introduce 
14 new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 52s | Tests passed in 
hadoop-sls. |
| {color:green}+1{color} | yarn tests |  52m 26s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m  0s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-sls |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, 
String):in 
org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, 
String): new java.io.FileReader(String)  At RumenToSLSConverter.java:[line 122] 
|
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, 
String):in 
org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, 
String): new java.io.FileWriter(String)  At RumenToSLSConverter.java:[line 124] 
|
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String):in 
org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String): new 
java.io.FileWriter(String)  At RumenToSLSConverter.java:[line 145] |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int):in 
org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int): new 
java.io.FileReader(String)  At SLSRunner.java:[line 280] |
|  |  Unwritten field:NodeInfo.java:[line 140] |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics():in 
org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(): 
new java.io.FileWriter(String)  At ResourceSchedulerWrapper.java:[line 490] |
|  |  Found reliance on default encoding in new 
org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper):in
 new 
org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper):
 new java.io.FileWriter(String)  At ResourceSchedulerWrapper.java:[line 695] |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics():in 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new 
java.io.FileWriter(String)  At SLSCapacityScheduler.java:[line 493] |
|  |  Found reliance on default encoding in new 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):in
 new 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):
 new java.io.FileWriter(String)  At SLSCapacityScheduler.java:[line 698] |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String):in 
org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String): new 
java.io.FileReader(String)  At SLSUtils.java:[line 119] |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String):in 
org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String): new 
java.io.FileReader(String)  At SLSUtils.java:[line 92] |
|  |  Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient 
non-serializable instance field handleOperTimecostHistogramMap  In 
SLSWebApp.java:instance field handleOperTimecostHistogramMap  In SLSWebApp.java 
|
|  |  Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient 
non-serializable instanc

[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-04-29 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519368#comment-14519368
 ] 

Peng Zhang commented on YARN-3535:
--

I think the TestAMRestart failure is not related to this patch; I found that 
YARN-2483 tracks it.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
> Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NM, starting a container on the NM from the AM failed, 
> and then the job hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2619) NodeManager: Add cgroups support for disk I/O isolation

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519415#comment-14519415
 ] 

Hadoop QA commented on YARN-2619:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   8m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   4m 57s | The applied patch generated  5 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 23s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 11s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  55m 21s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729166/YARN-2619.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f82970 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7541/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7541/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7541/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7541/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7541/console |


This message was automatically generated.

> NodeManager: Add cgroups support for disk I/O isolation
> ---
>
> Key: YARN-2619
> URL: https://issues.apache.org/jira/browse/YARN-2619
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2619-1.patch, YARN-2619.002.patch, 
> YARN-2619.003.patch, YARN-2619.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519417#comment-14519417
 ] 

Hadoop QA commented on YARN-3517:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 55s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 38s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   1m 54s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  53m  7s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m  0s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729158/YARN-3517.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f82970 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7540/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7540/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7540/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7540/console |


This message was automatically generated.

> RM web ui for dumping scheduler logs should be for admins only
> --
>
> Key: YARN-3517
> URL: https://issues.apache.org/jira/browse/YARN-3517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: Varun Vasudev
>Assignee: Thomas Graves
>Priority: Blocker
>  Labels: security
> Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
> YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
> YARN-3517.006.patch
>
>
> YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
> for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519423#comment-14519423
 ] 

Varun Vasudev commented on YARN-3517:
-

The test failure is unrelated to the patch.

> RM web ui for dumping scheduler logs should be for admins only
> --
>
> Key: YARN-3517
> URL: https://issues.apache.org/jira/browse/YARN-3517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: Varun Vasudev
>Assignee: Thomas Graves
>Priority: Blocker
>  Labels: security
> Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
> YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
> YARN-3517.006.patch
>
>
> YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
> for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519471#comment-14519471
 ] 

Junping Du commented on YARN-3044:
--

bq. l.179: curious, we could add them to the entity as metrics, right?
[~sjlee0], I think you are talking about SystemMetricsEvent here, aren't you? It 
could be confusing to use either "event" or "metrics" for SystemMetricsEvent. 
But if we take the semantics of "metrics" to be numeric data that we can 
aggregate across a time range, this sounds like it falls more into the category 
of an event. Thoughts?



> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519526#comment-14519526
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #179 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/179/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java


> FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
> FairShare policies
> 
>
> Key: YARN-3485
> URL: https://issues.apache.org/jira/browse/YARN-3485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
> yarn-3485-prelim.patch
>
>
> FairScheduler's headroom calculations consider the fairshare and 
> cluster-available-resources, and the fairshare has maxResources. However, for 
> Fifo and Fairshare policies, the fairshare is used only for memory and not 
> cpu. So, the scheduler ends up showing a higher headroom than is actually 
> available. This could lead to applications waiting for resources far longer 
> than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Gour Saha (JIRA)
Gour Saha created YARN-3561:
---

 Summary: Non-AM Containers continue to run even after AM is stopped
 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
Reporter: Gour Saha
Priority: Critical


Non-AM containers continue to run even after application is stopped. This 
occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
Hadoop 2.6 deployment. 

Following are the NM logs from 2 different nodes:
*host-07* - where Slider AM was running
*host-03* - where Storm NIMBUS container was running.

*Note:* The logs are partial, starting with the time when the relevant Slider 
AM and NIMBUS containers were allocated, till the time when the Slider AM was 
stopped. Also, the large number of "Memory usage" log lines were removed 
keeping only a few starts and ends of every segment.

*NM log from host-07 where Slider AM container was running:*
{noformat}
2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
container_1428575950531_0020_02_01
2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - Auth 
successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
container_1428575950531_0021_01_01 by user yarn
2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
application reference for app application_1428575950531_0021
2015-04-29 00:41:10,323 INFO  application.Application 
(ApplicationImpl.java:handle(464)) - Application application_1428575950531_0021 
transitioned from NEW to INITING
2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
(NMAuditLogger.java:logSuccess(89)) - USER=yarn IP=10.84.105.162
OPERATION=Start Container Request   TARGET=ContainerManageImpl  
RESULT=SUCCESS  APPID=application_1428575950531_0021
CONTAINERID=container_1428575950531_0021_01_01
2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
(LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root Log 
Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
[rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
users.
2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
(AppLogAggregatorImpl.java:(182)) - rollingMonitorInterval is set as -1. 
The log rolling mornitoring interval is disabled. The logs will be aggregated 
after this application is finished.
2015-04-29 00:41:10,351 INFO  application.Application 
(ApplicationImpl.java:transition(304)) - Adding 
container_1428575950531_0021_01_01 to application 
application_1428575950531_0021
2015-04-29 00:41:10,352 INFO  application.Application 
(ApplicationImpl.java:handle(464)) - Application application_1428575950531_0021 
transitioned from INITING to RUNNING
2015-04-29 00:41:10,356 INFO  container.Container 
(ContainerImpl.java:handle(999)) - Container 
container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
(AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
application_1428575950531_0021
2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
 transitioned from INIT to DOWNLOADING
2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
 transitioned from INIT to DOWNLOADING
2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
 transitioned from INIT to DOWNLOADING
2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties 
transitioned from INIT to DOWNLOADING
2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/httpcore-4.2.5.jar
 transitioned from INIT to DOWNLOADING
2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428

[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519568#comment-14519568
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


> FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
> FairShare policies
> 
>
> Key: YARN-3485
> URL: https://issues.apache.org/jira/browse/YARN-3485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
> yarn-3485-prelim.patch
>
>
> FairScheduler's headroom calculations consider the fairshare and 
> cluster-available-resources, and the fairshare has maxResources. However, for 
> Fifo and Fairshare policies, the fairshare is used only for memory and not 
> cpu. So, the scheduler ends up showing a higher headroom than is actually 
> available. This could lead to applications waiting for resources far longer 
> than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519576#comment-14519576
 ] 

Naganarasimha G R commented on YARN-3554:
-

Agree with you [~jlowe], but what do you feel the ideal timeout should be: 3 
minutes or 5 minutes? Since you would have more experience with large numbers of 
nodes and frequent NM failures, perhaps you can suggest a better value here.


> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
> Attachments: YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519603#comment-14519603
 ] 

Jason Lowe commented on YARN-3554:
--

I suggest we go with 3 minutes.  The retry interval is 10 seconds, so we'll get 
plenty of retries in that time if the failure is fast (e.g.: unknown host, 
connection refused) and still get a few retries in if the failure is slow 
(e.g.: connection timeout).
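
For anyone who wants to apply this before a new default ships, a minimal 
client-side sketch of overriding the two values is below. The max-wait key is 
quoted from this jira; the retry-interval key name is assumed from memory, so 
please verify it against YarnConfiguration in your Hadoop version.

{code}
// Hedged sketch: cap the NM connect wait at 3 minutes with 10 second retries.
// Uses org.apache.hadoop.conf.Configuration / org.apache.hadoop.yarn.conf.YarnConfiguration.
Configuration conf = new YarnConfiguration();
conf.setLong("yarn.client.nodemanager-connect.max-wait-ms", 3 * 60 * 1000L);
// Retry-interval key below is an assumption; double-check the exact name.
conf.setLong("yarn.client.nodemanager-connect.retry-interval-ms", 10 * 1000L);
{code}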

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
> Attachments: YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3484) Fix up yarn top shell code

2015-04-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3484:

Attachment: YARN-3484.002.patch

bq. variables that are local to a function should be declared local.
Fixed.

bq. avoid using mixed case as per the shell programming guidelines
Fixed.

bq. yarnTopArgs is effectively a global. It should either get renamed to 
YARN_foo or similar so as to not pollute the shell name space, or another 
approach is to process set_yarn_top_args as a subshell, reading its input 
directly to avoid the global entirely
Fixed; renamed it to YARN_TOP_ARGS.

bq. set_yarn_top_args should be hadoop_ something so as to not pollute the 
shell name space
Fixed; changed the name to hadoop_set_yarn_top_args

> Fix up yarn top shell code
> --
>
> Key: YARN-3484
> URL: https://issues.apache.org/jira/browse/YARN-3484
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Varun Vasudev
> Attachments: YARN-3484.001.patch, YARN-3484.002.patch
>
>
> We need to do some work on yarn top's shell code.
> a) Just checking for TERM isn't good enough.  We really need to check the 
> return on tput, especially since the output will not be a number but an error 
> string which will likely blow up the java code in horrible ways.
> b) All the single bracket tests should be double brackets to force the bash 
> built-in.
> c) I think I'd rather see the shell portion in a function since it's rather 
> large.  This will allow for args, etc, to get local'ized and clean up the 
> case statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3484) Fix up yarn top shell code

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519636#comment-14519636
 ] 

Hadoop QA commented on YARN-3484:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   0m  0s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | release audit |   0m 15s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:blue}0{color} | shellcheck |   0m 15s | Shellcheck was not available. |
| | |   0m 23s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729211/YARN-3484.002.patch |
| Optional Tests | shellcheck |
| git revision | trunk / 8f82970 |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7542/console |


This message was automatically generated.

> Fix up yarn top shell code
> --
>
> Key: YARN-3484
> URL: https://issues.apache.org/jira/browse/YARN-3484
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Varun Vasudev
> Attachments: YARN-3484.001.patch, YARN-3484.002.patch
>
>
> We need to do some work on yarn top's shell code.
> a) Just checking for TERM isn't good enough.  We really need to check the 
> return on tput, especially since the output will not be a number but an error 
> string which will likely blow up the java code in horrible ways.
> b) All the single bracket tests should be double brackets to force the bash 
> built-in.
> c) I think I'd rather see the shell portion in a function since it's rather 
> large.  This will allow for args, etc, to get local'ized and clean up the 
> case statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3554:

Attachment: YARN-3554-20150429-2.patch

Hi [~jlowe] Updating with 3 minutes as the timeout

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
>     Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519792#comment-14519792
 ] 

Sangjin Lee commented on YARN-3044:
---

Yes that's kind of what I'm wondering about. So having them as events means 
that they should/will not be aggregated (e.g. from app => flow). Is that the 
intent with these values (CPU and cores)? I'm not exactly clear what these 
values indicate.

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2619) NodeManager: Add cgroups support for disk I/O isolation

2015-04-29 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519814#comment-14519814
 ] 

Sidharta Seethana commented on YARN-2619:
-

[~vvasudev] , thanks for refactoring the test to be cleaner. The corresponding 
changes seem good to me. 

> NodeManager: Add cgroups support for disk I/O isolation
> ---
>
> Key: YARN-2619
> URL: https://issues.apache.org/jira/browse/YARN-2619
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2619-1.patch, YARN-2619.002.patch, 
> YARN-2619.003.patch, YARN-2619.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519840#comment-14519840
 ] 

Hadoop QA commented on YARN-3554:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   7m 35s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 46s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 30s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-common. |
| | |  46m 59s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729227/YARN-3554-20150429-2.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f82970 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/console |


This message was automatically generated.

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
> Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519848#comment-14519848
 ] 

Vinod Kumar Vavilapalli commented on YARN-3561:
---

Is this because the keep-containers flag is on? Why was the AM stopped and not 
the app killed, if that is what they want?
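
For context, a minimal sketch of how an AM client typically turns the 
keep-containers flag on at submission time; whether Slider actually does this is 
not established in this thread, and {{yarnClient}} below stands for an 
already-started YarnClient.

{code}
// Hedged illustration only -- not taken from the Slider code base.
ApplicationSubmissionContext appContext =
    yarnClient.createApplication().getApplicationSubmissionContext();
// When true, the RM keeps the app's running containers alive across AM
// attempts instead of killing them when the AM container exits.
appContext.setKeepContainersAcrossApplicationAttempts(true);
{code}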

> Non-AM Containers continue to run even after AM is stopped
> --
>
> Key: YARN-3561
> URL: https://issues.apache.org/jira/browse/YARN-3561
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.0
>Reporter: Gour Saha
>Priority: Critical
>
> Non-AM containers continue to run even after application is stopped. This 
> occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
> Hadoop 2.6 deployment. 
> Following are the NM logs from 2 different nodes:
> *host-07* - where Slider AM was running
> *host-03* - where Storm NIMBUS container was running.
> *Note:* The logs are partial, starting with the time when the relevant Slider 
> AM and NIMBUS containers were allocated, till the time when the Slider AM was 
> stopped. Also, the large number of "Memory usage" log lines were removed 
> keeping only a few starts and ends of every segment.
> *NM log from host-07 where Slider AM container was running:*
> {noformat}
> 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
> container_1428575950531_0020_02_01
> 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
> Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
> container_1428575950531_0021_01_01 by user yarn
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
> application reference for app application_1428575950531_0021
> 2015-04-29 00:41:10,323 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from NEW to INITING
> 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
> (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
> OPERATION=Start Container Request   TARGET=ContainerManageImpl  
> RESULT=SUCCESS  APPID=application_1428575950531_0021
> CONTAINERID=container_1428575950531_0021_01_01
> 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
> (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
> Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
> [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
> users.
> 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:(182)) - rollingMonitorInterval is set as 
> -1. The log rolling mornitoring interval is disabled. The logs will be 
> aggregated after this application is finished.
> 2015-04-29 00:41:10,351 INFO  application.Application 
> (ApplicationImpl.java:transition(304)) - Adding 
> container_1428575950531_0021_01_01 to application 
> application_1428575950531_0021
> 2015-04-29 00:41:10,352 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from INITING to RUNNING
> 2015-04-29 00:41:10,356 INFO  container.Container 
> (ContainerImpl.java:handle(999)) - Container 
> container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
> 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
> (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
> application_1428575950531_0021
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/y

[jira] [Updated] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-3561:

Fix Version/s: 2.6.1

> Non-AM Containers continue to run even after AM is stopped
> --
>
> Key: YARN-3561
> URL: https://issues.apache.org/jira/browse/YARN-3561
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.0
>Reporter: Gour Saha
>Priority: Critical
> Fix For: 2.6.1
>
>
> Non-AM containers continue to run even after application is stopped. This 
> occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
> Hadoop 2.6 deployment. 
> Following are the NM logs from 2 different nodes:
> *host-07* - where Slider AM was running
> *host-03* - where Storm NIMBUS container was running.
> *Note:* The logs are partial, starting with the time when the relevant Slider 
> AM and NIMBUS containers were allocated, till the time when the Slider AM was 
> stopped. Also, the large number of "Memory usage" log lines were removed 
> keeping only a few starts and ends of every segment.
> *NM log from host-07 where Slider AM container was running:*
> {noformat}
> 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
> container_1428575950531_0020_02_01
> 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
> Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
> container_1428575950531_0021_01_01 by user yarn
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
> application reference for app application_1428575950531_0021
> 2015-04-29 00:41:10,323 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from NEW to INITING
> 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
> (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
> OPERATION=Start Container Request   TARGET=ContainerManageImpl  
> RESULT=SUCCESS  APPID=application_1428575950531_0021
> CONTAINERID=container_1428575950531_0021_01_01
> 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
> (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
> Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
> [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
> users.
> 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:(182)) - rollingMonitorInterval is set as 
> -1. The log rolling mornitoring interval is disabled. The logs will be 
> aggregated after this application is finished.
> 2015-04-29 00:41:10,351 INFO  application.Application 
> (ApplicationImpl.java:transition(304)) - Adding 
> container_1428575950531_0021_01_01 to application 
> application_1428575950531_0021
> 2015-04-29 00:41:10,352 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from INITING to RUNNING
> 2015-04-29 00:41:10,356 INFO  container.Container 
> (ContainerImpl.java:handle(999)) - Container 
> container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
> 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
> (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
> application_1428575950531_0021
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties 
> transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResou

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519858#comment-14519858
 ] 

stack commented on YARN-3411:
-

bq. So there are some major changes between hbase 0.98 and hbase like the 
client facing APIs (HTableInterface, etc) have been deprecated and replaced 
with new interfaces.

It would be a pity if you fellas were stuck on the 0.98 APIs. Phoenix is 
shaping up to do an RC that will work w/ hbase 1.x.

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519861#comment-14519861
 ] 

Karthik Kambatla commented on YARN-3271:


Thanks for working on this, [~nijel]. 

While at this, can we improve how we initialize the scheduler in 
{{TestAppRunnability#setUp}} as below? 
{code}
// Bring up a full MockRM so the scheduler is wired up the same way as in production.
Configuration conf = createConfiguration();
resourceManager = new MockRM(conf);
resourceManager.start();
// Use the scheduler the RM created rather than constructing a FairScheduler by hand.
scheduler = (FairScheduler) resourceManager.getResourceScheduler();
{code}

> FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
> to TestAppRunnability
> ---
>
> Key: YARN-3271
> URL: https://issues.apache.org/jira/browse/YARN-3271
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: nijel
> Attachments: YARN-3271.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-04-29 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3521:
--
Attachment: 0001-YARN-3521.patch

Attaching an initial version. [~leftnoteasy], please check it, as I have changed 
the method interfaces of *getClusterNodeLabels* and *addToClusterNodeLabels* to 
take a *List* argument.


> Support return structured NodeLabel objects in REST API when call 
> getClusterNodeLabels
> --
>
> Key: YARN-3521
> URL: https://issues.apache.org/jira/browse/YARN-3521
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-3521.patch
>
>
> In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we should 
> make the same change on the REST API side to make them consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-3561:

Environment: debian 7

> Non-AM Containers continue to run even after AM is stopped
> --
>
> Key: YARN-3561
> URL: https://issues.apache.org/jira/browse/YARN-3561
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.0
> Environment: debian 7
>Reporter: Gour Saha
>Priority: Critical
> Fix For: 2.6.1
>
>
> Non-AM containers continue to run even after application is stopped. This 
> occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
> Hadoop 2.6 deployment. 
> Following are the NM logs from 2 different nodes:
> *host-07* - where Slider AM was running
> *host-03* - where Storm NIMBUS container was running.
> *Note:* The logs are partial, starting with the time when the relevant Slider 
> AM and NIMBUS containers were allocated, till the time when the Slider AM was 
> stopped. Also, the large number of "Memory usage" log lines were removed 
> keeping only a few starts and ends of every segment.
> *NM log from host-07 where Slider AM container was running:*
> {noformat}
> 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
> container_1428575950531_0020_02_01
> 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
> Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
> container_1428575950531_0021_01_01 by user yarn
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
> application reference for app application_1428575950531_0021
> 2015-04-29 00:41:10,323 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from NEW to INITING
> 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
> (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
> OPERATION=Start Container Request   TARGET=ContainerManageImpl  
> RESULT=SUCCESS  APPID=application_1428575950531_0021
> CONTAINERID=container_1428575950531_0021_01_01
> 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
> (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
> Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
> [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
> users.
> 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:(182)) - rollingMonitorInterval is set as 
> -1. The log rolling mornitoring interval is disabled. The logs will be 
> aggregated after this application is finished.
> 2015-04-29 00:41:10,351 INFO  application.Application 
> (ApplicationImpl.java:transition(304)) - Adding 
> container_1428575950531_0021_01_01 to application 
> application_1428575950531_0021
> 2015-04-29 00:41:10,352 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from INITING to RUNNING
> 2015-04-29 00:41:10,356 INFO  container.Container 
> (ContainerImpl.java:handle(999)) - Container 
> container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
> 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
> (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
> application_1428575950531_0021
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties 
> transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,3

[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: (was: YARN-2893.004.patch)

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
> YARN-2893.002.patch, YARN-2893.003.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: YARN-2893.004.patch

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
> YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519906#comment-14519906
 ] 

Naganarasimha G R commented on YARN-3044:
-

Thanks for the review [~djp] & [~sjlee0],
bq. some of the issues reported by the jenkins build might be related to the 
patch?
Some might be, but many (findbugs and test case failures) are not related to 
this jira, hence I am planning to raise a separate jira to handle them.
Some findbugs warnings (like the unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent) I am 
not planning to handle, as the code is the same as before and adding if checks 
does not make sense here.

bq. l.147: how about using a simple for loop?
Well, AFAIK it only affects readability here; I used an entry-set iterator as it 
is generally preferred in terms of performance and concurrency (neither of which 
is relevant here). If you feel readability is an issue then I can change it to a 
simple loop :)

bq. l.179: curious, we could add them to the entity as metrics, right?
bq. So having them as events means that they should/will not be aggregated 
(e.g. from app => flow). Is that the intent with these values (CPU and cores)? 
I'm not exactly clear what these values indicate.
Well, initially I had the same thoughts as [~djp], but these values might need 
to be aggregated (e.g. from app => flow), since their current value is already 
an aggregation across all containers.

As mentioned earlier, I am planning to raise jiras for the following:
# To enhance TestSystemMetricsPublisherForV2 so that the test case verifies 
the published entity is populated as desired (similar to ATSv1).
# To add an interface in TimelineClient to push application-specific 
configurations, as not all of them are captured by the RM.

Please provide your opinion.
I also had one query: as suggested earlier by [~djp], where should the util 
class (package and class name) that converts the system entities to timeline 
entities and vice versa be added?
And shall I handle that as part of this patch or of the 
TestSystemMetricsPublisherForV2 enhancement patch?

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519911#comment-14519911
 ] 

Gour Saha commented on YARN-3561:
-

The Slider stop command was called, which initiates a stop of the Slider Storm 
application (and hence of the Slider AM).

Which property sets the keep-containers flag on?

> Non-AM Containers continue to run even after AM is stopped
> --
>
> Key: YARN-3561
> URL: https://issues.apache.org/jira/browse/YARN-3561
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.0
> Environment: debian 7
>Reporter: Gour Saha
>Priority: Critical
> Fix For: 2.6.1
>
>
> Non-AM containers continue to run even after application is stopped. This 
> occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
> Hadoop 2.6 deployment. 
> Following are the NM logs from 2 different nodes:
> *host-07* - where Slider AM was running
> *host-03* - where Storm NIMBUS container was running.
> *Note:* The logs are partial, starting with the time when the relevant Slider 
> AM and NIMBUS containers were allocated, till the time when the Slider AM was 
> stopped. Also, the large number of "Memory usage" log lines were removed 
> keeping only a few starts and ends of every segment.
> *NM log from host-07 where Slider AM container was running:*
> {noformat}
> 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
> container_1428575950531_0020_02_01
> 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
> Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
> container_1428575950531_0021_01_01 by user yarn
> 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
> application reference for app application_1428575950531_0021
> 2015-04-29 00:41:10,323 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from NEW to INITING
> 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
> (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
> OPERATION=Start Container Request   TARGET=ContainerManageImpl  
> RESULT=SUCCESS  APPID=application_1428575950531_0021
> CONTAINERID=container_1428575950531_0021_01_01
> 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
> (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
> Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
> [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
> users.
> 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:(182)) - rollingMonitorInterval is set as 
> -1. The log rolling mornitoring interval is disabled. The logs will be 
> aggregated after this application is finished.
> 2015-04-29 00:41:10,351 INFO  application.Application 
> (ApplicationImpl.java:transition(304)) - Adding 
> container_1428575950531_0021_01_01 to application 
> application_1428575950531_0021
> 2015-04-29 00:41:10,352 INFO  application.Application 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1428575950531_0021 transitioned from INITING to RUNNING
> 2015-04-29 00:41:10,356 INFO  container.Container 
> (ContainerImpl.java:handle(999)) - Container 
> container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
> 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
> (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
> application_1428575950531_0021
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
> (LocalizedResource.java:handle(203)) - Resource 
> hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
>  transitioned from INIT to DOWNLOADING
> 2015-04-29 00:41:10,358 INFO  localizer.LocalizedRe

[jira] [Updated] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3362:
-
Attachment: Screen Shot 2015-04-29 at 11.42.17 AM.png

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
> AM.png, YARN-3362.20150428-3.patch
>
>
> We don't have node label usage in the RM CapacityScheduler web UI now; without 
> this, it is hard for users to understand what happened to nodes that have 
> labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519917#comment-14519917
 ] 

Wangda Tan commented on YARN-3362:
--

Hi Naga,
Thanks for taking the initiative on this. I just tried running the patch locally 
and it looks great! Some comments:

1) Show partition=partition-name for every partition; if the partition is the 
NO_LABEL partition, show it as YARN.DEFAULT.PARTITION.
2) I think it's better to also show labels that are not accessible, especially 
for the non-exclusive node label case; we can optimize this in a future patch. 
This avoids people asking questions like "where is my label?". It includes all 
the existing "avoid displaying" items in your current patch. But it is still 
good to avoid showing "label" at all when there are no labels in the cluster.
3) Show the partition for the partition-specific queue metrics, i.e.:
- Used Capacity:0.0%
- Absolute Used Capacity:   0.0%
- Absolute Capacity:50.0%
- Absolute Max Capacity:100.0%
- Configured Capacity:  50.0%
- Configured Max Capacity:  100.0%
I suggest adding a (Partition=xxx) suffix at the end of these metrics.

I attached the queue hierarchy shown in my local cluster: 
https://issues.apache.org/jira/secure/attachment/12729256/Screen%20Shot%202015-04-29%20at%2011.42.17%20AM.png.
It seems the multi-level hierarchy works well in my environment.

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
> AM.png, YARN-3362.20150428-3.patch
>
>
> We don't have node label usage in the RM CapacityScheduler web UI now; without 
> this, it is hard for users to understand what happened to nodes that have 
> labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3562) unit tests fail with the failure to bring up node manager

2015-04-29 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3562:
-

 Summary: unit tests fail with the failure to bring up node manager
 Key: YARN-3562
 URL: https://issues.apache.org/jira/browse/YARN-3562
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Minor


A bunch of MR unit tests are failing on our branch whenever the mini YARN 
cluster needs to bring up multiple node managers.

For example, see 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/

It is because the NMCollectorService is using a fixed port for the RPC (8048).
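
The usual fix for this class of test failure is to stop hard-coding the port. A 
small, generic sketch of the idea follows (this is not the actual YARN-2928 
change; binding the service to port 0 and reading the bound address back is the 
cleaner variant of the same idea):

{code}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortProbe {
  // Ask the OS for an ephemeral port that is free right now, then release it
  // so the test can hand it to the service instead of a fixed 8048.
  static int findFreePort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Free port: " + findFreePort());
  }
}
{code}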



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519960#comment-14519960
 ] 

Naganarasimha G R commented on YARN-3362:
-

Thanks [~wangda], for reviewing and testing the patch.

bq. partition=partition-name
Well, I understand that in later patches we are targeting it more as a partition 
than as labels, but in that case shall I modify it in other locations of the web 
UI too, like the node labels page? And in the CS page, shall I mark it as 
Accessible Partitions?
bq. But it is still good to avoid showing "label" at all when there are no 
labels in the cluster.
You mean if no node is mapped to a cluster node label, then don't show that 
node label?
bq. Show the partition for the partition-specific queue metrics
You mean the existing metric entry names need to be appended with 
(Partition=xxx), and we should not show both, right?
bq. It seems the multi-level hierarchy works well in my environment.
It's great to hear it's working fine, but did it work without any modifications 
to the patch? If so, can you share your cluster setup (topology) and CS 
configuration offline, so that I can test it further?

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
> AM.png, YARN-3362.20150428-3.patch
>
>
> We don't have node label usage in the RM CapacityScheduler web UI now; without 
> this, it is hard for users to understand what happened to nodes that have 
> labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519964#comment-14519964
 ] 

Xuan Gong commented on YARN-3544:
-

Originally, we called getContainerReport to get the AM container information (such as 
the container log URL, NM address, start time, etc.). That works fine while the 
application and its container are running. But once the application is finished, we 
do not keep the finished container info, so we cannot get any finished container 
report from the RM. That is why the AM logs link shows up as "N/A" in the web UI, 
along with the other related attempt information.

In this patch, instead of querying the container report, we get the attempt (AM 
container) information directly from the AttemptInfo that comes from the RMAppAttempt. 
So whether the application is running or finished, we can get the related 
information and show it in the web UI.
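To make the change concrete, here is a hedged, self-contained sketch (not the actual YARN-3544 patch) of assembling the AM log link from attempt-level data; the NM log URL layout below is an assumption based on the usual NM web UI paths:

{code:java}
// Hypothetical sketch: build the AM container log URL from attempt-level data
// (node HTTP address, AM container id, user) instead of from a container
// report that may no longer exist once the app has finished.
public final class AmLogLinkBuilder {

  private AmLogLinkBuilder() {
  }

  public static String buildAmLogLink(String nodeHttpAddress,
      String amContainerId, String user) {
    // Assumed NM web UI layout: /node/containerlogs/<container-id>/<user>
    return "http://" + nodeHttpAddress + "/node/containerlogs/"
        + amContainerId + "/" + user;
  }

  public static void main(String[] args) {
    System.out.println(buildAmLogLink("nm-host:8042",
        "container_1430000000000_0001_01_000001", "alice"));
  }
}
{code}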

> AM logs link missing in the RM UI for a completed app 
> --
>
> Key: YARN-3544
> URL: https://issues.apache.org/jira/browse/YARN-3544
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.0
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
> YARN-3544.1.patch
>
>
> AM log links should always be present ( for both running and completed apps).
> Likewise node info is also empty. This is usually quite crucial when trying 
> to debug where an AM was launched and a pointer to which NM's logs to look at 
> if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520074#comment-14520074
 ] 

Wangda Tan commented on YARN-3362:
--

bq. I understand that in later patches we are targeting it more as "partition" than 
"labels", but in that case should I make the same change in other locations of the 
web UI, such as the node labels page?
Good point. I think we should keep it as "label" for now and do the renaming in a 
separate patch.

bq. And on the CS page, should I mark it as "Accessible Partitions"?
We can keep calling it "label" to avoid confusion.

bq. Do you mean that if no node is mapped to a cluster node label, we should not 
show that node label?
What I have in mind is to show all node labels, whether or not they are mapped to 
nodes/queues. We can optimize this easily in the future; I prefer to keep the 
complete information until people post their comments.

bq. Do you mean the existing names of the metrics entries need to be appended with 
(Partition=xxx), and we should not show both, right?
I think we need to show both (partition-specific and general queue metrics); the only 
change is to append (Node-Label=xxx).

bq. It's great to hear it is working fine, but did it work without any modifications 
to the patch?
Forgot to mention: I modified the patch a little bit and removed some of the 
avoid-displaying checks you mentioned at 
https://issues.apache.org/jira/browse/YARN-3362?focusedCommentId=14517364&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14517364.
I am uploading the modified patch as well as the CS config for you to test.

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
> AM.png, YARN-3362.20150428-3.patch
>
>
> We don't have node label usage in the RM CapacityScheduler web UI now; without 
> this, it is hard for users to understand what happened to nodes that have labels 
> assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3362:
-
Attachment: capacity-scheduler.xml
YARN-3362.20150428-3-modified.patch

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
> AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, 
> capacity-scheduler.xml
>
>
> We don't have node label usage in the RM CapacityScheduler web UI now; without 
> this, it is hard for users to understand what happened to nodes that have labels 
> assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520115#comment-14520115
 ] 

Li Lu commented on YARN-3411:
-

Thanks [~stack] for the quick info! Yes, let's go with HBase 1. We can figure 
out a solution for Phoenix later. In the worst case, we can rely on the 
snapshot version of Phoenix, which already works with HBase 1. 

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520139#comment-14520139
 ] 

Hadoop QA commented on YARN-2893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   9m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   5m  9s | The applied patch generated  1 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   2m  0s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 34s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  51m 53s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m  3s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729253/YARN-2893.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3dd6395 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/console |


This message was automatically generated.

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> --
>
> Key: YARN-2893
> URL: https://issues.apache.org/jira/browse/YARN-2893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: zhihai xu
> Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
> YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch
>
>
> MapReduce jobs on our clusters experience sporadic failures due to corrupt 
> tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520196#comment-14520196
 ] 

Vinod Kumar Vavilapalli commented on YARN-2868:
---

Going through old tickets. I have two questions
 # Why was this done in a scheduler specific way? RMAppAttempt clearly knows 
when it requests and when it gets the allocation.
 # Seems like the patch only looks at the first AM container. What happens if 
the we have a 2nd AM container?

I accidentally closed this ticket, so doesn't look like I can reopen it. If 
folks agree, I will open a new ticket.

> FairScheduler: Metric for latency to allocate first container for an 
> application
> 
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Fix For: 2.8.0
>
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
> YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
> YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
> YARN-2868.012.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1317:
--
Target Version/s: 2.8.0  (was: )

I'd like to at least get some of this done in the 2.8 time-frame.

> Make Queue, QueueACLs and QueueMetrics first class citizens in YARN
> ---
>
> Key: YARN-1317
> URL: https://issues.apache.org/jira/browse/YARN-1317
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Today, we are duplicating the exact same code in all the schedulers. Queue is 
> a top class concept - clientService, web-services etc already recognize queue 
> as a top level concept.
> We need to move Queue, QueueMetrics and QueueACLs to be top level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520214#comment-14520214
 ] 

Sangjin Lee commented on YARN-3051:
---

{quote}
My major concern about this proposal is compatibility. Previously in v1, a 
timeline entity is globally unique, such that when fetching a single entity 
before, users only need to provide the entity type and entity id. Now the 
application context is also required to locate one entity, and theoretically the 
same entity type and entity id may refer to multiple entities. It probably makes it 
difficult to be compatible with existing use cases.
{quote}

To hash out that point, existing use cases which previously assumed that entity 
id was globally unique would continue to generate entity id's that are globally 
unique, right? Since existing use cases (w/o modification) would stick to 
globally unique entity id's in practice, redefining the uniqueness requirement 
to be in the scope of application should not impact existing use cases. Entity 
id's that are generated to be unique globally would trivially be unique within 
the application scope. The point here is that since this is in the direction of 
relaxing uniqueness, stricter use cases (existing use cases) should not be 
impacted. Let me know your thoughts.

IMO, stating that the entity id's are unique within the scope of applications 
is not an invitation for frameworks to generate tons of redundant entity id's. 
Frameworks (MR, Tez, ...) would likely continue to generate entity id's that 
are practically unique globally anyway. But on the timeline service side, 
we don't have to have checks enforcing global uniqueness.
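To make "unique within the scope of an application" concrete, here is a small sketch (not from any attached patch; the key layout and separator character are assumptions) of an application-scoped entity key:

{code:java}
// Hypothetical sketch: an entity key scoped by cluster/user/flow/run/app, so
// that entity ids only need to be unique within a single application.
public final class EntityKey {

  private static final char SEP = '!';

  private EntityKey() {
  }

  public static String of(String cluster, String user, String flowName,
      long flowRunId, String appId, String entityType, String entityId) {
    return cluster + SEP + user + SEP + flowName + SEP + flowRunId + SEP
        + appId + SEP + entityType + SEP + entityId;
  }

  public static void main(String[] args) {
    // Two different apps can both use entity id "task_0" without a clash.
    System.out.println(of("cluster1", "alice", "wordcount", 1L,
        "application_1430000000000_0001", "MAPREDUCE_TASK", "task_0"));
    System.out.println(of("cluster1", "alice", "wordcount", 2L,
        "application_1430000000000_0002", "MAPREDUCE_TASK", "task_0"));
  }
}
{code}

With such a key, two applications can both emit an entity id like "task_0" without colliding, which is exactly the relaxation being discussed.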

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-04-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520226#comment-14520226
 ] 

Wangda Tan commented on YARN-3521:
--

Hi Sunil,
Thanks for working on this. Some comments:

NodelabelsInfo (it should be NodeLabelInfo, right?):
- nodeLabelName: there is no need to call {{new String()}} since it will always be 
initialized, and I prefer to call it "name"
- nodeLabelExclusivity -> exclusivity
- The same applies to the getters
- The setters are not used by anybody and could be removed 
- I'm not sure if you need to add an empty constructor marked with {{// JAXB needs 
this}} like the other info classes?
- You could add a constructor of NodeLabelsInfo that receives a NodeLabel, which will 
be used by RMWebServices
- We may need to add a separate NodeLabelsInfo that contains an ArrayList of 
NodeLabelInfo

NodeToLabelsInfo -> NodeToLabelNames

addToClusterNodeLabels now receives a Set as a parameter; I'm not sure if that works. 
Could you add a test to verify add/get node labels? Right now 
TestRMWebServicesNodeLabels will fail.
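For context, a minimal sketch (not the patch under review) of the kind of JAXB info bean described above, with plainly named fields, a no-arg constructor kept for JAXB, and getters only; the class and field names are illustrative assumptions:

{code:java}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical sketch of a single-label REST info bean.
@XmlRootElement(name = "nodeLabelInfo")
@XmlAccessorType(XmlAccessType.FIELD)
public class NodeLabelInfo {

  private String name;
  private boolean exclusivity;

  public NodeLabelInfo() {
    // JAXB needs this
  }

  public NodeLabelInfo(String name, boolean exclusivity) {
    this.name = name;
    this.exclusivity = exclusivity;
  }

  public String getName() {
    return name;
  }

  public boolean getExclusivity() {
    return exclusivity;
  }
}
{code}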

> Support return structured NodeLabel objects in REST API when call 
> getClusterNodeLabels
> --
>
> Key: YARN-3521
> URL: https://issues.apache.org/jira/browse/YARN-3521
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-3521.patch
>
>
> In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should 
> make the same change in REST API side to make them consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520252#comment-14520252
 ] 

Thomas Graves commented on YARN-3517:
-

Changes look good, +1. Thanks [~vvasudev]!

> RM web ui for dumping scheduler logs should be for admins only
> --
>
> Key: YARN-3517
> URL: https://issues.apache.org/jira/browse/YARN-3517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: Varun Vasudev
>Assignee: Thomas Graves
>Priority: Blocker
>  Labels: security
> Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
> YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
> YARN-3517.006.patch
>
>
> YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
> for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3563:
-

 Summary: Completed app shows -1 running containers on RM web UI
 Key: YARN-3563
 URL: https://issues.apache.org/jira/browse/YARN-3563
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


See the attached screenshot. I saw this issue with trunk. Not sure if it exists 
in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3563:
--
Attachment: Screen Shot 2015-04-29 at 2.11.19 PM.png

> Completed app shows -1 running containers on RM web UI
> --
>
> Key: YARN-3563
> URL: https://issues.apache.org/jira/browse/YARN-3563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Zhijie Shen
> Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png
>
>
> See the attached screenshot. I saw this issue with trunk. Not sure if it 
> exists in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3563:
--
Component/s: webapp
 resourcemanager

> Completed app shows -1 running containers on RM web UI
> --
>
> Key: YARN-3563
> URL: https://issues.apache.org/jira/browse/YARN-3563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Zhijie Shen
> Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png
>
>
> See the attached screenshot. I saw this issue with trunk. Not sure if it 
> exists in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3406) Display count of running containers in the RM's Web UI

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520273#comment-14520273
 ] 

Zhijie Shen commented on YARN-3406:
---

The web UI seems to have a bug: YARN-3563

> Display count of running containers in the RM's Web UI
> --
>
> Key: YARN-3406
> URL: https://issues.apache.org/jira/browse/YARN-3406
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3406.1.patch, YARN-3406.2.patch, screenshot.png, 
> screenshot2.png
>
>
> Display the running containers in the all application list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520292#comment-14520292
 ] 

Jian He commented on YARN-3546:
---

[~sandflee], inside the scheduler, every application only has one attempt, so 
the current attempt is the attempt corresponding to the appAttemptId. The 
name 'getAppAttempt(attemptId)' therefore matches the internal implementation. 
If you agree, we can close this jira. 


> AbstractYarnScheduler.getApplicationAttempt seems misleading,  and there're 
> some misuse of it
> -
>
> Key: YARN-3546
> URL: https://issues.apache.org/jira/browse/YARN-3546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: sandflee
>
> I'm not familiar with the scheduler; at first glance, I thought this function 
> returns the SchedulerApplicationAttempt corresponding to the appAttemptId, but 
> actually it returns the current SchedulerApplicationAttempt.
> It seems to have misled others too, such as
> TestWorkPreservingRMRestart.waitForNumContainersToRecover and
> MockRM.waitForSchedulerAppAttemptAdded.
> Should I rename it to T getCurrentSchedulerApplicationAttempt(ApplicationId 
> applicationId),
> or should it return null if the current attempt id does not equal the requested 
> attempt id? Comments preferred!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520291#comment-14520291
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic] [~zjshen], just a quick thing to confirm: we want to use 
byte arrays for the config and info fields in both of our storage implementations. 
I'll convert the types of config and info in the Phoenix implementation to VARBINARY 
to be consistent with this design. 

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520293#comment-14520293
 ] 

Jian He commented on YARN-3533:
---


Patch looks good to me, thanks [~adhoot]! 
Hopefully this can resolve some of the intermittent failures we've seen recently.

> Test: Fix launchAM in MockRM to wait for attempt to be scheduled
> 
>
> Key: YARN-3533
> URL: https://issues.apache.org/jira/browse/YARN-3533
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3533.001.patch
>
>
> MockRM#launchAM fails in many test runs because it does not wait for the app 
> attempt to be scheduled before NM update is sent as noted in [recent 
> builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]
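For illustration only (this is not the MockRM change itself), the general shape of such a fix is a bounded poll loop that waits for the attempt to reach the desired state before the NM update is sent; a hypothetical sketch:

{code:java}
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: wait for a condition (e.g. "the app attempt has been
// scheduled") with a bounded timeout before moving on, instead of racing ahead
// and sending the NM update too early.
public final class WaitUtil {

  /** Minimal condition callback; stands in for a check against the scheduler. */
  public interface Condition {
    boolean isMet();
  }

  private WaitUtil() {
  }

  public static void waitFor(Condition condition, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.isMet()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }
}
{code}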



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3551) Consolidate data model change according to the backend implementation

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520295#comment-14520295
 ] 

Sangjin Lee commented on YARN-3551:
---

I'm fine with using GenericObjectMapper for the serialization/deserialization of the 
appropriate types. The generics are a suggestion for strengthening the types on the 
user side of things for the most part, so they may not be critical.
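As a side note, the general serialization approach can be sketched with plain Jackson (an illustration only; this is not the timeline service's own mapper class):

{code:java}
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical sketch: write arbitrary "info" values to bytes and read them
// back into generic containers, trading compile-time type safety for
// flexibility, which is the trade-off discussed above.
public final class GenericValueCodec {

  private static final ObjectMapper MAPPER = new ObjectMapper();

  private GenericValueCodec() {
  }

  public static byte[] write(Object value) throws IOException {
    return MAPPER.writeValueAsBytes(value);
  }

  public static Object read(byte[] bytes) throws IOException {
    // Deserializes into Map/List/String/Number/Boolean as appropriate.
    return MAPPER.readValue(bytes, Object.class);
  }
}
{code}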

> Consolidate data model change according to the backend implementation
> -
>
> Key: YARN-3551
> URL: https://issues.apache.org/jira/browse/YARN-3551
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3551.1.patch, YARN-3551.2.patch, YARN-3551.3.patch
>
>
> Based on the comments on 
> [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080]
>  and 
> [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098],
>  we need to change the data model to restrict the data types of the 
> info/config/metric sections.
> 1. Info: the value can be any kind of object that can be 
> serialized/deserialized by Jackson.
> 2. Config: the value will always be assumed to be a String.
> 3. Metric: single data or time series values have to be numbers for aggregation.
> Other than that, the info/start time/finish time of a metric do not seem to be 
> necessary for storage. They should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-1876) Document the REST APIs of timeline and generic history services

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened YARN-1876:
---

> Document the REST APIs of timeline and generic history services
> ---
>
> Key: YARN-1876
> URL: https://issues.apache.org/jira/browse/YARN-1876
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>  Labels: documentaion
> Attachments: YARN-1876.1.patch, YARN-1876.2.patch, YARN-1876.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1876) Document the REST APIs of timeline and generic history services

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1876.
---
Resolution: Duplicate

Duplicate is the right resolution.

> Document the REST APIs of timeline and generic history services
> ---
>
> Key: YARN-1876
> URL: https://issues.apache.org/jira/browse/YARN-1876
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>  Labels: documentaion
> Attachments: YARN-1876.1.patch, YARN-1876.2.patch, YARN-1876.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520308#comment-14520308
 ] 

Hudson commented on YARN-3517:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7701 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7701/])
YARN-3517. RM web ui for dumping scheduler logs should be for admins only 
(Varun Vasudev via tgraves) (tgraves: rev 
2e215484bd05cd5e3b7a81d3558c6879a05dd2d2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* hadoop-yarn-project/CHANGES.txt


> RM web ui for dumping scheduler logs should be for admins only
> --
>
> Key: YARN-3517
> URL: https://issues.apache.org/jira/browse/YARN-3517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: Varun Vasudev
>Assignee: Thomas Graves
>Priority: Blocker
>  Labels: security
> Fix For: 2.8.0
>
> Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
> YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
> YARN-3517.006.patch
>
>
> YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
> for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520312#comment-14520312
 ] 

Vinod Kumar Vavilapalli commented on YARN-3539:
---

bq. So I'm not sure if it's good timing now, as we foresee that in the near 
future we're going to be upgraded to ATS v2, which may significantly refurbish 
the APIs.
How about we simply say that people can continue to run the v1 Timeline Service 
(a single server backed by LevelDB) beyond Timeline Service next-gen? That way, 
older installations and apps can continue to use the old APIs, and the new APIs 
do not need to take on the unknown burden of making the old APIs work on the newer 
framework.

> Compatibility doc to state that ATS v1 is a stable REST API
> ---
>
> Key: YARN-3539
> URL: https://issues.apache.org/jira/browse/YARN-3539
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
> YARN-3539-003.patch, YARN-3539-004.patch
>
>
> The ATS v2 discussion and YARN-2423 have raised the question: "how stable are 
> the ATSv1 APIs"?
> The existing compatibility document actually states that the History Server 
> is [a stable REST 
> API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
>  which effectively means that ATSv1 has already been declared as a stable API.
> Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520317#comment-14520317
 ] 

Jason Lowe commented on YARN-3563:
--

This sounds closely related to, if not a duplicate of, YARN-3552.

> Completed app shows -1 running containers on RM web UI
> --
>
> Key: YARN-3563
> URL: https://issues.apache.org/jira/browse/YARN-3563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Zhijie Shen
> Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png
>
>
> See the attached screenshot. I saw this issue with trunk. Not sure if it 
> exists in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520319#comment-14520319
 ] 

Vinod Kumar Vavilapalli commented on YARN-3539:
---

In a way, I am saying that there will be v1 end-points and v2 end-points. V1 
end-points go to the old Timeline Service and V2 end-points go to the next-gen 
Timeline Service.

> Compatibility doc to state that ATS v1 is a stable REST API
> ---
>
> Key: YARN-3539
> URL: https://issues.apache.org/jira/browse/YARN-3539
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
> YARN-3539-003.patch, YARN-3539-004.patch
>
>
> The ATS v2 discussion and YARN-2423 have raised the question: "how stable are 
> the ATSv1 APIs"?
> The existing compatibility document actually states that the History Server 
> is [a stable REST 
> API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
>  which effectively means that ATSv1 has already been declared as a stable API.
> Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520322#comment-14520322
 ] 

Sangjin Lee commented on YARN-3044:
---

{quote}
Some might be, but many (findbugs and test case issues) are not related to this jira, 
hence I am planning to raise a separate jira to handle them.
And some findbugs warnings (like the unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent) I am not 
planning to handle, as the code is the same as before and the checks don't make 
sense here.
{quote}
Understood. We should try to resolve the ones that make sense, but we don't have to 
be pedantic. By the way, note that I filed a separate JIRA for the unit test 
issues that already exist on YARN-2928 (YARN-3562).

{quote}
Well, AFAIK it only affects readability here, and I used an entry set iterator 
here as it is generally preferred in terms of performance and concurrency (not 
relevant here). If you feel readability is an issue, I can change it to a simple 
loop.
{quote}
That's fine. It was a style nit (if that wasn't clear).

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3477) TimelineClientImpl swallows exceptions

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520328#comment-14520328
 ] 

Vinod Kumar Vavilapalli commented on YARN-3477:
---

This looks good to me. [~zjshen], can you look and do the honors?

> TimelineClientImpl swallows exceptions
> --
>
> Key: YARN-3477
> URL: https://issues.apache.org/jira/browse/YARN-3477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-3477-001.patch, YARN-3477-002.patch
>
>
> If the timeline client fails more than the retry count, the original exception is 
> not thrown. Instead, some runtime exception is raised saying "retries run out".
> # The failing exception should be rethrown, ideally via 
> NetUtils.wrapException, to include the URL of the failing endpoint
> # Otherwise, the raised RTE should (a) state that URL and (b) set the 
> original fault as the inner cause
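For illustration (a sketch of the requested behavior, not the attached patches), the post-retry rethrow could preserve the target URL and the original fault along these lines:

{code:java}
import java.io.IOException;
import java.net.URI;
import java.util.concurrent.Callable;

// Hypothetical sketch: after the retries are exhausted, rethrow with the
// failing URI in the message and the original exception as the cause, rather
// than a bare "retries run out" runtime exception.
public final class RetryFailureExample {

  private RetryFailureExample() {
  }

  public static <T> T runWithRetries(URI target, int maxRetries, Callable<T> call)
      throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        last = e;  // remember the most recent failure
      } catch (Exception e) {
        throw new IOException("Failed calling " + target, e);
      }
    }
    throw new IOException("Retries (" + maxRetries + ") exhausted for " + target, last);
  }
}
{code}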



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-3517:
-

Assignee: Varun Vasudev  (was: Thomas Graves)

Seems like the JIRA assignee got mixed up, fixing..

> RM web ui for dumping scheduler logs should be for admins only
> --
>
> Key: YARN-3517
> URL: https://issues.apache.org/jira/browse/YARN-3517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
>  Labels: security
> Fix For: 2.8.0
>
> Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
> YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
> YARN-3517.006.patch
>
>
> YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
> for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520332#comment-14520332
 ] 

Sangjin Lee commented on YARN-3045:
---

Hi [~Naganarasimha], I do have one quick question on the naming. I see a lot of 
names that include "metrics", such as NMMetricsPublisher, NMMetricsEvent, 
NMMetricsEventType, and so on. And yet, they don't seem to involve metrics in 
the sense of timeline metrics. This is a source of confusion to me. Do we need 
"metrics" in these? They seem to be capturing purely lifecycle events. Could we 
change them to better names?

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3045.20150420-1.patch
>
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520339#comment-14520339
 ] 

Jian He commented on YARN-3533:
---

bq. getApplicationAttempt seems confusing, I just opened 
https://issues.apache.org/jira/browse/YARN-3546 to discuss this
I replied on the jira.

The TestContainerAllocation failure is unrelated to this patch. Opening a new 
jira to fix that.

Committing this.

> Test: Fix launchAM in MockRM to wait for attempt to be scheduled
> 
>
> Key: YARN-3533
> URL: https://issues.apache.org/jira/browse/YARN-3533
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3533.001.patch
>
>
> MockRM#launchAM fails in many test runs because it does not wait for the app 
> attempt to be scheduled before NM update is sent as noted in [recent 
> builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520346#comment-14520346
 ] 

Jian He commented on YARN-3533:
---

Committed to trunk and branch-2, thanks Anubhav!
Thanks [~sandflee] and [~rohithsharma] for the review!

> Test: Fix launchAM in MockRM to wait for attempt to be scheduled
> 
>
> Key: YARN-3533
> URL: https://issues.apache.org/jira/browse/YARN-3533
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3533.001.patch
>
>
> MockRM#launchAM fails in many test runs because it does not wait for the app 
> attempt to be scheduled before NM update is sent as noted in [recent 
> builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)
Jian He created YARN-3564:
-

 Summary: 
TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
randomly 
 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3564:
--
Description: the test fails intermittently in jenkins 
https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/  (was: 
https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/)

> TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
> randomly 
> ---
>
> Key: YARN-3564
> URL: https://issues.apache.org/jira/browse/YARN-3564
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>
> the test fails intermittently in jenkins 
> https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520355#comment-14520355
 ] 

Thomas Graves commented on YARN-3517:
-

thanks [~vinodkv] I missed that.

> RM web ui for dumping scheduler logs should be for admins only
> --
>
> Key: YARN-3517
> URL: https://issues.apache.org/jira/browse/YARN-3517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
>  Labels: security
> Fix For: 2.8.0
>
> Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
> YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
> YARN-3517.006.patch
>
>
> YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
> for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3564:
--
Description: 
https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/

> TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
> randomly 
> ---
>
> Key: YARN-3564
> URL: https://issues.apache.org/jira/browse/YARN-3564
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>
> https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520365#comment-14520365
 ] 

Zhijie Shen commented on YARN-3544:
---

Xuan, thanks for the patch. I've tried it locally, and it brought the 
content back to the web UI. However, I have one concern. It seems that the link 
to the local log on the NM is not useful after the app is finished, because the log 
is not supposed to be there any longer. So is this jira supposed to fix the 
regression, or ultimately provide a useful link to the AM container log? Those seem 
to be different goals.

/cc [~hitesh]

> AM logs link missing in the RM UI for a completed app 
> --
>
> Key: YARN-3544
> URL: https://issues.apache.org/jira/browse/YARN-3544
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.0
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
> YARN-3544.1.patch
>
>
> AM log links should always be present ( for both running and completed apps).
> Likewise node info is also empty. This is usually quite crucial when trying 
> to debug where an AM was launched and a pointer to which NM's logs to look at 
> if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520378#comment-14520378
 ] 

Hitesh Shah commented on YARN-3544:
---

Doesn't the NM log link redirect the log server after the logs have been 
aggregated? 

> AM logs link missing in the RM UI for a completed app 
> --
>
> Key: YARN-3544
> URL: https://issues.apache.org/jira/browse/YARN-3544
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.0
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
> YARN-3544.1.patch
>
>
> AM log links should always be present ( for both running and completed apps).
> Likewise node info is also empty. This is usually quite crucial when trying 
> to debug where an AM was launched and a pointer to which NM's logs to look at 
> if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520380#comment-14520380
 ] 

Hudson commented on YARN-3533:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7702 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7702/])
YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. 
Contributed by Anubhav Dhoot (jianhe: rev 
4c1af156aef4f3bb1d9823d5980c59b12007dc77)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


> Test: Fix launchAM in MockRM to wait for attempt to be scheduled
> 
>
> Key: YARN-3533
> URL: https://issues.apache.org/jira/browse/YARN-3533
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3533.001.patch
>
>
> MockRM#launchAM fails in many test runs because it does not wait for the app 
> attempt to be scheduled before NM update is sent as noted in [recent 
> builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520379#comment-14520379
 ] 

Hitesh Shah commented on YARN-3544:
---

I meant "redirect to the log server" 

> AM logs link missing in the RM UI for a completed app 
> --
>
> Key: YARN-3544
> URL: https://issues.apache.org/jira/browse/YARN-3544
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.0
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
> YARN-3544.1.patch
>
>
> AM log links should always be present ( for both running and completed apps).
> Likewise node info is also empty. This is usually quite crucial when trying 
> to debug where an AM was launched and a pointer to which NM's logs to look at 
> if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3564:
--
Attachment: YARN-3564.1.patch

patch to fix the failure

> TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
> randomly 
> ---
>
> Key: YARN-3564
> URL: https://issues.apache.org/jira/browse/YARN-3564
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3564.1.patch
>
>
> the test fails intermittently in jenkins 
> https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520396#comment-14520396
 ] 

Vinod Kumar Vavilapalli commented on YARN-3445:
---

There is too much duplicate information already in NodeHeartbeatRequest, 
albeit for slightly different purposes. We need to consolidate the following 
(without breaking compatibility with previous releases), lest the heartbeat 
become heavier and heavier.
 - logAggregationReportsForApps, added but not released yet
 -- logAggregationReportsForApps itself is a map of ApplicationId with a 
nested LogAggregationReport.ApplicationId - duplicate app ID information
 - runningApplications in this patch
 - NodeStatus.keepAliveApplications

/cc [~jianhe] [~leftnoteasy]

> Cache runningApps in RMNode for getting running apps on given NodeId
> 
>
> Key: YARN-3445
> URL: https://issues.apache.org/jira/browse/YARN-3445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3445-v2.patch, YARN-3445.patch
>
>
> Per the discussion in YARN-3334, we need to filter out unnecessary collector info 
> from the RM in the heartbeat response. Our proposal is to add a cache of runningApps 
> in RMNode, so the RM only sends back collectors for locally running apps. This is 
> also needed for YARN-914 (graceful decommission): if there are no running apps on 
> an NM which is in the decommissioning stage, it will get decommissioned immediately. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3392) Change NodeManager metrics to not populate resource usage metrics if they are unavailable

2015-04-29 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot resolved YARN-3392.
-
Resolution: Duplicate

> Change NodeManager metrics to not populate resource usage metrics if they are 
> unavailable 
> --
>
> Key: YARN-3392
> URL: https://issues.apache.org/jira/browse/YARN-3392
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3392.prelim.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-04-29 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3565:


 Summary: NodeHeartbeatRequest/RegisterNodeManagerRequest should 
use NodeLabel object instead of String
 Key: YARN-3565
 URL: https://issues.apache.org/jira/browse/YARN-3565
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker


Now NM HB/Register uses Set<String>; it will be hard to add new fields if we 
want to support specifying NodeLabel attributes such as exclusivity/constraints, etc. 
We need to make sure rolling upgrade works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520425#comment-14520425
 ] 

Zhijie Shen commented on YARN-3544:
---

bq. Doesnt the NM log link redirect the log server after the logs have been 
aggregated?

Thanks, Hitesh! I didn't notice this option before. Tried it locally, and the 
whole process of the completed log is working fine now.

Will commit the patch late today unless there's further comment.

> AM logs link missing in the RM UI for a completed app 
> --
>
> Key: YARN-3544
> URL: https://issues.apache.org/jira/browse/YARN-3544
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.0
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
> YARN-3544.1.patch
>
>
> AM log links should always be present ( for both running and completed apps).
> Likewise node info is also empty. This is usually quite crucial when trying 
> to debug where an AM was launched and a pointer to which NM's logs to look at 
> if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3473) Fix RM Web UI configuration for some properties

2015-04-29 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-3473:
-
Labels: BB2015-05-TBR  (was: )

> Fix RM Web UI configuration for some properties
> ---
>
> Key: YARN-3473
> URL: https://issues.apache.org/jira/browse/YARN-3473
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: YARN-3473.001.patch
>
>
> Using the RM Web UI, the Tools->Configuration page shows some properties as 
> something like "BufferedInputStream" instead of the appropriate .xml file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520473#comment-14520473
 ] 

Zhijie Shen commented on YARN-3539:
---

bq. That way, older installations and apps can continue to use the old APIs, 
and the new APIs do not need to take the unknown burden of making the old APIs 
work on the newer framework.

This sounds a more reasonable commitment for ATS v2.

> Compatibility doc to state that ATS v1 is a stable REST API
> ---
>
> Key: YARN-3539
> URL: https://issues.apache.org/jira/browse/YARN-3539
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
> YARN-3539-003.patch, YARN-3539-004.patch
>
>
> The ATS v2 discussion and YARN-2423 have raised the question: "how stable are 
> the ATSv1 APIs"?
> The existing compatibility document actually states that the History Server 
> is [a stable REST 
> API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
>  which effectively means that ATSv1 has already been declared as a stable API.
> Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

