[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.1.patch

Attaching the patch.
Kindly review.

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3559) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public

2015-04-29 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519137#comment-14519137
 ] 

J.Andreina commented on YARN-3559:
--

[~ste...@apache.org], I would like to work on this issue. If you have not 
already started working on it, shall I take it over?

 Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public
 

 Key: YARN-3559
 URL: https://issues.apache.org/jira/browse/YARN-3559
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Steve Loughran

 {{org.apache.hadoop.security.token.Token}} is tagged 
 {{@InterfaceAudience.LimitedPrivate}} for HDFS and MapReduce.
 However, it is used throughout YARN apps, where both the clients and the AM 
 need to work with tokens. This class and its related classes all need to be 
 declared public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-04-29 Thread Peng Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peng Zhang updated YARN-3535:
-
Attachment: YARN-3535-002.patch

# Remove the call to recoverResourceRequestForContainer from the preemption path 
to avoid recovering the ResourceRequest twice.
# Fix broken tests.

  ResourceRequest should be restored back to scheduler when RMContainer is 
 killed at ALLOCATED
 -

 Key: YARN-3535
 URL: https://issues.apache.org/jira/browse/YARN-3535
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Peng Zhang
 Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
 yarn-app.log


 During a rolling update of the NM, the AM failed to start a container on that 
 NM, and the job then hung.
 AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3559) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public

2015-04-29 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-3559:


 Summary: Mark org.apache.hadoop.security.token.Token as 
@InterfaceAudience.Public
 Key: YARN-3559
 URL: https://issues.apache.org/jira/browse/YARN-3559
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Steve Loughran


{{org.apache.hadoop.security.token.Token}} is tagged 
{{@InterfaceAudience.LimitedPrivate}} for HDFS and MapReduce.

However, it is used throughout YARN apps, where both the clients and the AM 
need to work with tokens. This class and its related classes all need to be 
declared public.
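A minimal, hypothetical sketch of what the proposed audience change would look like (the real class lives in hadoop-common; this standalone stub only illustrates the annotation, assuming hadoop-annotations is on the classpath):
{code}
import org.apache.hadoop.classification.InterfaceAudience;

// Illustrative stub only; the actual change would be applied to
// org.apache.hadoop.security.token.Token and its related classes.
@InterfaceAudience.Public   // proposed; currently tagged @InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
class TokenAnnotationSketch {
}
{code}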



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519196#comment-14519196
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java


 FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
 FairShare policies
 

 Key: YARN-3485
 URL: https://issues.apache.org/jira/browse/YARN-3485
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.7.1

 Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
 yarn-3485-prelim.patch


 FairScheduler's headroom calculations consider the fairshare and 
 cluster-available-resources, and the fairshare has maxResources. However, for 
 Fifo and FairShare policies, the fairshare is used only for memory and not 
 CPU. So, the scheduler ends up showing a higher headroom than is actually 
 available. This could lead to applications waiting for resources far longer 
 than they intend to, e.g. MAPREDUCE-6302.
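To make the idea concrete, here is a hedged sketch (not the committed FSAppAttempt/SchedulingPolicy code) of a headroom computation that bounds every resource component, memory and vcores alike, by the queue limits and by what the cluster currently has available:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hedged sketch only: headroom is the most restrictive of (fair share - usage),
// (maxResources - usage) and the cluster's currently available resources,
// taken component-wise so vcores are capped just like memory.
final class HeadroomSketch {
  static Resource headroom(Resource fairShare, Resource usage,
                           Resource maxResources, Resource clusterAvailable) {
    Resource underFairShare = Resources.subtract(fairShare, usage);
    Resource underMax = Resources.subtract(maxResources, usage);
    return Resources.componentwiseMin(
        Resources.componentwiseMin(underFairShare, underMax), clusterAvailable);
  }
}
{code}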



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519222#comment-14519222
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


 FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
 FairShare policies
 

 Key: YARN-3485
 URL: https://issues.apache.org/jira/browse/YARN-3485
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.7.1

 Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
 yarn-3485-prelim.patch


 FairScheduler's headroom calculations consider the fairshare and 
 cluster-available-resources, and the fairshare has maxResources. However, for 
 Fifo and FairShare policies, the fairshare is used only for memory and not 
 CPU. So, the scheduler ends up showing a higher headroom than is actually 
 available. This could lead to applications waiting for resources far longer 
 than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519247#comment-14519247
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


 FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
 FairShare policies
 

 Key: YARN-3485
 URL: https://issues.apache.org/jira/browse/YARN-3485
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.7.1

 Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
 yarn-3485-prelim.patch


 FairScheduler's headroom calculations consider the fairshare and 
 cluster-available-resources, and the fairshare has maxResources. However, for 
 Fifo and FairShare policies, the fairshare is used only for memory and not 
 CPU. So, the scheduler ends up showing a higher headroom than is actually 
 available. This could lead to applications waiting for resources far longer 
 than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3560) Not able to navigate to the cluster from tracking url (proxy) generated after submission of job

2015-04-29 Thread Anushri (JIRA)
Anushri created YARN-3560:
-

 Summary: Not able to navigate to the cluster from tracking url 
(proxy) generated after submission of job
 Key: YARN-3560
 URL: https://issues.apache.org/jira/browse/YARN-3560
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anushri
Priority: Minor


A standalone web proxy server is enabled in the cluster.
When a job is submitted, the generated tracking URL goes through the proxy.
Open this URL in a browser: if we try to navigate to the cluster links [About, 
Applications, or Scheduler], it gets redirected to some default port instead of 
the actual RM web port configured, and as such the browser reports that the 
webpage is not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519326#comment-14519326
 ] 

Hadoop QA commented on YARN-3271:
-

(!) The patch artifact directory on has been removed! 
This is a fatal error for test-patch.sh.  Aborting. 
Jenkins (node H4) information at 
https://builds.apache.org/job/PreCommit-YARN-Build/7537/ may provide some hints.

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519278#comment-14519278
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/912/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


 FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
 FairShare policies
 

 Key: YARN-3485
 URL: https://issues.apache.org/jira/browse/YARN-3485
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.7.1

 Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
 yarn-3485-prelim.patch


 FairScheduler's headroom calculations consider the fairshare and 
 cluster-available-resources, and the fairshare has maxResources. However, for 
 Fifo and FairShare policies, the fairshare is used only for memory and not 
 CPU. So, the scheduler ends up showing a higher headroom than is actually 
 available. This could lead to applications waiting for resources far longer 
 than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: YARN-2893.004.patch

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.
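For context, a hedged sketch of how tokens in the AM launch context are typically deserialized (illustrative only, not the actual AMLauncher code); a truncated or corrupted buffer at this point surfaces as the EOFException from readTokenStorageStream:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.security.Credentials;

// Hedged sketch: read the serialized tokens back into a Credentials object.
final class TokenReadSketch {
  static Credentials readTokens(ByteBuffer tokenBytes) throws IOException {
    tokenBytes.rewind();                        // read from the start of the buffer
    DataInputByteBuffer in = new DataInputByteBuffer();
    in.reset(tokenBytes);
    Credentials credentials = new Credentials();
    credentials.readTokenStorageStream(in);     // EOFException here if the bytes are truncated/corrupt
    return credentials;
  }
}
{code}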



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518761#comment-14518761
 ] 

Sangjin Lee commented on YARN-3044:
---

[~zjshen], sorry I missed your comment earlier...

bq. Say we have a big cluster that can afford 5,000 concurrent containers...

I follow your logic there. But I meant 5,000 containers allocated *per second*, 
not 5,000 concurrent containers. In a large cluster, it is entirely possible 
for containers to be allocated and released on the order of thousands per 
second. It follows that we're already talking about 2 * 5,000 = 10,000 events 
per second in such a situation. And if we add more event types, it is 
reasonable to expect each of them to occur at up to 5,000 events per second as 
well.

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps

2015-04-29 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518781#comment-14518781
 ] 

Xianyin Xin commented on YARN-2176:
---

Sorry [~jlowe], I made a mistake. What I had in mind was the FairScheduler, 
where we re-sort all the apps on every scheduling pass. When the number of 
running apps is in the thousands, the time consumed by the re-sort is hundreds 
of milliseconds. You're right that the overhead in CS is low.

 CapacityScheduler loops over all running applications rather than actively 
 requesting apps
 --

 Key: YARN-2176
 URL: https://issues.apache.org/jira/browse/YARN-2176
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.4.0
Reporter: Jason Lowe

 The capacity scheduler performance is primarily dominated by 
 LeafQueue.assignContainers, and that currently loops over all applications 
 that are running in the queue.  It would be more efficient if we looped over 
 just the applications that are actively asking for resources rather than all 
 applications, as there could be thousands of applications running but only a 
 few hundred that are currently asking for resources.
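A hedged sketch of the bookkeeping this suggests (names are illustrative, not the actual LeafQueue implementation): track the subset of applications with outstanding requests separately, and have the assignment loop iterate only over that subset:
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Illustrative sketch: applications move in and out of the "actively asking" set
// as resource requests are added or fully satisfied, so the scheduling loop only
// visits the few hundred apps that can actually use an allocation.
final class ActiveApplicationsSketch<A extends Comparable<A>> {
  private final Set<A> appsWithPendingRequests = new ConcurrentSkipListSet<>();

  void onResourceRequestAdded(A app)    { appsWithPendingRequests.add(app); }
  void onRequestsFullySatisfied(A app)  { appsWithPendingRequests.remove(app); }

  Iterable<A> applicationsToSchedule() {
    return appsWithPendingRequests;
  }
}
{code}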



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518917#comment-14518917
 ] 

zhihai xu commented on YARN-2893:
-

The TestAMRestart failure is not related to my change; YARN-2483 tracks that 
test failure.

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3552) RMServerUtils#DUMMY_APPLICATION_RESOURCE_USAGE_REPORT has negative numbers

2015-04-29 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519096#comment-14519096
 ] 

Rohith commented on YARN-3552:
--

I think displaying 'N/A' in the UI is reasonable; for REST we should keep the 
existing behavior, since changing it would affect compatibility.
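As a rough illustration of that approach (a hypothetical helper, not code from the attached patch), the web UI could map the sentinel negative values to 'N/A' while the REST layer keeps returning the raw numbers:
{code}
// Hypothetical UI-side helper: negative resource values are sentinels meaning
// "not available", so render them as N/A instead of a confusing negative number.
final class UsageDisplaySketch {
  static String display(long value) {
    return value < 0 ? "N/A" : String.valueOf(value);
  }

  public static void main(String[] args) {
    System.out.println(display(-1));   // N/A
    System.out.println(display(4096)); // 4096
  }
}
{code}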

 RMServerUtils#DUMMY_APPLICATION_RESOURCE_USAGE_REPORT  has negative numbers
 ---

 Key: YARN-3552
 URL: https://issues.apache.org/jira/browse/YARN-3552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-YARN-3552.patch, yarn-3352.PNG


 In RMServerUtils, the default values are negative numbers, which results in 
 the RM web UI also displaying negative numbers.
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
   Resources.createResource(-1, -1), 0, 0);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519027#comment-14519027
 ] 

Steve Loughran commented on YARN-3539:
--

bq.  we need to update all the API classes to remark them stable.

Good point. My next patch will tag the relevant classes as @Evolving. 

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler

2015-04-29 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3558:
--

 Summary: Additional containers getting reserved from RM in case of 
Fair scheduler
 Key: YARN-3558
 URL: https://issues.apache.org/jira/browse/YARN-3558
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.7.0
 Environment: OS :Suse 11 Sp3
Setup : 2 RM 2 NM
Scheduler : Fair scheduler

Reporter: Bibin A Chundatt


Submit a PI job with 16 maps.
Total containers expected: 16 maps + 1 reduce + 1 AM = 18.
Total containers reserved by the RM is 21.

The following containers are never used for execution:

container_1430213948957_0001_01_20
container_1430213948957_0001_01_19


RM Containers reservation and states
{code}
 Processing container_1430213948957_0001_01_01 of type START
 Processing container_1430213948957_0001_01_01 of type ACQUIRED
 Processing container_1430213948957_0001_01_01 of type LAUNCHED
 Processing container_1430213948957_0001_01_02 of type START
 Processing container_1430213948957_0001_01_03 of type START
 Processing container_1430213948957_0001_01_02 of type ACQUIRED
 Processing container_1430213948957_0001_01_03 of type ACQUIRED
 Processing container_1430213948957_0001_01_04 of type START
 Processing container_1430213948957_0001_01_05 of type START
 Processing container_1430213948957_0001_01_04 of type ACQUIRED
 Processing container_1430213948957_0001_01_05 of type ACQUIRED
 Processing container_1430213948957_0001_01_02 of type LAUNCHED
 Processing container_1430213948957_0001_01_04 of type LAUNCHED
 Processing container_1430213948957_0001_01_06 of type RESERVED
 Processing container_1430213948957_0001_01_03 of type LAUNCHED
 Processing container_1430213948957_0001_01_05 of type LAUNCHED
 Processing container_1430213948957_0001_01_07 of type START
 Processing container_1430213948957_0001_01_07 of type ACQUIRED
 Processing container_1430213948957_0001_01_07 of type LAUNCHED
 Processing container_1430213948957_0001_01_08 of type RESERVED
 Processing container_1430213948957_0001_01_02 of type FINISHED
 Processing container_1430213948957_0001_01_06 of type START
 Processing container_1430213948957_0001_01_06 of type ACQUIRED
 Processing container_1430213948957_0001_01_06 of type LAUNCHED
 Processing container_1430213948957_0001_01_04 of type FINISHED
 Processing container_1430213948957_0001_01_09 of type START
 Processing container_1430213948957_0001_01_09 of type ACQUIRED
 Processing container_1430213948957_0001_01_09 of type LAUNCHED
 Processing container_1430213948957_0001_01_10 of type RESERVED
 Processing container_1430213948957_0001_01_03 of type FINISHED
 Processing container_1430213948957_0001_01_08 of type START
 Processing container_1430213948957_0001_01_08 of type ACQUIRED
 Processing container_1430213948957_0001_01_08 of type LAUNCHED
 Processing container_1430213948957_0001_01_05 of type FINISHED
 Processing container_1430213948957_0001_01_11 of type START
 Processing container_1430213948957_0001_01_11 of type ACQUIRED
 Processing container_1430213948957_0001_01_11 of type LAUNCHED
 Processing container_1430213948957_0001_01_07 of type FINISHED
 Processing container_1430213948957_0001_01_12 of type START
 Processing container_1430213948957_0001_01_12 of type ACQUIRED
 Processing container_1430213948957_0001_01_12 of type LAUNCHED
 Processing container_1430213948957_0001_01_13 of type RESERVED
 Processing container_1430213948957_0001_01_06 of type FINISHED
 Processing container_1430213948957_0001_01_10 of type START
 Processing container_1430213948957_0001_01_10 of type ACQUIRED
 Processing container_1430213948957_0001_01_10 of type LAUNCHED
 Processing container_1430213948957_0001_01_09 of type FINISHED
 Processing container_1430213948957_0001_01_14 of type START
 Processing container_1430213948957_0001_01_14 of type ACQUIRED
 Processing container_1430213948957_0001_01_14 of type LAUNCHED
 Processing container_1430213948957_0001_01_15 of type RESERVED
 Processing container_1430213948957_0001_01_08 of type FINISHED
 Processing container_1430213948957_0001_01_13 of type START
 Processing container_1430213948957_0001_01_16 of type RESERVED
 Processing container_1430213948957_0001_01_13 of type ACQUIRED
 Processing container_1430213948957_0001_01_13 of type LAUNCHED
 Processing container_1430213948957_0001_01_11 of type FINISHED
 Processing container_1430213948957_0001_01_16 of type START
 Processing container_1430213948957_0001_01_10 of type FINISHED
 Processing container_1430213948957_0001_01_15 of type START
 Processing container_1430213948957_0001_01_16 of type ACQUIRED
 Processing container_1430213948957_0001_01_15 of 

[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: (was: YARN-2893.004.patch)

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518760#comment-14518760
 ] 

Sangjin Lee commented on YARN-3044:
---

It looks like some of the issues reported by the Jenkins build might be related 
to the patch. It would be great if you could look into them and see if we can 
resolve them.

Some additional comments:

(RMContainerEntity.java)
- l.28-29: NM - RM

(TimelineServiceV2Publisher.java)
- l.141: I would prefer an explicit entity.setQueue() over setting the info 
directly. Although it is currently equivalent, we should stick with the 
high-level methods we introduced; that would remain robust even if we later 
change how the queue is set.
- l.147: how about using a simple for loop?
- l.179: curious, we could add them to the entity as metrics, right?
- l.300: unnecessary line?


 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3484) Fix up yarn top shell code

2015-04-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3484:

Attachment: YARN-3484.002.patch

bq. variables that are local to a function should be declared local.
Fixed.

bq. avoid using mixed case as per the shell programming guidelines
Fixed.

bq. yarnTopArgs is effectively a global. It should either get renamed to 
YARN_foo or another to not pollute the shell name space or another approach is 
process set_yarn_top_args as a subshell, reading its input directly to avoid 
the global entirely
Fixed; renamed it to YARN_TOP_ARGS.

bq. set_yarn_top_args should be hadoop_ something so as to not pollute the 
shell name space
Fixed; changed the name to hadoop_set_yarn_top_args.

 Fix up yarn top shell code
 --

 Key: YARN-3484
 URL: https://issues.apache.org/jira/browse/YARN-3484
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Varun Vasudev
 Attachments: YARN-3484.001.patch, YARN-3484.002.patch


 We need to do some work on yarn top's shell code.
 a) Just checking for TERM isn't good enough.  We really need to check the 
 return on tput, especially since the output will not be a number but an error 
 string which will likely blow up the java code in horrible ways.
 b) All the single bracket tests should be double brackets to force the bash 
 built-in.
 c) I think I'd rather see the shell portion in a function since it's rather 
 large.  This will allow for args, etc., to get local'ized and clean up the 
 case statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3554:

Attachment: YARN-3554-20150429-2.patch

Hi [~jlowe], updating the patch with 3 minutes as the timeout.

 Default value for maximum nodemanager connect wait time is too high
 ---

 Key: YARN-3554
 URL: https://issues.apache.org/jira/browse/YARN-3554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R
 Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch


 The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
 msec, or 15 minutes, which is way too high.  The default container expiry time 
 from the RM and the default task timeout in MapReduce are both only 10 
 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519568#comment-14519568
 ] 

Hudson commented on YARN-3485:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/])
YARN-3485. FairScheduler headroom calculation doesn't consider maxResources for 
Fifo and FairShare policies. (kasha) (kasha: rev 
8f82970e0c247b37b2bf33aa21f6a39afa07efde)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java


 FairScheduler headroom calculation doesn't consider maxResources for Fifo and 
 FairShare policies
 

 Key: YARN-3485
 URL: https://issues.apache.org/jira/browse/YARN-3485
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.7.1

 Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-3.patch, 
 yarn-3485-prelim.patch


 FairScheduler's headroom calculations consider the fairshare and 
 cluster-available-resources, and the fairshare has maxResources. However, for 
 Fifo and FairShare policies, the fairshare is used only for memory and not 
 CPU. So, the scheduler ends up showing a higher headroom than is actually 
 available. This could lead to applications waiting for resources far longer 
 than they intend to, e.g. MAPREDUCE-6302.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519576#comment-14519576
 ] 

Naganarasimha G R commented on YARN-3554:
-

Agree with you [~jlowe], but what do you feel the ideal timeout should be: 3 
minutes or 5 minutes? Since you have more experience with large numbers of 
nodes and see NM failures more frequently, you can probably suggest a better 
value here.


 Default value for maximum nodemanager connect wait time is too high
 ---

 Key: YARN-3554
 URL: https://issues.apache.org/jira/browse/YARN-3554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R
 Attachments: YARN-3554.20150429-1.patch


 The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
 msec, or 15 minutes, which is way too high.  The default container expiry time 
 from the RM and the default task timeout in MapReduce are both only 10 
 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519603#comment-14519603
 ] 

Jason Lowe commented on YARN-3554:
--

I suggest we go with 3 minutes.  The retry interval is 10 seconds, so we'll get 
plenty of retries in that time if the failure is fast (e.g.: unknown host, 
connection refused) and still get a few retries in if the failure is slow 
(e.g.: connection timeout).
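The arithmetic behind that suggestion, as a small sketch (values taken from the discussion above, not read from any configuration):
{code}
// Rough retry budget: a 3-minute cap with a 10-second retry interval still
// allows on the order of 18 connection attempts when failures are fast.
final class RetryBudgetSketch {
  public static void main(String[] args) {
    long maxWaitMs = 3 * 60 * 1000L;    // proposed yarn.client.nodemanager-connect.max-wait-ms
    long retryIntervalMs = 10 * 1000L;  // retry interval mentioned in the discussion above
    System.out.println("~" + (maxWaitMs / retryIntervalMs) + " retries before giving up");
  }
}
{code}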

 Default value for maximum nodemanager connect wait time is too high
 ---

 Key: YARN-3554
 URL: https://issues.apache.org/jira/browse/YARN-3554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R
 Attachments: YARN-3554.20150429-1.patch


 The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
 msec, or 15 minutes, which is way too high.  The default container expiry time 
 from the RM and the default task timeout in MapReduce are both only 10 
 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3362:
-
Attachment: capacity-scheduler.xml
YARN-3362.20150428-3-modified.patch

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, 
 capacity-scheduler.xml


 We don't show node label usage in the RM CapacityScheduler web UI now; without 
 this, it is hard for users to understand what is happening on nodes that have 
 labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520074#comment-14520074
 ] 

Wangda Tan commented on YARN-3362:
--

bq. Well i understand that in the later patches we are targetting it more as 
partition than labels, but in that case shall i modify the same in other 
locations of WEB like node labels page, in CS page shall i mark it as 
Accessible Partitions ?
Good point. I think we need to keep calling it label for now and do the 
renaming in a separate patch.

bq. in CS page shall i mark it as Accessible Partitions
We can keep calling it label to avoid confusion.

bq. you mean if no node is mapped to cluster node label then not to show that 
Node Label ?
What I have in mind is to show all node labels, whether or not they are mapped 
to nodes/queues. We can optimize this easily in the future; I prefer to keep 
the complete information until people raise concerns about it.

bq. you mean the existing names of metrics entries needs to be appended with 
(Partition=xxx) and not to show both right ?
I think we need to show both (partition-specific and general queue metrics); 
the only change is to append (Node-Label=xxx).

bq. Its great to hear its working fine, but it worked without any modifications 
to the patch ?
Forgot to mention: I modified the patch a little bit and removed some of the 
avoid-displaying checks you mentioned at 
https://issues.apache.org/jira/browse/YARN-3362?focusedCommentId=14517364page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14517364.
Uploading the modified patch as well as the CS config for you to test.

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
 AM.png, YARN-3362.20150428-3.patch


 We don't show node label usage in the RM CapacityScheduler web UI now; without 
 this, it is hard for users to understand what is happening on nodes that have 
 labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520115#comment-14520115
 ] 

Li Lu commented on YARN-3411:
-

Thanks [~stack] for the quick info! Yes, let's go with HBase 1. We can figure 
out a solution for Phoenix later. In the worst case, we can rely on the 
snapshot version of Phoenix, which already works with HBase 1. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520139#comment-14520139
 ] 

Hadoop QA commented on YARN-2893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   9m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   5m  9s | The applied patch generated  1 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   2m  0s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 34s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  51m 53s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m  3s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729253/YARN-2893.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3dd6395 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7544/console |


This message was automatically generated.

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519848#comment-14519848
 ] 

Vinod Kumar Vavilapalli commented on YARN-3561:
---

Is this because the keep-containers flag is on? Why was the AM stopped rather 
than the app killed, if killing the app is what they want?
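For reference, a hedged sketch of where that flag lives (assumption: this mirrors how long-running apps such as Slider typically request it; not code from Slider itself). When it is set, the RM keeps the app's running containers across AM attempts instead of killing them along with the AM:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

// Hedged sketch: the submission context carries the keep-containers flag.
final class KeepContainersSketch {
  static ApplicationSubmissionContext newSubmissionContext() {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    ctx.setKeepContainersAcrossApplicationAttempts(true); // containers outlive the AM attempt
    return ctx;
  }
}
{code}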

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
Reporter: Gour Saha
Priority: Critical

 Non-AM containers continue to run even after the application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where the Slider AM was running
 *host-03* - where the Storm NIMBUS container was running.
 *Note:* The logs are partial, starting from the time when the relevant Slider 
 AM and NIMBUS containers were allocated, until the time when the Slider AM was 
 stopped. Also, the large number of memory-usage log lines was removed, 
 keeping only a few from the start and end of every segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties 
 transitioned from 

[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-04-29 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3521:
--
Attachment: 0001-YARN-3521.patch

Attaching an initial version. [~leftnoteasy], please check it: I have changed 
the method interfaces of *getClusterNodeLabels* and *addToClusterNodeLabels* to 
take a *List<NodeLabelInfo>* argument.
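A hypothetical sketch of the kind of structured DAO this implies for the REST layer (the class and field names here are illustrative, not necessarily the ones in the attached patch):
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Illustrative only: instead of a bare String, each label in the REST response
// carries its name plus attributes such as exclusivity.
@XmlRootElement(name = "nodeLabelInfo")
@XmlAccessorType(XmlAccessType.FIELD)
class NodeLabelInfoSketch {
  private String name;
  private boolean exclusivity;

  public NodeLabelInfoSketch() {
    // no-arg constructor required by JAXB
  }

  public NodeLabelInfoSketch(String name, boolean exclusivity) {
    this.name = name;
    this.exclusivity = exclusivity;
  }

  public String getName() { return name; }
  public boolean getExclusivity() { return exclusivity; }
}
{code}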


 Support return structured NodeLabel objects in REST API when call 
 getClusterNodeLabels
 --

 Key: YARN-3521
 URL: https://issues.apache.org/jira/browse/YARN-3521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-3521.patch


 In YARN-3413, the yarn cluster CLI returns NodeLabel objects instead of 
 Strings; we should make the same change on the REST API side to keep them 
 consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519906#comment-14519906
 ] 

Naganarasimha G R commented on YARN-3044:
-

Thanks for the review [~djp] and [~sjlee0],
bq. some of the issues reported by the jenkins build might be related to the 
patch?
Some might be, but many (findbugs and test-case issues) are not related to this 
JIRA, hence I am planning to raise a separate JIRA to handle them.
Some findbugs warnings (like the unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent) I am 
not planning to handle, as the code is the same as before and adding checks 
doesn't make sense here.

bq. l.147: how about using a simple for loop?
Well, AFAIK it only affects readability here; I used an entry-set iterator as 
it is generally preferred in terms of performance and concurrency (not relevant 
here). If you feel readability is an issue, I can change it to a simple loop :)

bq. l.179: curious, we could add them to the entity as metrics, right?
bq. So having them as events means that they should/will not be aggregated 
(e.g. from app = flow). Is that the intent with these values (CPU and cores)? 
I'm not exactly clear what these values indicate.
Initially I had the same thought as [~djp], but these values might need to be 
aggregated (e.g. from app to flow), as the current value is itself an 
aggregation over all containers. 

As mentioned earlier, I am planning to raise JIRAs for the following:
# Enhance TestSystemMetricsPublisherForV2 to ensure the test case verifies 
that the published entity is populated as desired (similar to ATSv1).
# Add an interface in TimelineClient to push application-specific 
configurations, as not all of them are captured by the RM.

Please provide your opinion.
One query: as earlier suggested by [~djp], where should the util class (package 
and class name) that converts the system entities to timeline entities and vice 
versa be added?
Also, shall I handle this as part of this patch or of the 
TestSystemMetricsPublisherForV2 enhancement patch?

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519917#comment-14519917
 ] 

Wangda Tan commented on YARN-3362:
--

Hi Naga,
Thanks for taking the initiative on this. I just tried running the patch 
locally and it looks great! Some comments:

1) Show partition=partition-name for every partition; if the partition is the 
NO_LABEL partition, show it as YARN.DEFAULT.PARTITION.
2) I think it's better to also show labels that are not accessible, especially 
for the non-exclusive node label case; we can optimize this in a future patch. 
This avoids people asking questions like "where is my label?". This applies to 
all the items your current patch avoids displaying. But it's still good to 
avoid showing labels when there are no labels in the cluster.
3) Showing partition of partition-specific queue metrics, they're:
- Used Capacity:0.0%
- Absolute Used Capacity:   0.0%
- Absolute Capacity:50.0%
- Absolute Max Capacity:100.0%
- Configured Capacity:  50.0%
- Configured Max Capacity:  100.0%
I suggest to add a (Partition=xxx) at the end of these metrics.

I attached a screenshot of the queue hierarchy in my local cluster: 
https://issues.apache.org/jira/secure/attachment/12729256/Screen%20Shot%202015-04-29%20at%2011.42.17%20AM.png.
Multi-level hierarchies seem to work well in my environment.

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
 AM.png, YARN-3362.20150428-3.patch


 We don't have node label usage in RM CapacityScheduler web UI now, without 
 this, user will be hard to understand what happened to nodes have labels 
 assign to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: YARN-2893.004.patch

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2619) NodeManager: Add cgroups support for disk I/O isolation

2015-04-29 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519814#comment-14519814
 ] 

Sidharta Seethana commented on YARN-2619:
-

[~vvasudev] , thanks for refactoring the test to be cleaner. The corresponding 
changes seem good to me. 

 NodeManager: Add cgroups support for disk I/O isolation
 ---

 Key: YARN-2619
 URL: https://issues.apache.org/jira/browse/YARN-2619
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2619-1.patch, YARN-2619.002.patch, 
 YARN-2619.003.patch, YARN-2619.004.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519840#comment-14519840
 ] 

Hadoop QA commented on YARN-3554:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   7m 35s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 46s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 30s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-common. |
| | |  46m 59s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729227/YARN-3554-20150429-2.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f82970 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7543/console |


This message was automatically generated.

 Default value for maximum nodemanager connect wait time is too high
 ---

 Key: YARN-3554
 URL: https://issues.apache.org/jira/browse/YARN-3554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R
 Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch


 The default value for yarn.client.nodemanager-connect.max-wait-ms is 90 
 msec or 15 minutes, which is way too high.  The default container expiry time 
 from the RM and the default task timeout in MapReduce are both only 10 
 minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519858#comment-14519858
 ] 

stack commented on YARN-3411:
-

bq. So there are some major changes between hbase 0.98 and hbase like the 
client facing APIs (HTableInterface, etc) have been deprecated and replaced 
with new interfaces.

It would be a pity if you fellas were stuck on the 0.98 APIs. Phoenix is 
shaping up to do an RC that will work w/ hbase 1.x.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519861#comment-14519861
 ] 

Karthik Kambatla commented on YARN-3271:


Thanks for working on this, [~nijel]. 

While at it, can we improve how we initialize the scheduler in 
{{TestAppRunnability#setUp}} as below?
{code}
Configuration conf = createConfiguration();
resourceManager = new MockRM(conf);
resourceManager.start();
scheduler = (FairScheduler) resourceManager.getResourceScheduler();
{code}

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-3561:

Environment: debian 7

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
 Environment: debian 7
Reporter: Gour Saha
Priority: Critical
 Fix For: 2.6.1


 Non-AM containers continue to run even after application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where Slider AM was running
 *host-03* - where Storm NIMBUS container was running.
 *Note:* The logs are partial, starting with the time when the relevant Slider 
 AM and NIMBUS containers were allocated, till the time when the Slider AM was 
 stopped. Also, the large number of Memory usage log lines were removed 
 keeping only a few starts and ends of every segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties 
 transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - 

[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: (was: YARN-2893.004.patch)

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519960#comment-14519960
 ] 

Naganarasimha G R commented on YARN-3362:
-

Thanks [~wangda] for reviewing and testing the patch.

bq. partition=partition-name
Well, I understand that in the later patches we are targeting it more as 
partitions than labels, but in that case shall I make the same change in other 
locations of the web UI, like the node labels page? And on the CS page, shall I 
mark it as Accessible Partitions?
bq. But it's still good to avoid showing labels when there is no label in the 
cluster.
Do you mean that if no node is mapped to a cluster node label, that node label 
should not be shown?
bq. Show the partition for the partition-specific queue metrics
Do you mean the existing metric names should be appended with (Partition=xxx), 
rather than showing both, right?
bq. Multi-level hierarchies seem to work well in my environment.
Great to hear it is working fine, but did it work without any modifications to 
the patch? If so, can you share your cluster setup (topology) and CS 
configuration offline, so that I can test it further?

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
 AM.png, YARN-3362.20150428-3.patch


 We don't have node label usage in RM CapacityScheduler web UI now, without 
 this, user will be hard to understand what happened to nodes have labels 
 assign to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519792#comment-14519792
 ] 

Sangjin Lee commented on YARN-3044:
---

Yes that's kind of what I'm wondering about. So having them as events means 
that they should/will not be aggregated (e.g. from app to flow). Is that the 
intent with these values (CPU and cores)? I'm not exactly clear what these 
values indicate.

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-3561:

Fix Version/s: 2.6.1

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
Reporter: Gour Saha
Priority: Critical
 Fix For: 2.6.1


 Non-AM containers continue to run even after application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where Slider AM was running
 *host-03* - where Storm NIMBUS container was running.
 *Note:* The logs are partial, starting with the time when the relevant Slider 
 AM and NIMBUS containers were allocated, till the time when the Slider AM was 
 stopped. Also, the large number of Memory usage log lines were removed 
 keeping only a few starts and ends of every segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties 
 transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 

[jira] [Updated] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-04-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3362:
-
Attachment: Screen Shot 2015-04-29 at 11.42.17 AM.png

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 
 AM.png, YARN-3362.20150428-3.patch


 We don't have node label usage in RM CapacityScheduler web UI now, without 
 this, user will be hard to understand what happened to nodes have labels 
 assign to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519964#comment-14519964
 ] 

Xuan Gong commented on YARN-3544:
-

Originally, we were calling getContainerReport to get the AM container 
information (such as the container log url, NM address, start time, etc.). That 
works fine while the application and the container are running. But when the 
application is finished, we do not keep finished container info, so we cannot 
get a report for the finished container from the RM. That is why the AM logs 
link in the web UI shows up as N/A, as does other related attempt information.

In this patch, instead of querying the container report, we get the attempt (AM 
container) information directly from the AttemptInfo, which is built from the 
RMAppAttempt. So whether the application is running or finished, we can get the 
related information and show it in the web UI (a rough sketch of the idea is 
below).
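
A rough sketch of the idea (this is not the actual patch; the helper below and 
the containerlogs URL shape are assumptions): the AM container details come 
from the attempt's master container, which is available whether or not the 
application has finished.
{code}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;

// Illustrative helper only, not the patch itself.
public class AmLogLinkSketch {
  static String amLogLink(RMAppAttempt attempt, String user) {
    Container am = attempt.getMasterContainer();
    if (am == null) {
      return "N/A";
    }
    // Assumed NM containerlogs URL shape: node http address + container id + user.
    return "http://" + am.getNodeHttpAddress() + "/node/containerlogs/"
        + am.getId() + "/" + user;
  }
}
{code}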

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544.1.patch


 AM log links should always be present ( for both running and completed apps).
 Likewise node info is also empty. This is usually quite crucial when trying 
 to debug where an AM was launched and a pointer to which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-29 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519911#comment-14519911
 ] 

Gour Saha commented on YARN-3561:
-

The Slider stop command was called, which initiates the stop of the Slider 
Storm application (and hence of the Slider AM).

Which property sets the keep-containers flag on?
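
If the flag in question is the work-preserving AM restart setting, then (as far 
as I know) it is not a cluster property but a per-application field on the 
submission context; a minimal sketch under that assumption:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class KeepContainersSketch {
  public static void main(String[] args) {
    // keepContainersAcrossApplicationAttempts is set per application on the
    // submission context (not in yarn-site.xml). When true, running containers
    // are kept when a new attempt of the same application starts; it does not
    // keep containers around after the application itself is stopped.
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    ctx.setKeepContainersAcrossApplicationAttempts(true);
  }
}
{code}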

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
 Environment: debian 7
Reporter: Gour Saha
Priority: Critical
 Fix For: 2.6.1


 Non-AM containers continue to run even after application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where Slider AM was running
 *host-03* - where Storm NIMBUS container was running.
 *Note:* The logs are partial, starting with the time when the relevant Slider 
 AM and NIMBUS containers were allocated, till the time when the Slider AM was 
 stopped. Also, the large number of Memory usage log lines were removed 
 keeping only a few starts and ends of every segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 

[jira] [Created] (YARN-3562) unit tests fail with the failure to bring up node manager

2015-04-29 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3562:
-

 Summary: unit tests fail with the failure to bring up node manager
 Key: YARN-3562
 URL: https://issues.apache.org/jira/browse/YARN-3562
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Minor


A bunch of MR unit tests are failing on our branch whenever the mini YARN 
cluster needs to bring up multiple node managers.

For example, see 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/

It is because the NMCollectorService is using a fixed port for the RPC (8048).
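
The usual fix for this kind of failure is to bind to an ephemeral port (port 0) 
so that each node manager instance gets its own free port. A generic Java 
sketch of the difference (the class below is illustrative, not the actual 
NMCollectorService code):
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralPortSketch {
  public static void main(String[] args) throws IOException {
    // Binding two services to the same fixed port (e.g. 8048) fails with
    // "Address already in use" when several NMs run on one host, as in the
    // mini YARN cluster used by these tests. Binding to port 0 instead lets
    // the OS pick a free port for each instance.
    try (ServerSocket nm1 = new ServerSocket();
         ServerSocket nm2 = new ServerSocket()) {
      nm1.bind(new InetSocketAddress(0));
      nm2.bind(new InetSocketAddress(0));
      System.out.println("NM1 collector port: " + nm1.getLocalPort());
      System.out.println("NM2 collector port: " + nm2.getLocalPort());
    }
  }
}
{code}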



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-04-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520226#comment-14520226
 ] 

Wangda Tan commented on YARN-3521:
--

Hi Sunil,
Thanks for working on this, some comments:

NodelabelsInfo (it should be NodeLabelInfo, right?):
- nodeLabelName: no need to call {{new String()}} since it will always be 
initialized, and I prefer to call it name
- nodeLabelExclusivity -> exclusivity
- Also rename the getters accordingly
- The setters are not used by anybody and could be removed
- I'm not sure if you need to add an empty constructor marked {{// JAXB needs 
this}} like the other infos?
- Could you add a constructor that receives a NodeLabel, to be used by 
RMWebServices
- We may need to add a separate NodeLabelsInfo that contains an ArrayList of 
NodeLabelInfo (see the sketch after this comment)

NodeToLabelsInfo -> NodeToLabelNames

addToClusterNodeLabels now receives a Set as the parameter; I'm not sure if it 
works, could you add a test to verify add/get node labels? 
TestRMWebServicesNodeLabels will fail now.
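
A minimal sketch of what such a pair of JAXB info classes could look like (the 
class shape follows the suggestions above; field and accessor names are 
assumptions, not the actual patch):
{code}
import java.util.ArrayList;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "nodeLabelInfo")
@XmlAccessorType(XmlAccessType.FIELD)
class NodeLabelInfo {
  private String name;
  private boolean exclusivity;

  public NodeLabelInfo() {
    // JAXB needs this
  }

  public NodeLabelInfo(String name, boolean exclusivity) {
    this.name = name;
    this.exclusivity = exclusivity;
  }

  public String getName() { return name; }
  public boolean getExclusivity() { return exclusivity; }
}

@XmlRootElement(name = "nodeLabelsInfo")
@XmlAccessorType(XmlAccessType.FIELD)
class NodeLabelsInfo {
  // The plural class simply wraps a list of the singular one.
  private ArrayList<NodeLabelInfo> nodeLabelsInfo =
      new ArrayList<NodeLabelInfo>();

  public NodeLabelsInfo() {
    // JAXB needs this
  }

  public ArrayList<NodeLabelInfo> getNodeLabelsInfo() { return nodeLabelsInfo; }
}
{code}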

 Support return structured NodeLabel objects in REST API when call 
 getClusterNodeLabels
 --

 Key: YARN-3521
 URL: https://issues.apache.org/jira/browse/YARN-3521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-3521.patch


 In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should 
 make the same change in REST API side to make them consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3563:
--
Attachment: Screen Shot 2015-04-29 at 2.11.19 PM.png

 Completed app shows -1 running containers on RM web UI
 --

 Key: YARN-3563
 URL: https://issues.apache.org/jira/browse/YARN-3563
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Reporter: Zhijie Shen
 Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png


 See the attached screenshot. I saw this issue with trunk. Not sure if it 
 exists in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3563:
--
Component/s: webapp
 resourcemanager

 Completed app shows -1 running containers on RM web UI
 --

 Key: YARN-3563
 URL: https://issues.apache.org/jira/browse/YARN-3563
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Reporter: Zhijie Shen
 Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png


 See the attached screenshot. I saw this issue with trunk. Not sure if it 
 exists in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3563:
-

 Summary: Completed app shows -1 running containers on RM web UI
 Key: YARN-3563
 URL: https://issues.apache.org/jira/browse/YARN-3563
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


See the attached screenshot. I saw this issue with trunk. Not sure if it exists 
in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3551) Consolidate data model change according to the backend implementation

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520295#comment-14520295
 ] 

Sangjin Lee commented on YARN-3551:
---

I'm fine with going with GenericOptionMapper for the 
serialization/deserialization of the appropriate types. The generics are mostly 
a suggestion for strengthening the types on the user side of things, so they 
may not be critical (a toy example of what I mean is below).
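
For illustration only (this is not the data model code; the class and method 
names are hypothetical), the kind of type strengthening meant here is a typed 
accessor over an otherwise untyped info map:
{code}
import java.util.HashMap;
import java.util.Map;

public class TypedInfoSketch {
  private final Map<String, Object> info = new HashMap<String, Object>();

  public void addInfo(String key, Object value) {
    info.put(key, value);
  }

  // Untyped access: every caller has to cast, and a wrong cast only fails at
  // runtime, far from the place that stored the value.
  public Object getInfo(String key) {
    return info.get(key);
  }

  // Generic access: the expected type is stated at the call site, so the cast
  // is checked in one place and user-side code stays strongly typed.
  public <T> T getInfo(String key, Class<T> type) {
    return type.cast(info.get(key));
  }

  public static void main(String[] args) {
    TypedInfoSketch entity = new TypedInfoSketch();
    entity.addInfo("attempts", 3);
    Integer attempts = entity.getInfo("attempts", Integer.class);
    System.out.println(attempts);
  }
}
{code}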

 Consolidate data model change according to the backend implementation
 -

 Key: YARN-3551
 URL: https://issues.apache.org/jira/browse/YARN-3551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3551.1.patch, YARN-3551.2.patch, YARN-3551.3.patch


 Based on the comments on 
 [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080]
  and 
 [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098],
  we need to change the data model to restrict the data type of 
 info/config/metric section.
 1. Info: the value could be all kinds object that is able to be 
 serialized/deserialized by jackson.
 2. Config: the value will always be assumed as String.
 3. Metric: single data or time series value have to be number for aggregation.
 Other than that, info/start time/finish time of metric seem not to be 
 necessary for storage. They should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1876) Document the REST APIs of timeline and generic history services

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1876.
---
Resolution: Duplicate

Duplicate is the right resolution.

 Document the REST APIs of timeline and generic history services
 ---

 Key: YARN-1876
 URL: https://issues.apache.org/jira/browse/YARN-1876
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: documentaion
 Attachments: YARN-1876.1.patch, YARN-1876.2.patch, YARN-1876.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520293#comment-14520293
 ] 

Jian He commented on YARN-3533:
---


Patch looks good to me, thanks [~adhoot]!
Hopefully this can resolve some of the intermittent failures we've seen 
recently.

 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-1876) Document the REST APIs of timeline and generic history services

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened YARN-1876:
---

 Document the REST APIs of timeline and generic history services
 ---

 Key: YARN-1876
 URL: https://issues.apache.org/jira/browse/YARN-1876
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: documentaion
 Attachments: YARN-1876.1.patch, YARN-1876.2.patch, YARN-1876.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520312#comment-14520312
 ] 

Vinod Kumar Vavilapalli commented on YARN-3539:
---

bq. So I'm not sure if it's good timing now, as we foresee that in the near 
future we're going to be upgraded to ATS v2, which may significantly refurbish 
the APIs.
How about we simply say that people can continue to run the v1 Timeline Service 
(a single server backed by LevelDB) beyond Timeline Service next-gen? That way, 
older installations and apps can continue to use the old APIs, and the new APIs 
do not need to take on the unknown burden of making the old APIs work on the 
newer framework.

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)
Jian He created YARN-3564:
-

 Summary: 
TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
randomly 
 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520196#comment-14520196
 ] 

Vinod Kumar Vavilapalli commented on YARN-2868:
---

Going through old tickets. I have two questions:
 # Why was this done in a scheduler-specific way? RMAppAttempt clearly knows 
when it requests and when it gets the allocation.
 # It seems like the patch only looks at the first AM container. What happens 
if we have a 2nd AM container?

I accidentally closed this ticket, so it doesn't look like I can reopen it. If 
folks agree, I will open a new ticket.

 FairScheduler: Metric for latency to allocate first container for an 
 application
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Fix For: 2.8.0

 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
 YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
 YARN-2868.012.patch


 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1317:
--
Target Version/s: 2.8.0  (was: )

I'd like to at least get some of this done in the 2.8 time-frame.

 Make Queue, QueueACLs and QueueMetrics first class citizens in YARN
 ---

 Key: YARN-1317
 URL: https://issues.apache.org/jira/browse/YARN-1317
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 Today, we are duplicating the exact same code in all the schedulers. Queue is 
 a top class concept - clientService, web-services etc already recognize queue 
 as a top level concept.
 We need to move Queue, QueueMetrics and QueueACLs to be top level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520214#comment-14520214
 ] 

Sangjin Lee commented on YARN-3051:
---

{quote}
My major concern about this proposal is compatibility. Previously in v1, a 
timeline entity is globally unique, such that when fetching a single entity, 
users only need to provide (entity type, entity id). Now (app id, entity type, 
entity id) is required to locate one entity, and theoretically (null, entity 
type, entity id) will refer to multiple entities. It probably makes it 
difficult to be compatible with existing use cases.
{quote}

To hash out that point, existing use cases which previously assumed that entity 
id was globally unique would continue to generate entity id's that are globally 
unique, right? Since existing use cases (w/o modification) would stick to 
globally unique entity id's in practice, redefining the uniqueness requirement 
to be in the scope of application should not impact existing use cases. Entity 
id's that are generated to be unique globally would trivially be unique within 
the application scope. The point here is that since this is in the direction of 
relaxing uniqueness, stricter use cases (existing use cases) should not be 
impacted. Let me know your thoughts.

IMO, stating that the entity id's are unique within the scope of applications 
is not an invitation for frameworks to generate tons of redundant entity id's. 
Frameworks (MR, tez, ...) would likely continue to generate entity id's that 
are practically unique globally anyway. But on the timeline service side, we 
don't have to have checks for enforcing global uniqueness.

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520319#comment-14520319
 ] 

Vinod Kumar Vavilapalli commented on YARN-3539:
---

In a way, I am saying that there will be v1 end-points and v2 end-points. V1 
end-points go to the old Timeline Service and V2 end-points go to the next-gen 
Timeline Service.

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3477) TimelineClientImpl swallows exceptions

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520328#comment-14520328
 ] 

Vinod Kumar Vavilapalli commented on YARN-3477:
---

This looks good to me. [~zjshen], can you look and do the honors?

 TimelineClientImpl swallows exceptions
 --

 Key: YARN-3477
 URL: https://issues.apache.org/jira/browse/YARN-3477
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0, 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-3477-001.patch, YARN-3477-002.patch


 If timeline client fails more than the retry count, the original exception is 
 not thrown. Instead some runtime exception is raised saying retries run out
 # the failing exception should be rethrown, ideally via 
 NetUtils.wrapException to include URL of the failing endpoing
 # Otherwise, the raised RTE should (a) state that URL and (b) set the 
 original fault as the inner cause
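
A minimal sketch of the second option (illustrative only, not the 
TimelineClientImpl code; the class and method names are assumptions): state the 
failing URL in the message and keep the original fault as the cause.
{code}
import java.net.URI;

// Illustrative only: rethrow with the target URL in the message and the
// original failure preserved as the cause, instead of swallowing it.
public class RetryFailureSketch {
  static RuntimeException retriesExhausted(URI resURI, int maxRetries,
      Exception original) {
    return new RuntimeException("Failed to connect to timeline server at "
        + resURI + " after " + maxRetries + " retries", original);
  }
}
{code}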



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520339#comment-14520339
 ] 

Jian He commented on YARN-3533:
---

bq. getApplicationAttempt seems confusing, I just opened 
https://issues.apache.org/jira/browse/YARN-3546 to discuss this
I replied on the jira.

The TestContainerAllocation failure is unrelated to this patch; I am opening a 
new jira to fix that.

Committing this.

 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520252#comment-14520252
 ] 

Thomas Graves commented on YARN-3517:
-

Changes look good, +1. Thanks [~vvasudev]!

 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Thomas Graves
Priority: Blocker
  Labels: security
 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520322#comment-14520322
 ] 

Sangjin Lee commented on YARN-3044:
---

{quote}
Some might be, but many (findbugs and test case failures) are not related to 
this jira, so I am planning to raise a separate jira to handle them.
Some findbugs warnings (like the unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent) I am 
not planning to handle, as the code is the same as the earlier code and adding 
if checks does not make sense here.
{quote}
Understood. We should try to resolve the ones that make sense but don't have to 
be pedantic. By the way, note that I filed a separate JIRA for the unit test 
issues that already exist on YARN-2928 (YARN-3562).

{quote}
Well, AFAIK it only affects readability here; I used an entry-set iterator 
since it is generally preferred in terms of performance and concurrency (not 
really relevant here). If you feel readability is an issue, I can modify it to 
a simple loop
{quote}
That's fine. It was a style nit (if that wasn't clear).

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-3517:
-

Assignee: Varun Vasudev  (was: Thomas Graves)

Seems like the JIRA assignee got mixed up, fixing..

 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
  Labels: security
 Fix For: 2.8.0

 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-04-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520332#comment-14520332
 ] 

Sangjin Lee commented on YARN-3045:
---

Hi [~Naganarasimha], I do have one quick question on the naming. I see a lot of 
names that include metrics, such as NMMetricsPublisher, NMMetricsEvent, 
NMMetricsEventType, and so on. And yet, they don't seem to involve metrics in 
the sense of timeline metrics. This is a source of confusion to me. Do we need 
metrics in these? They seem to be capturing purely lifecycle events. Could we 
change them to better names?

 [Event producers] Implement NM writing container lifecycle events to ATS
 

 Key: YARN-3045
 URL: https://issues.apache.org/jira/browse/YARN-3045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3045.20150420-1.patch


 Per design in YARN-2928, implement NM writing container lifecycle events and 
 container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3406) Display count of running containers in the RM's Web UI

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520273#comment-14520273
 ] 

Zhijie Shen commented on YARN-3406:
---

The web UI seems to have a bug: YARN-3563

 Display count of running containers in the RM's Web UI
 --

 Key: YARN-3406
 URL: https://issues.apache.org/jira/browse/YARN-3406
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3406.1.patch, YARN-3406.2.patch, screenshot.png, 
 screenshot2.png


 Display the running containers in the all application list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520292#comment-14520292
 ] 

Jian He commented on YARN-3546:
---

[~sandflee], inside the scheduler, every application only has one attempt, so 
the current attempt is the attempt corresponding to the appAttemptId. The name 
'getAppAttempt(attemptId)' therefore matches the internal implementation. 
If you agree, we can close this jira. 

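Purely to illustrate the naming/return-null question above, here is a small 
self-contained sketch; the class and identifiers are made-up stand-ins, not the 
actual AbstractYarnScheduler API:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model only, not the real scheduler classes: the scheduler keeps a
// single current attempt per application, so a lookup by attempt id either matches
// the current attempt or finds nothing.
public class CurrentAttemptLookup {
  static class App {
    final String currentAttemptId;
    App(String currentAttemptId) { this.currentAttemptId = currentAttemptId; }
  }

  private final Map<String, App> apps = new ConcurrentHashMap<>();

  public void addApp(String appId, String currentAttemptId) {
    apps.put(appId, new App(currentAttemptId));
  }

  // Variant discussed above: return null when the requested attempt is not the
  // current one, instead of silently handing back the current attempt.
  public String getCurrentSchedulerApplicationAttempt(String appId, String attemptId) {
    App app = apps.get(appId);
    if (app == null || !app.currentAttemptId.equals(attemptId)) {
      return null;
    }
    return app.currentAttemptId;
  }

  public static void main(String[] args) {
    CurrentAttemptLookup lookup = new CurrentAttemptLookup();
    lookup.addApp("application_0001", "appattempt_0001_000002");
    // asking for an older attempt returns null; asking for the current one succeeds
    System.out.println(lookup.getCurrentSchedulerApplicationAttempt(
        "application_0001", "appattempt_0001_000001"));
    System.out.println(lookup.getCurrentSchedulerApplicationAttempt(
        "application_0001", "appattempt_0001_000002"));
  }
}
{code}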

 AbstractYarnScheduler.getApplicationAttempt seems misleading,  and there're 
 some misuse of it
 -

 Key: YARN-3546
 URL: https://issues.apache.org/jira/browse/YARN-3546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: sandflee

 I'm not familiar with the scheduler. At first glance, I thought this function 
 returns the schedulerAppAttempt info corresponding to the given appAttemptId, 
 but actually it returns the current schedulerAppAttempt.
 It seems to have misled others too, such as
 TestWorkPreservingRMRestart.waitForNumContainersToRecover
 MockRM.waitForSchedulerAppAttemptAdded
 Should I rename it to T getCurrentSchedulerApplicationAttempt(ApplicationId 
 applicationid),
 or return null if the current attempt id does not equal the requested attempt id?
 Comments preferred!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520317#comment-14520317
 ] 

Jason Lowe commented on YARN-3563:
--

This sounds closely related to, if not a duplicate of, YARN-3552.

 Completed app shows -1 running containers on RM web UI
 --

 Key: YARN-3563
 URL: https://issues.apache.org/jira/browse/YARN-3563
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Reporter: Zhijie Shen
 Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png


 See the attached screenshot. I saw this issue with trunk. Not sure if it 
 exists in branch-2.7 too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520346#comment-14520346
 ] 

Jian He commented on YARN-3533:
---

Committed to trunk and branch-2. Thanks, Anubhav!
Thanks [~sandflee] and [~rohithsharma] for the review!

 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3564:
--
Description: the test fails intermittently in jenkins 
https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/  (was: 
https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/)

 TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
 randomly 
 ---

 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He

 the test fails intermittently in jenkins 
 https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3564:
--
Description: 
https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/

 TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
 randomly 
 ---

 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He

 https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520355#comment-14520355
 ] 

Thomas Graves commented on YARN-3517:
-

Thanks [~vinodkv], I missed that.

 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
  Labels: security
 Fix For: 2.8.0

 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520291#comment-14520291
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic] [~zjshen], just a quick thing to confirm: we want to use byte 
arrays for the config and info fields in both of our storage implementations, 
right? I'll convert the type of config and info in the Phoenix implementation to 
VARBINARY to be consistent with this design. 

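For reference, a minimal sketch of what writing a config value into a VARBINARY 
column through the Phoenix JDBC driver could look like. This assumes the Phoenix 
driver is on the classpath and a local quorum is reachable; the table and column 
names are made up for illustration and are not the ATS v2 schema:

{code}
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class VarbinaryConfigExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
      try (Statement stmt = conn.createStatement()) {
        // illustrative table: one VARBINARY column for serialized config/info
        stmt.execute("CREATE TABLE IF NOT EXISTS entity_config "
            + "(entity_id VARCHAR NOT NULL PRIMARY KEY, config VARBINARY)");
      }
      try (PreparedStatement ps = conn.prepareStatement(
          "UPSERT INTO entity_config (entity_id, config) VALUES (?, ?)")) {
        ps.setString(1, "application_0001");
        // config/info values go in as raw bytes, matching the byte-array design
        ps.setBytes(2, "mapreduce.map.memory.mb=1024".getBytes(StandardCharsets.UTF_8));
        ps.executeUpdate();
      }
      conn.commit();
    }
  }
}
{code}
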
 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520308#comment-14520308
 ] 

Hudson commented on YARN-3517:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7701 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7701/])
YARN-3517. RM web ui for dumping scheduler logs should be for admins only 
(Varun Vasudev via tgraves) (tgraves: rev 
2e215484bd05cd5e3b7a81d3558c6879a05dd2d2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* hadoop-yarn-project/CHANGES.txt


 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Thomas Graves
Priority: Blocker
  Labels: security
 Fix For: 2.8.0

 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520378#comment-14520378
 ] 

Hitesh Shah commented on YARN-3544:
---

Doesn't the NM log link redirect the log server after the logs have been 
aggregated? 

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, the node info is also empty. This information is usually quite crucial 
 when trying to debug where an AM was launched and which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520506#comment-14520506
 ] 

Hadoop QA commented on YARN-3564:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   5m  9s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   5m 26s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 30s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 13s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |  52m 15s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  73m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729320/YARN-3564.1.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 4c1af15 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7545/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7545/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7545/console |


This message was automatically generated.

 TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
 randomly 
 ---

 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3564.1.patch


 the test fails intermittently in jenkins 
 https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-29 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3134:

Attachment: YARN-3134-YARN-2928.001.patch

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Li Lu
 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
 YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
 YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
 YARN-3134-YARN-2928.001.patch, YARN-3134DataSchema.pdf


 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify how our implementation reads/writes data from/to HBase, and make 
 it easier to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Report node resource utilization

2015-04-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520574#comment-14520574
 ] 

Karthik Kambatla commented on YARN-3534:


I notice the patch tries to provide utilization information in bytes for memory 
and as a float for CPU. Since the RM schedules in MB and vcores, seeing the 
utilization as rounded-up values in a Resource object is probably enough. 

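A tiny illustrative sketch of the rounding suggested above (the values and names 
are made up, not the patch's code): convert raw utilization into the MB/vcore 
granularity the RM schedules with, rounding up so partially used resources are 
not under-reported.

{code}
public class UtilizationRounding {
  public static void main(String[] args) {
    long usedMemoryBytes = 3_500_000_000L;   // e.g. ~3.26 GB reported by the monitor
    float usedVcores = 2.3f;                 // fractional CPU usage

    // round up to the scheduler's granularity
    int memoryMB = (int) Math.ceil(usedMemoryBytes / (1024.0 * 1024.0));
    int vcores = (int) Math.ceil(usedVcores);

    System.out.println("Rounded utilization: " + memoryMB + " MB, " + vcores + " vcores");
  }
}
{code}
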
 Report node resource utilization
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520365#comment-14520365
 ] 

Zhijie Shen commented on YARN-3544:
---

Xuan, thanks for the patch. I've tried your patch locally, and it brought the 
content back to the web UI. However, I have one concern. It seems that the link 
to the local log on the NM is not useful after the app is finished, because the 
log is not supposed to be there any longer. So is this jira supposed to fix the 
regression, or ultimately provide a useful link to the AM container log? Those 
seem to be different goals.

/cc [~hitesh]

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, the node info is also empty. This information is usually quite crucial 
 when trying to debug where an AM was launched and which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-04-29 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3565:


 Summary: NodeHeartbeatRequest/RegisterNodeManagerRequest should 
use NodeLabel object instead of String
 Key: YARN-3565
 URL: https://issues.apache.org/jira/browse/YARN-3565
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker


Now NM HB/Register uses Set<String>; it will be hard to add new fields if we 
want to support specifying NodeLabel attributes such as exclusivity/constraints, 
etc. We need to make sure rolling upgrade works.

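A hypothetical illustration of the point above; this is not the actual YARN 
NodeLabel API, just a sketch of why a structured label can grow new attributes 
(e.g. exclusivity) without changing the heartbeat/register signature, whereas a 
Set<String> cannot:

{code}
import java.util.ArrayList;
import java.util.List;

public class NodeLabelExample {
  // made-up label type: new attributes can be added here later without
  // touching the request/response field that carries the labels
  static class NodeLabel {
    final String name;
    final boolean exclusive;
    NodeLabel(String name, boolean exclusive) {
      this.name = name;
      this.exclusive = exclusive;
    }
  }

  public static void main(String[] args) {
    List<NodeLabel> labels = new ArrayList<>();
    labels.add(new NodeLabel("gpu", true));
    labels.add(new NodeLabel("ssd", false));
    for (NodeLabel l : labels) {
      System.out.println(l.name + " exclusive=" + l.exclusive);
    }
  }
}
{code}
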


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3473) Fix RM Web UI configuration for some properties

2015-04-29 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-3473:
-
Labels: BB2015-05-TBR  (was: )

 Fix RM Web UI configuration for some properties
 ---

 Key: YARN-3473
 URL: https://issues.apache.org/jira/browse/YARN-3473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: BB2015-05-TBR
 Attachments: YARN-3473.001.patch


 Using the RM Web UI, the Tools -> Configuration page shows some properties as 
 something like BufferedInputStream instead of the appropriate .xml file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520473#comment-14520473
 ] 

Zhijie Shen commented on YARN-3539:
---

bq. That way, older installations and apps can continue to use the old APIs, 
and the new APIs do not need to take the unknown burden of making the old APIs 
work on the newer framework.

This sounds like a more reasonable commitment for ATS v2.

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) FairScheduler: Metric for latency to allocate first container for an application

2015-04-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520480#comment-14520480
 ] 

Ray Chiang commented on YARN-2868:
--

I'll answer these in reverse order:

2) The first AM container is the easy one to measure.  Subsequent 
measurements can be tricky since the request time will need to be recorded 
somewhere until the request is actually fulfilled.  Tracking all the requests 
and corresponding fulfillments would be a lot more work and may warrant more 
sophisticated measurements.  I haven't filed a JIRA for measuring the later 
containers.

1) Breaking this answer into several parts.  I'm not going to remember all the 
iterations I went through but I'll answer as best as I can.

1A) YARN-3105 covers the enhancements to StateMachine to record state 
transitions generically for metrics.  [~jianhe] made the original suggestion.

1B) There were several factors for this.  I think it was a combination of 
wanting queue-specific metrics, wanting to separate first allocation from later 
allocations, working with managed and unmanaged AMs, and a desire to get a more 
exact measurement with less overhead.  I've deleted all my earliest attempts at 
this (i.e. those prior to the first patch on this JIRA), so I can't provide 
more specific information offhand.

Let me know if that satisfactorily answers your questions.

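For context, a rough self-contained sketch of the first-allocation measurement 
described above (the names are illustrative, not the FairScheduler metrics code): 
remember when an attempt starts asking for containers and report the elapsed time 
when its first container is actually allocated.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FirstAllocationLatency {
  private final Map<String, Long> allocationStart = new ConcurrentHashMap<>();

  // called when the attempt is added / starts requesting containers
  void onAttemptAdded(String attemptId) {
    allocationStart.putIfAbsent(attemptId, System.nanoTime());
  }

  // called when the first container for the attempt is allocated
  void onFirstContainerAllocated(String attemptId) {
    Long start = allocationStart.remove(attemptId);
    if (start != null) {
      long latencyMs = (System.nanoTime() - start) / 1_000_000L;
      // in the real patch this would feed a scheduler metric rather than stdout
      System.out.println(attemptId + " first container allocated after " + latencyMs + " ms");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    FirstAllocationLatency m = new FirstAllocationLatency();
    m.onAttemptAdded("appattempt_0001_000001");
    Thread.sleep(50);
    m.onFirstContainerAllocated("appattempt_0001_000001");
  }
}
{code}
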
 FairScheduler: Metric for latency to allocate first container for an 
 application
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Fix For: 2.8.0

 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
 YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
 YARN-2868.012.patch


 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml

2015-04-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520497#comment-14520497
 ] 

Ray Chiang commented on YARN-3069:
--

Thanks Akira!  I've made those changes.  I definitely left some empty 
descriptions in yarn-default.xml where I couldn't figure out what the property 
was for.

I'll wait for more of your review before uploading a new patch.

 Document missing properties in yarn-default.xml
 ---

 Key: YARN-3069
 URL: https://issues.apache.org/jira/browse/YARN-3069
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: supportability
 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, 
 YARN-3069.003.patch, YARN-3069.004.patch


 The following properties are currently not defined in yarn-default.xml.  
 These properties should either be
   A) documented in yarn-default.xml OR
   B)  listed as an exception (with comments, e.g. for internal use) in the 
 TestYarnConfigurationFields unit test
 Any comments for any of the properties below are welcome.
   org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
   org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
   security.applicationhistory.protocol.acl
   yarn.app.container.log.backups
   yarn.app.container.log.dir
   yarn.app.container.log.filesize
   yarn.client.app-submission.poll-interval
   yarn.client.application-client-protocol.poll-timeout-ms
   yarn.is.minicluster
   yarn.log.server.url
   yarn.minicluster.control-resource-monitoring
   yarn.minicluster.fixed.ports
   yarn.minicluster.use-rpc
   yarn.node-labels.fs-store.retry-policy-spec
   yarn.node-labels.fs-store.root-dir
   yarn.node-labels.manager-class
   yarn.nodemanager.container-executor.os.sched.priority.adjustment
   yarn.nodemanager.container-monitor.process-tree.class
   yarn.nodemanager.disk-health-checker.enable
   yarn.nodemanager.docker-container-executor.image-name
   yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
   yarn.nodemanager.linux-container-executor.group
   yarn.nodemanager.log.deletion-threads-count
   yarn.nodemanager.user-home-dir
   yarn.nodemanager.webapp.https.address
   yarn.nodemanager.webapp.spnego-keytab-file
   yarn.nodemanager.webapp.spnego-principal
   yarn.nodemanager.windows-secure-container-executor.group
   yarn.resourcemanager.configuration.file-system-based-store
   yarn.resourcemanager.delegation-token-renewer.thread-count
   yarn.resourcemanager.delegation.key.update-interval
   yarn.resourcemanager.delegation.token.max-lifetime
   yarn.resourcemanager.delegation.token.renew-interval
   yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
   yarn.resourcemanager.metrics.runtime.buckets
   yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
   yarn.resourcemanager.reservation-system.class
   yarn.resourcemanager.reservation-system.enable
   yarn.resourcemanager.reservation-system.plan.follower
   yarn.resourcemanager.reservation-system.planfollower.time-step
   yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
   yarn.resourcemanager.webapp.spnego-keytab-file
   yarn.resourcemanager.webapp.spnego-principal
   yarn.scheduler.include-port-in-node-name
   yarn.timeline-service.delegation.key.update-interval
   yarn.timeline-service.delegation.token.max-lifetime
   yarn.timeline-service.delegation.token.renew-interval
   yarn.timeline-service.generic-application-history.enabled
   
 yarn.timeline-service.generic-application-history.fs-history-store.compression-type
   yarn.timeline-service.generic-application-history.fs-history-store.uri
   yarn.timeline-service.generic-application-history.store-class
   yarn.timeline-service.http-cross-origin.enabled
   yarn.tracking.url.generator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520520#comment-14520520
 ] 

Hadoop QA commented on YARN-3134:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729341/YARN-3134-YARN-2928.runJenkins.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4c1af15 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7546/console |


This message was automatically generated.

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Li Lu
 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
 YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
 YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
 YARN-3134-YARN-2928.runJenkins.001.patch, YARN-3134DataSchema.pdf


 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify how our implementation reads/writes data from/to HBase, and make 
 it easier to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-29 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3134:

Attachment: (was: YARN-3134-YARN-2928.runJenkins.001.patch)

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Li Lu
 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
 YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
 YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
 YARN-3134DataSchema.pdf


 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify how our implementation reads/writes data from/to HBase, and make 
 it easier to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3482) Report NM resource capacity in heartbeat

2015-04-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3482:
---
Summary: Report NM resource capacity in heartbeat  (was: Report NM 
available resources in heartbeat)

 Report NM resource capacity in heartbeat
 

 Key: YARN-3482
 URL: https://issues.apache.org/jira/browse/YARN-3482
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
   Original Estimate: 504h
  Remaining Estimate: 504h

 NMs are usually collocated with other processes like HDFS, Impala or HBase. 
 To manage this scenario correctly, YARN should be aware of the actual 
 available resources. The proposal is to have an interface to dynamically 
 change the available resources and report this to the RM in every heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520544#comment-14520544
 ] 

Wangda Tan commented on YARN-3564:
--

+1, will commit later.

 TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
 randomly 
 ---

 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3564.1.patch


 the test fails intermittently in jenkins 
 https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520380#comment-14520380
 ] 

Hudson commented on YARN-3533:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7702 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7702/])
YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. 
Contributed by Anubhav Dhoot (jianhe: rev 
4c1af156aef4f3bb1d9823d5980c59b12007dc77)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520379#comment-14520379
 ] 

Hitesh Shah commented on YARN-3544:
---

I meant redirect to the log server 

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, the node info is also empty. This information is usually quite crucial 
 when trying to debug where an AM was launched and which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520396#comment-14520396
 ] 

Vinod Kumar Vavilapalli commented on YARN-3445:
---

There is too much duplicate information already in NodeHeartbeatRequest, 
albeit for slightly different purposes. We need to consolidate the following 
(without breaking compatibility with previous releases), lest the heartbeat 
become heavier and heavier.
 - logAggregationReportsForApps added, but not released yet
-- logAggregationReportsForApps itself is a map of ApplicationID with a 
nested LogAggregationReport.ApplicationID - duplicate AppID information
 - runningApplications in this patch
 - NodeStatus.keepAliveApplications

/cc [~jianhe] [~leftnoteasy]

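A hypothetical sketch of what a consolidated shape could look like (not a real 
YARN proto, just to illustrate keying the per-application heartbeat payload by 
ApplicationId once instead of carrying several parallel collections):

{code}
import java.util.HashMap;
import java.util.Map;

public class ConsolidatedHeartbeat {
  // all per-app flags live in one record, so new fields do not add new maps
  static class PerAppStatus {
    boolean running;
    boolean keepAlive;
    String logAggregationStatus;   // e.g. "RUNNING", "SUCCEEDED"
  }

  // single map keyed by application id
  final Map<String, PerAppStatus> apps = new HashMap<>();

  public static void main(String[] args) {
    ConsolidatedHeartbeat hb = new ConsolidatedHeartbeat();
    PerAppStatus s = new PerAppStatus();
    s.running = true;
    s.logAggregationStatus = "RUNNING";
    hb.apps.put("application_0001", s);
    System.out.println("apps in heartbeat: " + hb.apps.keySet());
  }
}
{code}
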
 Cache runningApps in RMNode for getting running apps on given NodeId
 

 Key: YARN-3445
 URL: https://issues.apache.org/jira/browse/YARN-3445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3445-v2.patch, YARN-3445.patch


 Per discussion in YARN-3334, we need to filter out unnecessary collector info 
 from the RM in the heartbeat response. Our proposal is to add a cache of 
 runningApps in RMNode, so the RM only sends back collectors for locally running 
 apps. This is also needed in YARN-914 (graceful decommission): if an NM in the 
 decommissioning stage has no running apps, it will get decommissioned 
 immediately. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-29 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3134:

Attachment: YARN-3134-YARN-2928.runJenkins.001.patch

In the latest patch I addressed all previous comments, and changed the storage 
type for config and info to byte arrays. I've also revised the storage of 
metrics, which no longer uses startTime and endTime. Right now I'm focusing on 
storing singleData since we need to discuss more about storing and aggregating 
time-series data.

Renaming the patch to the new format so that we can try jenkins on the 
YARN-2928 branch. Disabling the Phoenix test for now since it's blocked by 
YARN-3529. 

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Li Lu
 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
 YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
 YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
 YARN-3134-YARN-2928.runJenkins.001.patch, YARN-3134DataSchema.pdf


 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify how our implementation reads/writes data from/to HBase, and make 
 it easier to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Report node resource utilization

2015-04-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520563#comment-14520563
 ] 

Karthik Kambatla commented on YARN-3534:


Skimmed through the latest patch. High-level comments/questions:
# Do we need a separate class/proto for ResourceUtilization? Could we just 
reuse Resource? That should make the patch significantly smaller.
# Would be nice to have NodeResourceMonitor emit metrics for usage. We could do 
this in a follow-up JIRA.



 Report node resource utilization
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Report node resource utilization

2015-04-29 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520584#comment-14520584
 ] 

Inigo Goiri commented on YARN-3534:
---

The original reason for having ResourceUtilization was getting better 
granularity for the CPUs. We had some discussion about it in YARN-3481; take a 
look there and chime in.

My original implementation used Resource, but when trying to schedule 
containers based on it, there were a lot of holes in the scheduling. Given 
this, I thought this patch was a good place to create this new utilization 
entity with CPU as a float.

Regarding the metrics in the NodeResourceMonitor, I completely agree. I thought 
about doing it right away but, as you mentioned, it seemed a better idea to 
save it for another JIRA. Let's do that.

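An illustrative sketch only (not the committed ResourceUtilization class) of a 
utilization entity that keeps memory in MB and CPU as a float, giving finer CPU 
granularity than the integer vcores in Resource:

{code}
public class ResourceUtilizationSketch {
  private final int physicalMemoryMB;
  private final float cpu;   // fraction of total CPU in use, e.g. 0.37f

  public ResourceUtilizationSketch(int physicalMemoryMB, float cpu) {
    this.physicalMemoryMB = physicalMemoryMB;
    this.cpu = cpu;
  }

  public int getPhysicalMemoryMB() { return physicalMemoryMB; }
  public float getCPU() { return cpu; }

  public static void main(String[] args) {
    ResourceUtilizationSketch u = new ResourceUtilizationSketch(6144, 0.37f);
    System.out.println(u.getPhysicalMemoryMB() + " MB, CPU " + u.getCPU());
  }
}
{code}
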
 Report node resource utilization
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520586#comment-14520586
 ] 

Zhijie Shen commented on YARN-3544:
---

[~xgong], the patch doesn't apply to branch-2.7. It seems to be a non-trivial 
merge conflict. Would you please take a look?

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, the node info is also empty. This information is usually quite crucial 
 when trying to debug where an AM was launched and which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520599#comment-14520599
 ] 

Hadoop QA commented on YARN-3134:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 47s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:red}-1{color} | javac |   7m 58s | The applied patch generated  8  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   4m  5s | The applied patch generated  2 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   0m 41s | The patch appears to introduce 
10 new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  40m 18s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-timelineservice |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineWriterImpl.write(String,
 String, String, String, long, String, TimelineEntity, 
TimelineWriteResponse):in 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineWriterImpl.write(String,
 String, String, String, long, String, TimelineEntity, TimelineWriteResponse): 
new java.io.FileWriter(String, boolean)  At 
FileSystemTimelineWriterImpl.java:[line 86] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.tryInitTable()
 may fail to clean up java.sql.Statement on checked exception  Obligation to 
clean up resource created at PhoenixTimelineWriterImpl.java:up 
java.sql.Statement on checked exception  Obligation to clean up resource 
created at PhoenixTimelineWriterImpl.java:[line 227] is not discharged |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.executeQuery(String)
 may fail to close Statement  At PhoenixTimelineWriterImpl.java:Statement  At 
PhoenixTimelineWriterImpl.java:[line 492] |
|  |  A prepared statement is generated from a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.storeEntityVariableLengthFields(TimelineEntity,
 TimelineCollectorContext, Connection)   At PhoenixTimelineWriterImpl.java:from 
a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.storeEntityVariableLengthFields(TimelineEntity,
 TimelineCollectorContext, Connection)   At 
PhoenixTimelineWriterImpl.java:[line 389] |
|  |  A prepared statement is generated from a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.storeEvents(TimelineEntity,
 TimelineCollectorContext, Connection)   At PhoenixTimelineWriterImpl.java:from 
a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.storeEvents(TimelineEntity,
 TimelineCollectorContext, Connection)   At 
PhoenixTimelineWriterImpl.java:[line 476] |
|  |  A prepared statement is generated from a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.storeMetrics(TimelineEntity,
 TimelineCollectorContext, Connection)   At PhoenixTimelineWriterImpl.java:from 
a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.storeMetrics(TimelineEntity,
 TimelineCollectorContext, Connection)   At 
PhoenixTimelineWriterImpl.java:[line 433] |
|  |  A prepared statement is generated from a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.write(String,
 String, String, String, long, String, TimelineEntities)   At 
PhoenixTimelineWriterImpl.java:from a nonconstant String in 
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.write(String,
 String, String, String, long, String, TimelineEntities)   At 
PhoenixTimelineWriterImpl.java:[line 167] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.setBytesForColumnFamily(PreparedStatement,
 Map, int) makes inefficient use of keySet iterator instead of entrySet 

[jira] [Updated] (YARN-3564) TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3564:
--
Attachment: YARN-3564.1.patch

patch to fix the failure

 TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
 randomly 
 ---

 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3564.1.patch


 the test fails intermittently in jenkins 
 https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3392) Change NodeManager metrics to not populate resource usage metrics if they are unavailable

2015-04-29 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot resolved YARN-3392.
-
Resolution: Duplicate

 Change NodeManager metrics to not populate resource usage metrics if they are 
 unavailable 
 --

 Key: YARN-3392
 URL: https://issues.apache.org/jira/browse/YARN-3392
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3392.prelim.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520425#comment-14520425
 ] 

Zhijie Shen commented on YARN-3544:
---

bq. Doesn't the NM log link redirect the log server after the logs have been 
aggregated?

Thanks, Hitesh! I didn't notice this option before. Tried it locally, and the 
whole flow for a completed app's logs is working fine now.

Will commit the patch late today unless there's further comment.

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, the node info is also empty. This information is usually quite crucial 
 when trying to debug where an AM was launched and which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

