[jira] [Created] (YARN-3175) Consolidate the ResourceManager documentation into one

2015-02-11 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-3175:
--

 Summary: Consolidate the ResourceManager documentation into one
 Key: YARN-3175
 URL: https://issues.apache.org/jira/browse/YARN-3175
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Allen Wittenauer


We really don't need a different document for every individual RM feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316609#comment-14316609
 ] 

Sangjin Lee commented on YARN-2423:
---

I'm not sure how realistic it is to avoid impacting callers of the timeline 
client between the current ATS and the next gen, if that's what you're asking. 
As such, any work we do on this JIRA would not have a direct implication on 
what happens in the next gen.

How important/critical is it to support this in the 2.7 timeframe? I think the 
current consensus is not to work on major feature additions on ATS. But I'm not 
sure how major this would be.

Separately, it would be desirable to support Java APIs for reads in the next 
gen too.

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
 YARN-2423.patch


 TimelineClient provides the Java methods to put timeline entities. It would 
 also be good to wrap all the GET APIs (both entity and domain) and deserialize 
 the JSON responses into Java POJOs.
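 For illustration only, here is a minimal sketch of what such a GET wrapper 
 could look like. It assumes Jackson is available on the classpath; the REST 
 path, the TimelineEntityStub POJO, and the helper class are hypothetical 
 stand-ins, not the actual TimelineClient API.
 {code}
 import java.io.InputStream;
 import java.net.HttpURLConnection;
 import java.net.URL;

 import com.fasterxml.jackson.databind.DeserializationFeature;
 import com.fasterxml.jackson.databind.ObjectMapper;

 public class TimelineGetSketch {

   /** Hypothetical stand-in for the real timeline entity POJO. */
   public static class TimelineEntityStub {
     public String entitytype;
     public String entity;
     public long starttime;
   }

   private static final ObjectMapper MAPPER = new ObjectMapper()
       .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

   /** Fetches one entity over HTTP and maps the JSON response to the stub. */
   public static TimelineEntityStub getEntity(String baseUrl, String entityType,
       String entityId) throws Exception {
     URL url = new URL(baseUrl + "/ws/v1/timeline/" + entityType + "/" + entityId);
     HttpURLConnection conn = (HttpURLConnection) url.openConnection();
     conn.setRequestMethod("GET");
     try (InputStream in = conn.getInputStream()) {
       // Deserialize the JSON body straight into the POJO.
       return MAPPER.readValue(in, TimelineEntityStub.class);
     } finally {
       conn.disconnect();
     }
   }
 }
 {code}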



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label

2015-02-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3124:
-
Attachment: YARN-3124.4.patch

Fixed findbugs warning, attached ver.4 patch

 Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track 
 capacities-by-label
 

 Key: YARN-3124
 URL: https://issues.apache.org/jira/browse/YARN-3124
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, 
 YARN-3124.4.patch


 After YARN-3098, capacities-by-label (including 
 used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be 
 tracked in QueueCapacities.
 This patch targets making all capacities-by-label in CS queues tracked by 
 QueueCapacities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3041) create the ATS entity/event API

2015-02-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316597#comment-14316597
 ] 

Sangjin Lee commented on YARN-3041:
---

Thanks for the clarification Robert.

 create the ATS entity/event API
 ---

 Key: YARN-3041
 URL: https://issues.apache.org/jira/browse/YARN-3041
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter
 Attachments: YARN-3041.preliminary.001.patch


 Per design in YARN-2928, create the ATS entity and events API.
 Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, 
 flow, flow run, YARN app, ...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316353#comment-14316353
 ] 

Hudson commented on YARN-3160:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/])
YARN-3160. Fix non-atomic operation on nodeUpdateQueue in RMNodeImpl. 
(Contributed by Chengbing Liu) (junping_du: rev 
c541a374d88ffed6ee71b0e5d556939ccd2c5159)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt


 Non-atomic operation on nodeUpdateQueue in RMNodeImpl
 -

 Key: YARN-3160
 URL: https://issues.apache.org/jira/browse/YARN-3160
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.7.0

 Attachments: YARN-3160.2.patch, YARN-3160.patch


 {code:title=RMNodeImpl.java|borderStyle=solid}
 while(nodeUpdateQueue.peek() != null){
   latestContainerInfoList.add(nodeUpdateQueue.poll());
 }
 {code}
 The above code risks adding a null value to {{latestContainerInfoList}}: 
 another thread can drain the queue between the {{peek()}} and the {{poll()}}. 
 Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can 
 simply poll the queue and then check whether the returned value is null.
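 For reference, a minimal sketch of the poll-then-check pattern described 
 above, using a stand-in element type rather than the actual queue element 
 class in RMNodeImpl:
 {code}
 import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.ConcurrentLinkedQueue;

 public class DrainQueueSketch {
   // Stand-in for the node update queue in RMNodeImpl.
   private final ConcurrentLinkedQueue<String> nodeUpdateQueue =
       new ConcurrentLinkedQueue<String>();

   public List<String> drain() {
     List<String> latestContainerInfoList = new ArrayList<String>();
     // Poll first and check the returned value: peek-then-poll can observe a
     // non-empty queue and still get null if another thread drains it in between.
     String update;
     while ((update = nodeUpdateQueue.poll()) != null) {
       latestContainerInfoList.add(update);
     }
     return latestContainerInfoList;
   }
 }
 {code}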



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316352#comment-14316352
 ] 

Hudson commented on YARN-2246:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/])
YARN-2246. Made the proxy tracking URL always be http(s)://proxy 
addr:port/proxy/appId to avoid duplicate sections. Contributed by Devaraj K. 
(zjshen: rev d5855c0e46404cfc1b5a63e59015e68ba668f0ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt


 Job History Link in RM UI is redirecting to the URL which contains Job Id 
 twice
 ---

 Key: YARN-2246
 URL: https://issues.apache.org/jira/browse/YARN-2246
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Devaraj K
Assignee: Devaraj K
 Fix For: 2.7.0

 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, 
 YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch


 {code:xml}
 http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316360#comment-14316360
 ] 

Hudson commented on YARN-2809:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/])
YARN-2809. Implement workaround for linux kernel panic when removing cgroup. 
Contributed by Nathan Roberts (jlowe: rev 
3f5431a22fcef7e3eb9aceeefe324e5b7ac84049)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java
* hadoop-yarn-project/CHANGES.txt


 Implement workaround for linux kernel panic when removing cgroup
 

 Key: YARN-2809
 URL: https://issues.apache.org/jira/browse/YARN-2809
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment:  RHEL 6.4
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Fix For: 2.7.0

 Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch


 Some older versions of linux have a bug that can cause a kernel panic when 
 the LCE attempts to remove a cgroup. It is a race condition so it's a bit 
 rare but on a few thousand node cluster it can result in a couple of panics 
 per day.
 This is the commit that likely (haven't verified) fixes the problem in linux: 
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
 Details will be added in comments.
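 The attached patches contain the actual workaround; the sketch below is only 
 a generic illustration of removing a cgroup directory with bounded retries, 
 with made-up values and a made-up helper name (it is not the 
 CgroupsLCEResourcesHandler code):
 {code}
 import java.io.File;

 public class CgroupDeleteSketch {

   /**
    * Tries to delete a cgroup directory, retrying for up to timeoutMs with a
    * short pause between attempts. Both values are illustrative only.
    */
   public static boolean deleteCgroupWithRetry(String cgroupPath, long timeoutMs,
       long retryDelayMs) throws InterruptedException {
     File cgroup = new File(cgroupPath);
     long deadline = System.currentTimeMillis() + timeoutMs;
     while (System.currentTimeMillis() < deadline) {
       // rmdir on a cgroup only succeeds once all tasks have left it.
       if (cgroup.delete()) {
         return true;
       }
       Thread.sleep(retryDelayMs);
     }
     return false;
   }

   public static void main(String[] args) throws InterruptedException {
     deleteCgroupWithRetry("/sys/fs/cgroup/cpu/hadoop-yarn/container_01", 1000L, 20L);
   }
 }
 {code}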



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316357#comment-14316357
 ] 

Hudson commented on YARN-3090:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/])
YARN-3090. DeletionService can silently ignore deletion task failures. 
Contributed by Varun Saxena (jlowe: rev 
4eb5f7fa32bab1b9ce3fb58eca51e2cd2e194cd5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
* hadoop-yarn-project/CHANGES.txt


 DeletionService can silently ignore deletion task failures
 --

 Key: YARN-3090
 URL: https://issues.apache.org/jira/browse/YARN-3090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3090.001.patch, YARN-3090.002.patch, 
 YARN-3090.003.patch, YARN-3090.04.patch


 If a non-I/O exception occurs while the DeletionService is executing a 
 deletion task then it will be silently ignored.  The exception bubbles up to 
 the thread workers of the ScheduledThreadPoolExecutor which simply attaches 
 the throwable to the Future that was returned when the task was scheduled.  
 However the thread pool is used as a fire-and-forget pool, so nothing ever 
 looks at the Future and therefore the exception is never logged.
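 As a hedged sketch of one way to surface such failures (the class and method 
 names below are illustrative, not the actual DeletionService code), each 
 deletion task can be wrapped so exceptions are logged instead of being left 
 on a Future nobody reads:
 {code}
 import java.util.concurrent.ScheduledThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;
 import java.util.logging.Level;
 import java.util.logging.Logger;

 public class LoggingDeletionPoolSketch {
   private static final Logger LOG =
       Logger.getLogger(LoggingDeletionPoolSketch.class.getName());

   private final ScheduledThreadPoolExecutor sched =
       new ScheduledThreadPoolExecutor(4);

   /** Wraps a deletion task so any exception is logged instead of being
    *  silently attached to a Future that is never inspected. */
   public void scheduleDeletion(final Runnable deletionTask, long delayMs) {
     sched.schedule(new Runnable() {
       @Override
       public void run() {
         try {
           deletionTask.run();
         } catch (Throwable t) {
           LOG.log(Level.SEVERE, "Deletion task failed", t);
         }
       }
     }, delayMs, TimeUnit.MILLISECONDS);
   }
 }
 {code}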



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3173) start-yarn.sh script isn't aware of how many RMs need to be started.

2015-02-11 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-3173:
---
Component/s: scripts

 start-yarn.sh script isn't aware of how many RMs need to be started.
 -

 Key: YARN-3173
 URL: https://issues.apache.org/jira/browse/YARN-3173
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: BOB
Priority: Minor

 When more than one RM is configured, for example in an HA cluster, using the 
 start-yarn.sh script to start the YARN cluster only starts one resourcemanager, 
 on the node where start-yarn.sh is executed. start-yarn.sh should detect how 
 many RMs are configured and start them all. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2015-02-11 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316587#comment-14316587
 ] 

Chris Trezzo commented on YARN-2654:


I am willing to change the parameters to .http.address, but it seems as though 
a large number of parameters in the YARN code have chosen to use 
webapp.address. I will stay consistent with this, unless there is strong 
opposition. Any other comments on config parameter naming? Otherwise I will 
close this as resolved. [~vinodkv] [~sjlee0] [~kasha]

 Revisit all shared cache config parameters to ensure quality names
 --

 Key: YARN-2654
 URL: https://issues.apache.org/jira/browse/YARN-2654
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Blocker
 Attachments: shared_cache_config_parameters.txt


 Revisit all the shared cache config parameters in YarnConfiguration and 
 yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316505#comment-14316505
 ] 

Hudson commented on YARN-3074:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7073 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7073/])
YARN-3074. Nodemanager dies when localizer runner tries to write to a full 
disk. Contributed by Varun Saxena (jlowe: rev 
b379972ab39551d4b57436a54c0098a63742c7e1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


 Nodemanager dies when localizer runner tries to write to a full disk
 

 Key: YARN-3074
 URL: https://issues.apache.org/jira/browse/YARN-3074
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3074.001.patch, YARN-3074.002.patch, 
 YARN-3074.03.patch


 When a LocalizerRunner tries to write to a full disk it can bring down the 
 nodemanager process.  Instead of failing the whole process we should fail 
 only the container and make a best attempt to keep going.
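 As a rough illustration of the fail-only-the-container idea (the exception 
 type and method below are hypothetical, not the actual LocalizerRunner code), 
 a write failure can be turned into a per-container error instead of escaping 
 the process:
 {code}
 import java.io.File;
 import java.io.FileWriter;
 import java.io.IOException;

 public class LocalizerWriteSketch {

   /** Hypothetical container-scoped failure, instead of letting an error
    *  escape and take down the whole NodeManager process. */
   public static class ContainerLocalizationException extends Exception {
     public ContainerLocalizationException(String msg, Throwable cause) {
       super(msg, cause);
     }
   }

   /** Writes localizer metadata for one container; a full disk fails only
    *  this container rather than the NM. */
   public static void writeTokensFile(File tokensFile, String contents,
       String containerId) throws ContainerLocalizationException {
     try (FileWriter out = new FileWriter(tokensFile)) {
       out.write(contents);
     } catch (IOException e) {
       // Surface the error as a per-container failure the caller can handle.
       throw new ContainerLocalizationException(
           "Failed to write " + tokensFile + " for container " + containerId, e);
     }
   }
 }
 {code}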



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk

2015-02-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316449#comment-14316449
 ] 

Jason Lowe commented on YARN-3074:
--

+1 lgtm as well.  Committing this.

 Nodemanager dies when localizer runner tries to write to a full disk
 

 Key: YARN-3074
 URL: https://issues.apache.org/jira/browse/YARN-3074
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Attachments: YARN-3074.001.patch, YARN-3074.002.patch, 
 YARN-3074.03.patch


 When a LocalizerRunner tries to write to a full disk it can bring down the 
 nodemanager process.  Instead of failing the whole process we should fail 
 only the container and make a best attempt to keep going.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3174) Consolidate the NodeManager documentation into one

2015-02-11 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-3174:
--

 Summary: Consolidate the NodeManager documentation into one
 Key: YARN-3174
 URL: https://issues.apache.org/jira/browse/YARN-3174
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Allen Wittenauer


We really don't need a different document for every individual nodemanager 
feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager

2015-02-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316606#comment-14316606
 ] 

Junping Du commented on YARN-914:
-

bq. I do agree with Vinod that there should minimally be an easy way, CLI or 
otherwise, for outside scripts driving the decommission to either force it or 
wait for it to complete. If waiting, there also needs to be a way to either 
have the wait have a timeout which will force after that point or another 
method with which to easily kill the containers still on that node.
Makes sense. It sounds like most of us here agree to go with the 2nd approach 
proposed by Ming and refined by Vinod.

 Support graceful decommission of nodemanager
 

 Key: YARN-914
 URL: https://issues.apache.org/jira/browse/YARN-914
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Luke Lu
Assignee: Junping Du
 Attachments: Gracefully Decommission of NodeManager (v1).pdf


 When NMs are decommissioned for non-fault reasons (capacity change etc.), 
 it's desirable to minimize the impact on running applications.
 Currently, if an NM is decommissioned, all running containers on the NM need to 
 be rescheduled on other NMs. Furthermore, finished map tasks whose map output 
 has not yet been fetched by the job's reducers will need to be rerun as well.
 We propose to introduce a mechanism to optionally gracefully decommission a 
 node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-02-11 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316562#comment-14316562
 ] 

Brahma Reddy Battula commented on YARN-3170:


Hello [~aw]

 *I want to update it as follows:* 

{quote}
Apache Hadoop NextGen MapReduce (YARN)
{quote}

Change this to Apache Hadoop YARN.

{quote}
MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, 
what we call, MapReduce 2.0 (MRv2) or YARN.
{quote}

I will remove this line.

{quote}
The fundamental idea of MRv2
{quote}

I will say YARN instead of MRv2.

 *{color:blue}Please give your input{color}* 

 YARN architecture document needs updating
 -

 Key: YARN-3170
 URL: https://issues.apache.org/jira/browse/YARN-3170
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula

 The marketing paragraph at the top, NextGen MapReduce, etc are all 
 marketing rather than actual descriptions. It also needs some general 
 updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3169) drop the useless yarn overview document

2015-02-11 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316566#comment-14316566
 ] 

Brahma Reddy Battula commented on YARN-3169:


Hi [~aw]

Sorry, I could not find this document in trunk. Could you please point me to 
its location?

Thanks..

 drop the useless yarn overview document
 ---

 Key: YARN-3169
 URL: https://issues.apache.org/jira/browse/YARN-3169
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula

 It's pretty superfluous given there is a site index on the left.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3177) Fix the order of the parameters in YarnConfiguration

2015-02-11 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3177:
---
Attachment: YARN-3177.patch

 Fix the order of the parameters in YarnConfiguration
 

 Key: YARN-3177
 URL: https://issues.apache.org/jira/browse/YARN-3177
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Brahma Reddy Battula
Priority: Minor
 Attachments: YARN-3177.patch


  *1. Keep each process's principal and keytab in one place (the NM and RM 
 entries are not placed in order)* 
 {code}
 public static final String RM_AM_MAX_ATTEMPTS =
     RM_PREFIX + "am.max-attempts";
 public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;

 /** The keytab for the resource manager. */
 public static final String RM_KEYTAB =
     RM_PREFIX + "keytab";

 /** The kerberos principal to be used for the spnego filter for the RM. */
 public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY =
     RM_PREFIX + "webapp.spnego-principal";

 /** The kerberos keytab to be used for the spnego filter for the RM. */
 public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY =
     RM_PREFIX + "webapp.spnego-keytab-file";
 {code}
  *2. RM webapp address and port are not in order* 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name

2015-02-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316741#comment-14316741
 ] 

Xuan Gong commented on YARN-3164:
-

bq. Have tested the same manually as of now and is working fine .

Thanks for testing this manually. Could you add a unit test for this? Maybe a 
unit test in TestRMAdminCLI?
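For what it's worth, a rough sketch of what such a test could look like, 
assuming RMAdminCLI can be driven through ToolRunner and that the usage text 
goes to stderr; this is only an illustration, not the actual TestRMAdminCLI 
code:
{code}
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.yarn.client.cli.RMAdminCLI;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Assert;
import org.junit.Test;

public class TestRMAdminUsageSketch {

  @Test
  public void testUsagePrintsRmadmin() throws Exception {
    ByteArrayOutputStream errBytes = new ByteArrayOutputStream();
    PrintStream oldErr = System.err;
    System.setErr(new PrintStream(errBytes));
    try {
      // Missing the serviceId argument, so only the usage text is printed.
      ToolRunner.run(new YarnConfiguration(), new RMAdminCLI(),
          new String[] {"-transitionToActive"});
    } finally {
      System.setErr(oldErr);
    }
    String usage = errBytes.toString();
    Assert.assertTrue(usage.contains("rmadmin"));
    Assert.assertFalse(usage.contains("HAAdmin"));
  }
}
{code}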

 rmadmin command usage prints incorrect command name
 ---

 Key: YARN-3164
 URL: https://issues.apache.org/jira/browse/YARN-3164
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Attachments: YARN-3164.1.patch


 /hadoop/bin{color:red} ./yarn rmadmin -transitionToActive {color}
 transitionToActive: incorrect number of arguments
 Usage:{color:red}  HAAdmin  {color} [-transitionToActive serviceId 
 [--forceactive]]
 {color:red} ./yarn HAAdmin {color} 
 Error: Could not find or load main class HAAdmin
 Expected it should be rmadmin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3177) Fix the order of the parameters in YarnConfiguration

2015-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316749#comment-14316749
 ] 

Hadoop QA commented on YARN-3177:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12698143/YARN-3177.patch
  against trunk revision e42fc1a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6597//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6597//console

This message is automatically generated.

 Fix the order of the parameters in YarnConfiguration
 

 Key: YARN-3177
 URL: https://issues.apache.org/jira/browse/YARN-3177
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: YARN-3177.patch


  *1. Keep each process's principal and keytab in one place (the NM and RM 
 entries are not placed in order)* 
 {code}
 public static final String RM_AM_MAX_ATTEMPTS =
     RM_PREFIX + "am.max-attempts";
 public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;

 /** The keytab for the resource manager. */
 public static final String RM_KEYTAB =
     RM_PREFIX + "keytab";

 /** The kerberos principal to be used for the spnego filter for the RM. */
 public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY =
     RM_PREFIX + "webapp.spnego-principal";

 /** The kerberos keytab to be used for the spnego filter for the RM. */
 public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY =
     RM_PREFIX + "webapp.spnego-keytab-file";
 {code}
  *2. RM webapp address and port are not in order* 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application

2015-02-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316766#comment-14316766
 ] 

Xuan Gong commented on YARN-3151:
-

bq. -      diagnostics.contains(applicationAttempt.getWebProxyBase()));
bq. +      diagnostics.contains(applicationAttempt.getTrackingUrl()));

Any reason why we need to change the test code?

 On Failover tracking url wrong in application cli for KILLED application
 

 Key: YARN-3151
 URL: https://issues.apache.org/jira/browse/YARN-3151
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Affects Versions: 2.6.0
 Environment: 2 RM HA 
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3151.patch, 0002-YARN-3151.patch, 
 0002-YARN-3151.patch


 Run an application and kill the same after starting
 Check {color:red} ./yarn application -list -appStates KILLED {color}
 (empty line)
 {quote}
 Application-Id Tracking-URL
 application_1423219262738_0001  
 http://IP:PORT/cluster/app/application_1423219262738_0001
 {quote}
 Shutdown the active RM1
 Check the same command {color:red} ./yarn application -list -appStates KILLED 
 {color} after RM2 is active
 {quote}
 Application-Id Tracking-URL
 application_1423219262738_0001  null
 {quote}
 The tracking URL for the application is shown as null.
 Expected: the same URL as before failover should be shown.
 ApplicationReport.getOriginalTrackingUrl() is null after failover.
 org.apache.hadoop.yarn.client.cli.ApplicationCLI
 listApplications(Set<String> appTypes,
   EnumSet<YarnApplicationState> appStates)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label

2015-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316637#comment-14316637
 ] 

Hadoop QA commented on YARN-3124:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12698119/YARN-3124.4.patch
  against trunk revision b94c111.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6595//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6595//console

This message is automatically generated.

 Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track 
 capacities-by-label
 

 Key: YARN-3124
 URL: https://issues.apache.org/jira/browse/YARN-3124
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, 
 YARN-3124.4.patch


 After YARN-3098, capacities-by-label (including 
 used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be 
 tracked in QueueCapacities.
 This patch targets making all capacities-by-label in CS queues tracked by 
 QueueCapacities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) implement RM writing app lifecycle events to ATS

2015-02-11 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316636#comment-14316636
 ] 

Devaraj K commented on YARN-3044:
-

[~Naganarasimha], would you mind if I take it up, if you haven't started 
working on this already?

 implement RM writing app lifecycle events to ATS
 

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R

 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) implement RM writing app lifecycle events to ATS

2015-02-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316848#comment-14316848
 ] 

Naganarasimha G R commented on YARN-3044:
-

Hi [~devaraj.k], thanks for showing interest in this JIRA, but it continues 
the work in YARN-3034 and I would like to work on both of them.

 implement RM writing app lifecycle events to ATS
 

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R

 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-914) Support graceful decommission of nodemanager

2015-02-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-914:

Attachment: Gracefully Decommission of NodeManager (v2).pdf

Updated the proposal to reflect what we discussed above. 
Some key updates:
- Change the architecture so that Decommission_In_Progress is kept hidden from 
the NM side and handled only within the RM side.
- Move tracking of the timeout out of the YARN core into a new CLI.
- Keep track of persisting the RMNode state (tracked in YARN-2567).
- Remove the new enable and timeout configurations, as both seem unnecessary 
for now.
- Break down the work items.

 Support graceful decommission of nodemanager
 

 Key: YARN-914
 URL: https://issues.apache.org/jira/browse/YARN-914
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Luke Lu
Assignee: Junping Du
 Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
 Gracefully Decommission of NodeManager (v2).pdf


 When NMs are decommissioned for non-fault reasons (capacity change etc.), 
 it's desirable to minimize the impact on running applications.
 Currently, if an NM is decommissioned, all running containers on the NM need to 
 be rescheduled on other NMs. Furthermore, finished map tasks whose map output 
 has not yet been fetched by the job's reducers will need to be rerun as well.
 We propose to introduce a mechanism to optionally gracefully decommission a 
 node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent

2015-02-11 Thread Siqi Li (JIRA)
Siqi Li created YARN-3176:
-

 Summary: In Fair Scheduler, child queue should inherit maxApp from 
its parent
 Key: YARN-3176
 URL: https://issues.apache.org/jira/browse/YARN-3176
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent

2015-02-11 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li reassigned YARN-3176:
-

Assignee: Siqi Li

 In Fair Scheduler, child queue should inherit maxApp from its parent
 

 Key: YARN-3176
 URL: https://issues.apache.org/jira/browse/YARN-3176
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3177) Fix the order of the parameters in YarnConfiguration

2015-02-11 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created YARN-3177:
--

 Summary: Fix the order of the parameters in YarnConfiguration
 Key: YARN-3177
 URL: https://issues.apache.org/jira/browse/YARN-3177
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Brahma Reddy Battula
Priority: Minor


 *1. Keep each process's principal and keytab in one place (the NM and RM 
entries are not placed in order)* 

{code}
public static final String RM_AM_MAX_ATTEMPTS =
    RM_PREFIX + "am.max-attempts";
public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;

/** The keytab for the resource manager. */
public static final String RM_KEYTAB =
    RM_PREFIX + "keytab";

/** The kerberos principal to be used for the spnego filter for the RM. */
public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY =
    RM_PREFIX + "webapp.spnego-principal";

/** The kerberos keytab to be used for the spnego filter for the RM. */
public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY =
    RM_PREFIX + "webapp.spnego-keytab-file";
{code}

 *2. RM webapp address and port are not in order* 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent

2015-02-11 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3176:
--
Description: If a child queue does not have a maxRunningApps limit, it will 
use queueMaxAppsDefault. This behavior is not quite right, since 
queueMaxAppsDefault is normally a small number, whereas some parent queues 
have maxRunningApps set higher than the default.
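A minimal sketch of the proposed inheritance rule, using plain stand-in 
classes rather than the actual Fair Scheduler queue types:
{code}
public class MaxAppsInheritanceSketch {

  /** Simplified stand-in for a Fair Scheduler queue configuration node. */
  static class QueueConf {
    final QueueConf parent;
    final Integer maxRunningApps;  // null means "not set on this queue"

    QueueConf(QueueConf parent, Integer maxRunningApps) {
      this.parent = parent;
      this.maxRunningApps = maxRunningApps;
    }
  }

  /** Walk up the hierarchy before falling back to queueMaxAppsDefault. */
  static int effectiveMaxRunningApps(QueueConf queue, int queueMaxAppsDefault) {
    for (QueueConf q = queue; q != null; q = q.parent) {
      if (q.maxRunningApps != null) {
        return q.maxRunningApps;
      }
    }
    return queueMaxAppsDefault;
  }

  public static void main(String[] args) {
    QueueConf root = new QueueConf(null, 500);
    QueueConf child = new QueueConf(root, null);
    // Prints 500 (inherited from the parent) rather than the small default.
    System.out.println(effectiveMaxRunningApps(child, 50));
  }
}
{code}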

 In Fair Scheduler, child queue should inherit maxApp from its parent
 

 Key: YARN-3176
 URL: https://issues.apache.org/jira/browse/YARN-3176
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3176.v1.patch


 If a child queue does not have a maxRunningApps limit, it will use 
 queueMaxAppsDefault. This behavior is not quite right, since queueMaxAppsDefault 
 is normally a small number, whereas some parent queues have maxRunningApps set 
 higher than the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent

2015-02-11 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3176:
--
Attachment: YARN-3176.v1.patch

 In Fair Scheduler, child queue should inherit maxApp from its parent
 

 Key: YARN-3176
 URL: https://issues.apache.org/jira/browse/YARN-3176
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3176.v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent

2015-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316797#comment-14316797
 ] 

Hadoop QA commented on YARN-3176:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12698137/YARN-3176.v1.patch
  against trunk revision 22441ab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6596//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6596//console

This message is automatically generated.

 In Fair Scheduler, child queue should inherit maxApp from its parent
 

 Key: YARN-3176
 URL: https://issues.apache.org/jira/browse/YARN-3176
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3176.v1.patch


 If a child queue does not have a maxRunningApps limit, it will use 
 queueMaxAppsDefault. This behavior is not quite right, since queueMaxAppsDefault 
 is normally a small number, whereas some parent queues have maxRunningApps set 
 higher than the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name

2015-02-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316851#comment-14316851
 ] 

Allen Wittenauer commented on YARN-3164:


Adding a unit test for this is a waste of time, IMO. 

I'm much more curious why we are overriding the method rather than just changing 
the text directly.  Does anything else actually even use the method that's 
being overridden?  

(In general, having this method even exist seems like a strong case of 
over-engineering, which is pretty prevalent throughout Hadoop.)

 rmadmin command usage prints incorrect command name
 ---

 Key: YARN-3164
 URL: https://issues.apache.org/jira/browse/YARN-3164
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Attachments: YARN-3164.1.patch


 /hadoop/bin{color:red} ./yarn rmadmin -transitionToActive {color}
 transitionToActive: incorrect number of arguments
 Usage:{color:red}  HAAdmin  {color} [-transitionToActive serviceId 
 [--forceactive]]
 {color:red} ./yarn HAAdmin {color} 
 Error: Could not find or load main class HAAdmin
 Expected it should be rmadmin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-02-11 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316414#comment-14316414
 ] 

Varun Saxena commented on YARN-2902:


Yes... sorry, I have been busy for the last couple of weeks. I will update 
by this weekend.

 Killing a container that is localizing can orphan resources in the 
 DOWNLOADING state
 

 Key: YARN-2902
 URL: https://issues.apache.org/jira/browse/YARN-2902
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2902.002.patch, YARN-2902.patch


 If a container is in the process of localizing when it is stopped/killed then 
 resources are left in the DOWNLOADING state.  If no other container comes 
 along and requests these resources they linger around with no reference 
 counts but aren't cleaned up during normal cache cleanup scans since it will 
 never delete resources in the DOWNLOADING state even if their reference count 
 is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Umbrella: Add a way to register long-lived services in a YARN cluster

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316261#comment-14316261
 ] 

Hudson commented on YARN-913:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7070 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7070/])
YARN-2683. [YARN-913] registry config options: document and move to 
core-default. (stevel) (stevel: rev c3da2db48fd18c41096fe5d6d4650978fb31ae24)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/index.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/using-the-yarn-service-registry.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-security.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


 Umbrella: Add a way to register long-lived services in a YARN cluster
 -

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
 YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316274#comment-14316274
 ] 

Hudson commented on YARN-3090:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/])
YARN-3090. DeletionService can silently ignore deletion task failures. 
Contributed by Varun Saxena (jlowe: rev 
4eb5f7fa32bab1b9ce3fb58eca51e2cd2e194cd5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
* hadoop-yarn-project/CHANGES.txt


 DeletionService can silently ignore deletion task failures
 --

 Key: YARN-3090
 URL: https://issues.apache.org/jira/browse/YARN-3090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3090.001.patch, YARN-3090.002.patch, 
 YARN-3090.003.patch, YARN-3090.04.patch


 If a non-I/O exception occurs while the DeletionService is executing a 
 deletion task then it will be silently ignored.  The exception bubbles up to 
 the thread workers of the ScheduledThreadPoolExecutor which simply attaches 
 the throwable to the Future that was returned when the task was scheduled.  
 However the thread pool is used as a fire-and-forget pool, so nothing ever 
 looks at the Future and therefore the exception is never logged.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316269#comment-14316269
 ] 

Hudson commented on YARN-2246:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/])
YARN-2246. Made the proxy tracking URL always be http(s)://proxy 
addr:port/proxy/appId to avoid duplicate sections. Contributed by Devaraj K. 
(zjshen: rev d5855c0e46404cfc1b5a63e59015e68ba668f0ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


 Job History Link in RM UI is redirecting to the URL which contains Job Id 
 twice
 ---

 Key: YARN-2246
 URL: https://issues.apache.org/jira/browse/YARN-2246
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Devaraj K
Assignee: Devaraj K
 Fix For: 2.7.0

 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, 
 YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch


 {code:xml}
 http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316277#comment-14316277
 ] 

Hudson commented on YARN-2809:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/])
YARN-2809. Implement workaround for linux kernel panic when removing cgroup. 
Contributed by Nathan Roberts (jlowe: rev 
3f5431a22fcef7e3eb9aceeefe324e5b7ac84049)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java


 Implement workaround for linux kernel panic when removing cgroup
 

 Key: YARN-2809
 URL: https://issues.apache.org/jira/browse/YARN-2809
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment:  RHEL 6.4
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Fix For: 2.7.0

 Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch


 Some older versions of linux have a bug that can cause a kernel panic when 
 the LCE attempts to remove a cgroup. It is a race condition so it's a bit 
 rare but on a few thousand node cluster it can result in a couple of panics 
 per day.
 This is the commit that likely (haven't verified) fixes the problem in linux: 
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
 Details will be added in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Umbrella: Add a way to register long-lived services in a YARN cluster

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316253#comment-14316253
 ] 

Hudson commented on YARN-913:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7069 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7069/])
YARN-2616 [YARN-913] Add CLI client to the registry to list, view and 
manipulate entries. (Akshay Radia via stevel) (stevel: rev 
362565cf5a8cbc1e7e66847649c29666d79f6938)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/cli/TestRegistryCli.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/cli/RegistryCli.java
* hadoop-yarn-project/CHANGES.txt


 Umbrella: Add a way to register long-lived services in a YARN cluster
 -

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
 YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list, view and manipulate entries

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316252#comment-14316252
 ] 

Hudson commented on YARN-2616:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7069 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7069/])
YARN-2616 [YARN-913] Add CLI client to the registry to list, view and 
manipulate entries. (Akshay Radia via stevel) (stevel: rev 
362565cf5a8cbc1e7e66847649c29666d79f6938)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/cli/TestRegistryCli.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/cli/RegistryCli.java
* hadoop-yarn-project/CHANGES.txt


 Add CLI client to the registry to list, view and manipulate entries
 ---

 Key: YARN-2616
 URL: https://issues.apache.org/jira/browse/YARN-2616
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Akshay Radia
 Fix For: 2.7.0

 Attachments: YARN-2616-003.patch, YARN-2616-008.patch, 
 YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, 
 yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch


 registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316270#comment-14316270
 ] 

Hudson commented on YARN-3160:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/])
YARN-3160. Fix non-atomic operation on nodeUpdateQueue in RMNodeImpl. 
(Contributed by Chengbing Liu) (junping_du: rev 
c541a374d88ffed6ee71b0e5d556939ccd2c5159)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java


 Non-atomic operation on nodeUpdateQueue in RMNodeImpl
 -

 Key: YARN-3160
 URL: https://issues.apache.org/jira/browse/YARN-3160
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.7.0

 Attachments: YARN-3160.2.patch, YARN-3160.patch


 {code:title=RMNodeImpl.java|borderStyle=solid}
 while(nodeUpdateQueue.peek() != null){
   latestContainerInfoList.add(nodeUpdateQueue.poll());
 }
 {code}
 The above code risks adding a null value to {{latestContainerInfoList}}: 
 another thread can drain the queue between the {{peek()}} and the {{poll()}}. 
 Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can 
 simply poll the queue and then check whether the returned value is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries

2015-02-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316232#comment-14316232
 ] 

Steve Loughran commented on YARN-2616:
--

+1
applying to branch-2+

 Add CLI client to the registry to list/view entries
 ---

 Key: YARN-2616
 URL: https://issues.apache.org/jira/browse/YARN-2616
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Akshay Radia
 Attachments: YARN-2616-003.patch, YARN-2616-008.patch, 
 YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, 
 yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch


 registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3173) start-yarn.sh script isn't aware of how many RMs need to be started.

2015-02-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316372#comment-14316372
 ] 

Allen Wittenauer commented on YARN-3173:


This actually can be fixed if start-yarn uses some of the same tricks that 
start-dfs uses.  However, it's a lot easier with HADOOP-11565 committed.

 start-yarn.sh script  can't aware how many RMs to be started.
 -

 Key: YARN-3173
 URL: https://issues.apache.org/jira/browse/YARN-3173
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: BOB
Priority: Minor

 When more than one RM is configured, for example in an HA cluster, using the 
 start-yarn.sh script to start the yarn cluster only starts one resourcemanager, 
 on the node where start-yarn.sh is executed. I think yarn should sense how many 
 RMs have been configured at the beginning, and start them all in the 
 start-yarn.sh script. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2683) registry config options: document and move to core-default

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316260#comment-14316260
 ] 

Hudson commented on YARN-2683:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7070 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7070/])
YARN-2683. [YARN-913] registry config options: document and move to 
core-default. (stevel) (stevel: rev c3da2db48fd18c41096fe5d6d4650978fb31ae24)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/index.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/using-the-yarn-service-registry.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-security.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Fix For: 2.7.0

 Attachments: HADOOP-10530-005.patch, YARN-2683-001.patch, 
 YARN-2683-002.patch, YARN-2683-003.patch, YARN-2683-006.patch

   Original Estimate: 1h
  Time Spent: 1h
  Remaining Estimate: 0.5h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA

2015-02-11 Thread Neill Lima (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316294#comment-14316294
 ] 

Neill Lima commented on YARN-3152:
--

[~vinodkv] -- It fails in both RMs indeed. What was 'unexpected' is that it didn't 
fail with a single RM despite the missing exclude file. 

Is the need for the exclude file so relevant to the RMs but not so much for the 
NNs? The behavior (NNs vs RMs) is very different, and I lean towards the NNs' 
behavior. 

 Missing hadoop exclude file fails RMs in HA
 ---

 Key: YARN-3152
 URL: https://issues.apache.org/jira/browse/YARN-3152
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
 Environment: Debian 7
Reporter: Neill Lima
Assignee: Naganarasimha G R

 I have two NNs in HA; they do not fail when the exclude file is not present 
 (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in 
 HA. I didn't create the exclude file at this point either. I applied the HA 
 RM settings properly, and when I started both RMs I started getting this 
 exception:
 2015-02-06 12:25:25,326 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root   
 OPERATION=transitionToActiveTARGET=RMHAProtocolService  
 RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   
 PERMISSIONS=All users are allowed
 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
   at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
   at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
   ... 4 more
 Caused by: org.apache.hadoop.ha.ServiceFailedException: 
 java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file 
 or directory)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
   ... 5 more
 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
 Trying to re-establish ZK session
 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 
 0x44af32566180094 closed
 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating 
 client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 
 sessionTimeout=1 
 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
 connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate 
 using SASL (unknown error)
 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established to x.x.x.x/x.x.x.x:2181, initiating session
 The message is descriptive enough to resolve the problem - and it has been 
 fixed by creating the exclude file. 
 I just suggest this as an improvement: 
 - Should RMs ignore the missing file as the NNs do?
 - Should a single RM fail even when the file is not present?
 Just suggesting this improvement to keep the behavior consistent when working 
 in HA (both NNs and RMs). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2616) Add CLI client to the registry to list, view and manipulate entries

2015-02-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2616:
-
Summary: Add CLI client to the registry to list, view and manipulate 
entries  (was: Add CLI client to the registry to list/view entries)

 Add CLI client to the registry to list, view and manipulate entries
 ---

 Key: YARN-2616
 URL: https://issues.apache.org/jira/browse/YARN-2616
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Akshay Radia
 Attachments: YARN-2616-003.patch, YARN-2616-008.patch, 
 YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, 
 yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch


 registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316331#comment-14316331
 ] 

Hudson commented on YARN-2809:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/])
YARN-2809. Implement workaround for linux kernel panic when removing cgroup. 
Contributed by Nathan Roberts (jlowe: rev 
3f5431a22fcef7e3eb9aceeefe324e5b7ac84049)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 Implement workaround for linux kernel panic when removing cgroup
 

 Key: YARN-2809
 URL: https://issues.apache.org/jira/browse/YARN-2809
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment:  RHEL 6.4
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Fix For: 2.7.0

 Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch


 Some older versions of linux have a bug that can cause a kernel panic when 
 the LCE attempts to remove a cgroup. It is a race condition so it's a bit 
 rare but on a few thousand node cluster it can result in a couple of panics 
 per day.
 This is the commit that likely (haven't verified) fixes the problem in linux: 
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
 Details will be added in comments.
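 The description does not spell out the workaround itself, but one common 
 mitigation for races of this kind is to retry the cgroup directory removal a 
 few times with short sleeps instead of giving up after a single attempt. A 
 minimal sketch of that idea only (not necessarily what the committed patch 
 does; the timeout and sleep values are made up):
 {code:title=CgroupDeleteRetrySketch.java (illustrative)|borderStyle=solid}
 import java.io.File;

 public class CgroupDeleteRetrySketch {
   // Hypothetical values; real configuration keys and defaults may differ.
   private static final long DELETE_TIMEOUT_MS = 1000L;
   private static final long RETRY_SLEEP_MS = 20L;

   /**
    * Try to remove an (empty) cgroup directory, retrying for a bounded time
    * so that a transient failure to remove it does not abort cleanup on the
    * first attempt.
    */
   public static boolean deleteCgroup(String cgroupPath) {
     File cgroup = new File(cgroupPath);
     long deadline = System.currentTimeMillis() + DELETE_TIMEOUT_MS;
     boolean deleted = cgroup.delete();
     while (!deleted && System.currentTimeMillis() < deadline) {
       try {
         Thread.sleep(RETRY_SLEEP_MS);
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         break;
       }
       deleted = cgroup.delete();
     }
     return deleted;
   }
 }
 {code}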



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316323#comment-14316323
 ] 

Hudson commented on YARN-2246:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/])
YARN-2246. Made the proxy tracking URL always be http(s)://proxy 
addr:port/proxy/appId to avoid duplicate sections. Contributed by Devaraj K. 
(zjshen: rev d5855c0e46404cfc1b5a63e59015e68ba668f0ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Job History Link in RM UI is redirecting to the URL which contains Job Id 
 twice
 ---

 Key: YARN-2246
 URL: https://issues.apache.org/jira/browse/YARN-2246
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Devaraj K
Assignee: Devaraj K
 Fix For: 2.7.0

 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, 
 YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch


 {code:xml}
 http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316324#comment-14316324
 ] 

Hudson commented on YARN-3160:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/])
YARN-3160. Fix non-atomic operation on nodeUpdateQueue in RMNodeImpl. 
(Contributed by Chengbing Liu) (junping_du: rev 
c541a374d88ffed6ee71b0e5d556939ccd2c5159)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt


 Non-atomic operation on nodeUpdateQueue in RMNodeImpl
 -

 Key: YARN-3160
 URL: https://issues.apache.org/jira/browse/YARN-3160
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.7.0

 Attachments: YARN-3160.2.patch, YARN-3160.patch


 {code:title=RMNodeImpl.java|borderStyle=solid}
 while(nodeUpdateQueue.peek() != null){
   latestContainerInfoList.add(nodeUpdateQueue.poll());
 }
 {code}
 The above code carries the risk of adding a null value to 
 {{latestContainerInfoList}}. Since {{ConcurrentLinkedQueue}} implements a 
 wait-free algorithm, we can directly poll the queue and then check whether 
 the returned value is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316328#comment-14316328
 ] 

Hudson commented on YARN-3090:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/])
YARN-3090. DeletionService can silently ignore deletion task failures. 
Contributed by Varun Saxena (jlowe: rev 
4eb5f7fa32bab1b9ce3fb58eca51e2cd2e194cd5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
* hadoop-yarn-project/CHANGES.txt


 DeletionService can silently ignore deletion task failures
 --

 Key: YARN-3090
 URL: https://issues.apache.org/jira/browse/YARN-3090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3090.001.patch, YARN-3090.002.patch, 
 YARN-3090.003.patch, YARN-3090.04.patch


 If a non-I/O exception occurs while the DeletionService is executing a 
 deletion task then it will be silently ignored.  The exception bubbles up to 
 the thread workers of the ScheduledThreadPoolExecutor which simply attaches 
 the throwable to the Future that was returned when the task was scheduled.  
 However the thread pool is used as a fire-and-forget pool, so nothing ever 
 looks at the Future and therefore the exception is never logged.
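 A minimal sketch of the failure mode and one way to surface such errors, using 
 a plain ScheduledThreadPoolExecutor and a made-up wrapper rather than the 
 actual DeletionService classes:
 {code:title=FireAndForgetLoggingSketch.java (illustrative)|borderStyle=solid}
 import java.util.concurrent.ScheduledThreadPoolExecutor;

 public class FireAndForgetLoggingSketch {

   /** Wraps a task so any exception is logged instead of being captured in a
    *  future that nothing ever inspects. */
   static Runnable logging(final Runnable task) {
     return new Runnable() {
       @Override
       public void run() {
         try {
           task.run();
         } catch (Throwable t) {
           // In the real service this would go through the NM logger.
           System.err.println("Deletion task failed: " + t);
         }
       }
     };
   }

   public static void main(String[] args) {
     ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
     // ScheduledThreadPoolExecutor wraps even execute()'d tasks in an internal
     // future, so without the wrapper the RuntimeException below would be
     // captured there and never logged.
     pool.execute(logging(new Runnable() {
       @Override
       public void run() {
         throw new RuntimeException("simulated non-I/O failure");
       }
     }));
     pool.shutdown();
   }
 }
 {code}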



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui

2015-02-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316961#comment-14316961
 ] 

Jian He commented on YARN-3171:
---

Sorry, YARN-2163 is committed already; this is probably a different problem, so 
forget what I said.

 Sort by application id doesn't work in ATS web ui
 -

 Key: YARN-3171
 URL: https://issues.apache.org/jira/browse/YARN-3171
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jeff Zhang
Assignee: Naganarasimha G R
Priority: Minor
 Attachments: ats_webui.png


 The order doesn't change when I click the column header



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA

2015-02-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316887#comment-14316887
 ] 

Naganarasimha G R commented on YARN-3152:
-

[~vinodkv], [~xgong], [~neillfontes] and [~rohithsharma],
From the discussion so far, what I could conclude is: as per the design, if the 
required files are not there we need to fail fast, i.e. in a non-HA cluster we 
should throw an exception and the RM should fail to start, and in the HA case 
the transition to active should fail so that none of the services become active 
on failure. That is what this jira needs to achieve. Please let me know if this 
approach is fine or needs more discussion.

[~neillfontes],
Hope you got to test with the steps I mentioned in my earlier 
[comment|https://issues.apache.org/jira/browse/YARN-3152?focusedCommentId=14313875page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14313875].
It seems you were able to see the behavior I mentioned in step 2, but I wanted 
to know more about step 3, where I see the active services oscillating between 
the two RM servers. Is it the same behavior I described, or am I missing 
something?
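
To make the intended fail-fast behavior concrete, here is a minimal sketch under 
the assumption above; the requireFile() helper, the hard-coded path and the 
error handling are all illustrative, not the actual AdminService code:
{code:title=FailFastOnMissingFileSketch.java (illustrative)|borderStyle=solid}
import java.io.File;
import java.io.FileNotFoundException;

public class FailFastOnMissingFileSketch {

  /** Hypothetical helper: refuse to proceed when a required file is absent,
   *  instead of discovering the problem later during refresh. */
  static void requireFile(String path) throws FileNotFoundException {
    File f = new File(path);
    if (!f.isFile()) {
      throw new FileNotFoundException(path + " (No such file or directory)");
    }
  }

  public static void main(String[] args) {
    String excludePath = "/hadoop-2.6.0/etc/hadoop/exclude"; // from the report above
    try {
      requireFile(excludePath);
      // ... continue start-up / transition to active ...
    } catch (FileNotFoundException e) {
      // Non-HA: fail RM start-up. HA: fail the transition to active so that
      // no services come up half-initialized on this node.
      System.err.println("Failing fast: " + e.getMessage());
    }
  }
}
{code}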

 Missing hadoop exclude file fails RMs in HA
 ---

 Key: YARN-3152
 URL: https://issues.apache.org/jira/browse/YARN-3152
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
 Environment: Debian 7
Reporter: Neill Lima
Assignee: Naganarasimha G R

 I have two NNs in HA; they do not fail when the exclude file is not present 
 (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in 
 HA. I didn't create the exclude file at this point either. I applied the HA 
 RM settings properly, and when I started both RMs I started getting this 
 exception:
 2015-02-06 12:25:25,326 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root   
 OPERATION=transitionToActiveTARGET=RMHAProtocolService  
 RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   
 PERMISSIONS=All users are allowed
 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
   at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
   at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
   ... 4 more
 Caused by: org.apache.hadoop.ha.ServiceFailedException: 
 java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file 
 or directory)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
   ... 5 more
 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
 Trying to re-establish ZK session
 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 
 0x44af32566180094 closed
 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating 
 client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 
 sessionTimeout=1 
 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
 connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate 
 using SASL (unknown error)
 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established to x.x.x.x/x.x.x.x:2181, initiating session
 The message is descriptive enough to resolve the problem - and it has been 
 fixed by creating the exclude file. 
 I just suggest this as an improvement: 
 - Should RMs ignore the missing file as the NNs do?
 - Should a single RM fail even when the file is not present?
 Just suggesting this improvement to keep the behavior consistent when working 
 in HA (both NNs and RMs). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui

2015-02-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316960#comment-14316960
 ] 

Jian He commented on YARN-3171:
---

[~Naganarasimha], thanks for working on this ! YARN-2163 is probably related to 
this too.

 Sort by application id doesn't work in ATS web ui
 -

 Key: YARN-3171
 URL: https://issues.apache.org/jira/browse/YARN-3171
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jeff Zhang
Assignee: Naganarasimha G R
Priority: Minor
 Attachments: ats_webui.png


 The order doesn't change when I click the column header



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager

2015-02-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316980#comment-14316980
 ] 

Jason Lowe commented on YARN-914:
-

Thanks for updating the doc, Junping.  Additional comments:

Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS?

The design says when a node starts decommissioning we will remove its resources 
from the cluster, but that's not really the case, correct?  We should remove 
its available (not total) resources from the cluster then continue to remove 
available resources as containers complete on that node.  Failing to do so will 
result in weird metrics like more resources running on the cluster than the 
cluster says it has, etc.

Are we only going to support graceful decommission via updates to the 
include/exclude files and refresh?  Not needed for the initial cut, but 
thinking of a couple of use-cases and curious what others thought:
* Would be convenient to have an rmadmin command that does this in one step, 
especially for a single-node.  Arguably if we are persisting cluster nodes in 
the state store we can migrate the list there, and the include/exclude list 
simply become convenient ways to batch-update the cluster state.
* Will NMs be able to request a graceful decommission via their health check 
script?  There have been some cases in the past where it would have been nice 
for the NM to request a ramp-down on containers but not instantly kill all of 
them with an UNHEALTHY report.

As for the UI changes, initial thought is that decommissioning nodes should 
still show up in the active nodes list since they are still running containers. 
 A separate decommissioning tab to filter for those nodes would be nice, 
although I suppose users can also just use the jquery table to sort/search for 
nodes in that state from the active nodes list if it's too crowded to add yet 
another node state tab (or maybe get rid of some effectively dead tabs like the 
reboot state tab).

For the NM restart open question, this should no longer be an issue now that the 
NM is unaware of graceful decommission. All the RM needs to do is ensure that a 
node rejoining the cluster, when the RM thought it was already part of it, 
retains its previous running/decommissioning state. That way, if an NM is 
decommissioning before the restart it will continue to decommission after it 
restarts.

For the AM dealing with being notified of decommissioning, again I think this 
should just be treated like a strict preemption for the short term.  IMHO all 
the AM needs to know is that the RM is planning on taking away those 
containers, and what the AM should do about it is similar whether the reason 
for removal is preemption or decommissioning.

Back to the long running services delaying decommissioning concern, does YARN 
even know the difference between a long-running container and a normal 
container?  If it doesn't, how is it supposed to know a container is not going 
to complete anytime soon?  Even a normal container could run for many hours.  
It seems to me the first thing we would need before worrying about this 
scenario is the ability for YARN to know/predict the expected runtime of 
containers.

There's still an open question about tracking the timeout RM side instead of NM 
side.  Sounds like the NM side is not going to be pursued at this point, and 
we're going with no built-in timeout support in YARN for the short-term.

 Support graceful decommission of nodemanager
 

 Key: YARN-914
 URL: https://issues.apache.org/jira/browse/YARN-914
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Luke Lu
Assignee: Junping Du
 Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
 Gracefully Decommission of NodeManager (v2).pdf


 When NMs are decommissioned for non-fault reasons (capacity change etc.), 
 it's desirable to minimize the impact to running applications.
 Currently, if a NM is decommissioned, all running containers on the NM need to 
 be rescheduled on other NMs. Furthermore, for finished map tasks whose map 
 output has not been fetched by the reducers of the job, these map tasks will 
 need to be rerun as well.
 We propose to introduce a mechanism to optionally gracefully decommission a 
 node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3167) implement the core functionality of the base aggregator service

2015-02-11 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317032#comment-14317032
 ] 

Vrushali C commented on YARN-3167:
--


Related: Attached on YARN-3031 a sequence diagram that reflects the 
interactions for writing between the AM, the base aggregator service, the 
timeline service writer api and backend store.

https://issues.apache.org/jira/secure/attachment/12698191/Sequence_diagram_write_interaction.png


 implement the core functionality of the base aggregator service
 ---

 Key: YARN-3167
 URL: https://issues.apache.org/jira/browse/YARN-3167
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 The basic skeleton of the timeline aggregator has been set up by YARN-3030. 
 We need to implement the core functionality of the base aggregator service. 
 The key things include
 - handling the requests from clients (sync or async)
 - buffering data
 - handling the aggregation logic
 - invoking the storage API



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3178) Clean up stack trace on the client when RM fails over

2015-02-11 Thread Arpit Gupta (JIRA)
Arpit Gupta created YARN-3178:
-

 Summary: Clean up stack trace on the client when RM fails over
 Key: YARN-3178
 URL: https://issues.apache.org/jira/browse/YARN-3178
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Arpit Gupta
Priority: Minor


When the client fails over it spits out a stack trace. It will be good to clean 
that up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) create backing storage write interface for ATS writers

2015-02-11 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: Sequence_diagram_write_interaction.png

Attaching a sequence diagram that reflects the interactions for writing between 
the AM, the base aggregator service, the timeline service writer api and 
backend store.

 create backing storage write interface for ATS writers
 --

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.png, 
 YARN-3031.01.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) create backing storage write interface for ATS writers

2015-02-11 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: YARN-3031.01.patch

 create backing storage write interface for ATS writers
 --

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: YARN-3031.01.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container

2015-02-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316981#comment-14316981
 ] 

Bartosz Ługowski commented on YARN-1621:


Thanks for comments [~vinodkv] and [~Naganarasimha].

[~vinodkv], I can't rename it to list-containers, because the options parser 
disallows the '-' character.
[~Naganarasimha], I think it is a good idea to move it to yarn container -list, 
so we can easily add additional filters in one place.


 Add CLI to list rows of task attempt ID, container ID, host of container, 
 state of container
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
Assignee: Bartosz Ługowski
 Fix For: 2.7.0

 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch


 As more applications are moved to YARN, we need a generic CLI to list rows of 
 task attempt ID, container ID, host of container, and state of container. Today, 
 if a YARN application running in a container hangs, there is no way to find 
 out more because a user does not know where each attempt is running.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers.
  
 {code:title=proposed yarn cli}
 $ yarn application -list-containers -applicationId <appId> [-containerState 
 <state of container>]
 where containerState is an optional filter to list containers in the given state only.
 container state can be running/succeeded/killed/failed/all.
 A user can specify more than one container state at once e.g. KILLED,FAILED.
 task attempt ID container ID host of container state of container 
 {code}
 The CLI should work with both running and completed applications. If a 
 container runs many task attempts, all attempts should be shown. That will 
 likely be the case for Tez container-reuse applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3178) Clean up stack trace on the client when RM fails over

2015-02-11 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317034#comment-14317034
 ] 

Arpit Gupta commented on YARN-3178:
---

Here is an example stack trace

{code}
14/02/25 17:40:23 WARN retry.RetryInvocationHandler: Exception while invoking 
class 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport.
 Not retrying because the invoked method is not idempotent, and unable to 
determine whether it was invoked
java.io.IOException: Failed on local exception: java.io.EOFException; Host 
Details : local host is: host/ip; destination host is: host:8032; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy11.getApplicationReport(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:142)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:268)
at 
org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:294)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:152)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:319)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:531)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344)
at 
org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:407)
at 
org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:855)
at 
org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1018)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:135)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1050)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:945)
14/02/25 17:40:23 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
14/02/25 17:40:23 INFO mapreduce.Job:  map 14% reduce 0%
14/02/25 17:40:24 WARN retry.RetryInvocationHandler: Exception while invoking 
class 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport.
 Not retrying because the invoked method is not idempotent, and unable to 
determine whether it was invoked
java.io.IOException: Failed on local exception: java.io.EOFException; Host 
Details : local host is: host/ip; destination host is: host:8032; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at 

[jira] [Updated] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher

2015-02-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2764:
-
Labels: counters  (was: )

 counters.LimitExceededException shouldn't abort AsyncDispatcher
 ---

 Key: YARN-2764
 URL: https://issues.apache.org/jira/browse/YARN-2764
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Ted Yu
  Labels: counters

 I saw the following in container log:
 {code}
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
 attemptattempt_1414221548789_0023_r_03_0
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
 task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
 2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
 job_1414221548789_0023Job Transitioned from RUNNING to COMMITTING
 2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
 the event EventType: JOB_COMMIT
 2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
 org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many 
 counters: 121 max=120
   at 
 org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
   at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
   at 
 org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
 {code}
 Counter limit was exceeded when JobFinishedEvent was created.
 Better handling of LimitExceededException should be provided so that 
 AsyncDispatcher can continue functioning.
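 One way to read "better handling" is to keep the dispatch loop alive when a 
 single handler throws a recoverable exception such as LimitExceededException. 
 A minimal sketch of that idea only, with a made-up Event interface rather than 
 the real AsyncDispatcher types, and leaving aside which exceptions are actually 
 safe to swallow:
 {code:title=DispatcherSurvivesHandlerErrorSketch.java (illustrative)|borderStyle=solid}
 import java.util.concurrent.BlockingQueue;
 import java.util.concurrent.LinkedBlockingQueue;

 public class DispatcherSurvivesHandlerErrorSketch {

   interface Event { void handle() throws Exception; }

   private final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<Event>();
   private volatile boolean stopped = false;

   /** Simplified dispatch loop: a failure in one handler is logged and the
    *  loop keeps serving subsequent events instead of exiting ("bbye..").
    *  Whether a given exception is safe to swallow is a per-event decision;
    *  this sketch only illustrates keeping the thread alive. */
   void serviceLoop() {
     while (!stopped && !Thread.currentThread().isInterrupted()) {
       try {
         Event event = eventQueue.take();
         event.handle();
       } catch (InterruptedException ie) {
         Thread.currentThread().interrupt();
         return;
       } catch (Exception e) {            // e.g. a counters LimitExceededException
         System.err.println("Error in dispatcher thread, continuing: " + e);
       }
     }
   }
 }
 {code}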



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) create backing storage write interface for ATS writers

2015-02-11 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3031:
-
Attachment: (was: YARN-3031.01.patch)

 create backing storage write interface for ATS writers
 --

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C

 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3178) Clean up stack trace on the client when RM fails over

2015-02-11 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-3178:
--

Assignee: Varun Saxena

 Clean up stack trace on the client when RM fails over
 -

 Key: YARN-3178
 URL: https://issues.apache.org/jira/browse/YARN-3178
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Arpit Gupta
Assignee: Varun Saxena
Priority: Minor

 When the client fails over it spits out a stack trace. It will be good to 
 clean that up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3036) [Storage implementation] Create standalone HBase backing storage implementation for ATS writes

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3036:
--
Summary: [Storage implementation] Create standalone HBase backing storage 
implementation for ATS writes  (was: create standalone HBase backing storage 
implementation for ATS writes)

 [Storage implementation] Create standalone HBase backing storage 
 implementation for ATS writes
 --

 Key: YARN-3036
 URL: https://issues.apache.org/jira/browse/YARN-3036
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen

 Per design in YARN-2928, create a (default) standalone HBase backing storage 
 implementation for ATS writes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3040:
--
Summary: [Data Model] Implement client-side API for handling flows  (was: 
[Data ] Implement client-side API for handling flows)

 [Data Model] Implement client-side API for handling flows
 -

 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter

 Per design in YARN-2928, implement client-side API for handling *flows*. 
 Frameworks should be able to define and pass in all attributes of flows and 
 flow runs to YARN, and they should be passed into ATS writers.
 YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3042) [Data Model] Create ATS metrics API

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3042:
--
Summary: [Data Model] Create ATS metrics API  (was: create ATS metrics API)

 [Data Model] Create ATS metrics API
 ---

 Key: YARN-3042
 URL: https://issues.apache.org/jira/browse/YARN-3042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Siddharth Wagle

 Per design in YARN-2928, create the ATS metrics API and integrate it into the 
 entities.
 The concept may be based on the existing hadoop metrics, but we want to make 
 sure we have something that would satisfy all ATS use cases.
 It also needs to capture whether a metric should be aggregated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3047:
--
Summary: [Data Serving] Set up ATS reader with basic request serving 
structure and lifecycle  (was: set up ATS reader with basic request serving 
structure and lifecycle)

 [Data Serving] Set up ATS reader with basic request serving structure and 
 lifecycle
 ---

 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3047.001.patch


 Per design in YARN-2938, set up the ATS reader as a service and implement the 
 basic structure as a service. It includes lifecycle management, request 
 serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3048) [Data Serving] Handle how to set up and start/stop ATS reader instances

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3048:
--
Summary: [Data Serving] Handle how to set up and start/stop ATS reader 
instances  (was: handle how to set up and start/stop ATS reader instances)

 [Data Serving] Handle how to set up and start/stop ATS reader instances
 ---

 Key: YARN-3048
 URL: https://issues.apache.org/jira/browse/YARN-3048
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, come up with a way to set up and start/stop ATS 
 reader instances.
 This should allow setting up multiple instances and managing user traffic to 
 those instances.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3051:
--
Summary: [Storage abstraction] Create backing storage read interface for 
ATS readers  (was: create backing storage read interface for ATS readers)

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2015-02-11 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317180#comment-14317180
 ] 

Robert Kanter commented on YARN-3181:
-

LGTM, +1 pending Jenkins.  
Cleaning out findbugs warnings is always a good thing.

 FairScheduler: Fix up outdated findbugs issues
 --

 Key: YARN-3181
 URL: https://issues.apache.org/jira/browse/YARN-3181
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-3181-1.patch


 In FairScheduler, we have excluded some findbugs-reported errors. Some of 
 them aren't applicable anymore, and there are a few that can be easily fixed 
 without needing an exclusion. It would be nice to fix them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3050) [Data Serving] Implement new flow-based ATS queries in the new ATS design

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3050:
--
Summary: [Data Serving] Implement new flow-based ATS queries in the new ATS 
design  (was: implement new flow-based ATS queries in the new ATS design)

 [Data Serving] Implement new flow-based ATS queries in the new ATS design
 -

 Key: YARN-3050
 URL: https://issues.apache.org/jira/browse/YARN-3050
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Flow based queries.docx


 Implement new flow-based ATS queries in the new ATS design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3052) [Data Serving] Provide a very simple POC html ATS UI

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3052:
--
Summary: [Data Serving] Provide a very simple POC html ATS UI  (was: 
provide a very simple POC html ATS UI)

 [Data Serving] Provide a very simple POC html ATS UI
 

 Key: YARN-3052
 URL: https://issues.apache.org/jira/browse/YARN-3052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 As part of accomplishing a minimum viable product, we want to be able to show 
 some UI in html (however crude it is). This subtask calls for creating a 
 barebones UI to do that.
 This should be replaced later with a better-designed and implemented proper 
 UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3049) [Compatiblity] Implement existing ATS queries in the new ATS design

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3049:
--
Summary: [Compatiblity] Implement existing ATS queries in the new ATS 
design  (was: implement existing ATS queries in the new ATS design)

 [Compatiblity] Implement existing ATS queries in the new ATS design
 ---

 Key: YARN-3049
 URL: https://issues.apache.org/jira/browse/YARN-3049
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen

 Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3150:
--
Summary: [Documentation] Documenting the timeline service v2  (was: 
Documenting the timeline service v2)

 [Documentation] Documenting the timeline service v2
 ---

 Key: YARN-3150
 URL: https://issues.apache.org/jira/browse/YARN-3150
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Let's make sure we will have a document to describe what's new in TS v2, the 
 APIs, the client libs and so on. We should do better around documentation in 
 v2 than v1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3166:
--
Summary: [Source organization] Decide detailed package structures for 
timeline service v2 components  (was: Decide detailed package structures for 
timeline service v2 components)

 [Source organization] Decide detailed package structures for timeline service 
 v2 components
 ---

 Key: YARN-3166
 URL: https://issues.apache.org/jira/browse/YARN-3166
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

 Open this JIRA to track all discussions on detailed package structures for 
 timeline services v2. This JIRA is for discussion only.
 For our current timeline service v2 design, aggregator (previously called 
 writer) implementation is in hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
 In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
 want to put reader related implementations into hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.reader}}
 Both readers and aggregators will expose features that may be used by YARN 
 and other 3rd party components, such as aggregator/reader APIs. For those 
 features, maybe we would like to expose their interfaces to 
 hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
 Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui

2015-02-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317252#comment-14317252
 ] 

Zhijie Shen commented on YARN-3171:
---

Does YARN-2766 not fix the problem?

 Sort by application id doesn't work in ATS web ui
 -

 Key: YARN-3171
 URL: https://issues.apache.org/jira/browse/YARN-3171
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jeff Zhang
Assignee: Naganarasimha G R
Priority: Minor
 Attachments: ats_webui.png


 The order doesn't change when I click the column header



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers

2015-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317122#comment-14317122
 ] 

Hadoop QA commented on YARN-3031:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12698194/YARN-3031.01.patch
  against trunk revision 50625e6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6598//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6598//console

This message is automatically generated.

 create backing storage write interface for ATS writers
 --

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.png, 
 YARN-3031.01.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.
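 As a purely hypothetical sketch of the kind of abstraction under discussion 
 (names invented here, not taken from the attached patch), such a write 
 interface might boil down to something like:
 {code}
 // Hypothetical illustration only; the real interface is being designed in
 // YARN-3031 and may look quite different.
 import java.io.IOException;

 interface TimelineWriteSketch {
   // Persist one entity (an application, attempt, container, ...) for a given
   // cluster/user/flow context; each backing storage maps this onto its own
   // layout (HBase tables, files, an in-memory map, ...).
   void write(String clusterId, String userId, String flowName,
       String entityType, String entityId, byte[] serializedEntity)
       throws IOException;

   // Let buffering implementations push pending writes to the backing storage.
   void flush() throws IOException;
 }
 {code}
 Keeping the interface at roughly this level (plain identifiers plus an opaque 
 entity payload) is one way to let very different storage backends implement it 
 efficiently.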



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3180) container-executor gets SEGV for default banned user

2015-02-11 Thread Olaf Flebbe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317142#comment-14317142
 ] 

Olaf Flebbe commented on YARN-3180:
---

Seems like it. The fix to the logic is the same; the tests are different. Mine 
tests the exact environment, while the other test covers much more than just the 
check_user() call.

The patch does not apply on git trunk.

Please apply either one.

 container-executor gets SEGV for default banned user
 

 Key: YARN-3180
 URL: https://issues.apache.org/jira/browse/YARN-3180
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1, 2.6.1
Reporter: Olaf Flebbe
 Attachments: 
 0001-YARN-3180-container-executor-gets-SEGV-for-default-b.patch


 container-executor dumps core if container-executor.cfg:
 * does not contain a banned.users statement, so the default takes effect,
 * the banned user id is above min.user.id, and
 * the user is contained in the default banned.users list.
 And yes, this did happen to me.
 Patch and test appended (relative to git trunk).
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label

2015-02-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3124:
-
Attachment: YARN-3124.5.patch

Thanks comments [~jianhe]. Attached new patch (ver.5)

 Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track 
 capacities-by-label
 

 Key: YARN-3124
 URL: https://issues.apache.org/jira/browse/YARN-3124
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, 
 YARN-3124.4.patch, YARN-3124.5.patch


 After YARN-3098, capacities-by-label (including 
 used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be 
 tracked in QueueCapacities.
 This patch aims to make all capacities-by-label in CS queues 
 tracked by QueueCapacities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3031:
--
Summary: [Storage abstraction] Create backing storage write interface for 
ATS writers  (was: create backing storage write interface for ATS writers)

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.png, 
 YARN-3031.01.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3033:
--
Summary: [Aggregator wireup] Implement NM starting the ATS writer companion 
 (was: implement NM starting the ATS writer companion)

 [Aggregator wireup] Implement NM starting the ATS writer companion
 --

 Key: YARN-3033
 URL: https://issues.apache.org/jira/browse/YARN-3033
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, implement node managers starting the ATS writer 
 companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3035) [Storage implementation] Create a test-only backing storage implementation for ATS writes

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3035:
--
Summary: [Storage implementation] Create a test-only backing storage 
implementation for ATS writes  (was: create a test-only backing storage 
implementation for ATS writes)

 [Storage implementation] Create a test-only backing storage implementation 
 for ATS writes
 -

 Key: YARN-3035
 URL: https://issues.apache.org/jira/browse/YARN-3035
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Per design in YARN-2928, create a test-only bare bone backing storage 
 implementation for ATS writes.
 We could consider something like a no-op or in-memory storage strictly for 
 development and testing purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3043) [Data Model] Create ATS configuration, metadata, etc. as part of entities

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3043:
--
Summary: [Data Model] Create ATS configuration, metadata, etc. as part of 
entities  (was: create ATS configuration, metadata, etc. as part of entities)

 [Data Model] Create ATS configuration, metadata, etc. as part of entities
 -

 Key: YARN-3043
 URL: https://issues.apache.org/jira/browse/YARN-3043
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, create APIs for configuration, metadata, etc. and 
 integrate them into entities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3087:
--
Summary: [Aggregator implementation] the REST server (web server) for 
per-node aggregator does not work if it runs inside node manager  (was: the 
REST server (web server) for per-node aggregator does not work if it runs 
inside node manager)

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Devaraj K

 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3115) [Aggregator wireup] Work-preserving restarting of per-node aggregator

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3115:
--
Summary: [Aggregator wireup] Work-preserving restarting of per-node 
aggregator  (was: Work-preserving restarting of per-node aggregator)

 [Aggregator wireup] Work-preserving restarting of per-node aggregator
 -

 Key: YARN-3115
 URL: https://issues.apache.org/jira/browse/YARN-3115
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 YARN-3030 makes the per-node aggregator work as an aux service of the NM. It 
 holds the state of the per-app aggregators corresponding to the running 
 AM containers on this NM. When the NM is restarted in work-preserving mode, 
 this per-node aggregator state needs to be carried over across the restart too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3053) [Security] Review and implement for property security in ATS v.2

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3053:
--
Summary: [Security] Review and implement for property security in ATS v.2  
(was: review and implement for property security in ATS v.2)

 [Security] Review and implement for property security in ATS v.2
 

 Key: YARN-3053
 URL: https://issues.apache.org/jira/browse/YARN-3053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen

 Per design in YARN-2928, we want to evaluate and review the system for 
 security, and ensure proper security in the system.
 This includes proper authentication, token management, access control, and 
 any other relevant security aspects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio

2015-02-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317255#comment-14317255
 ] 

Wangda Tan commented on YARN-3153:
--

Since most capacity-related configs are ranged \[0, 100\], 
maximum-am-resource-percent should be part of the capacity settings, like queue 
capacity and queue maximum-capacity.

So I propose the following configs.
Global configuration:
{{yarn.scheduler.capacity.maximum-am-capacity-per-queue}}, default is 10 (10%)

Queue configuration:
{{yarn.scheduler.capacity.queue-path.maximum-am-capacity}}

And to avoid confusion, we should deprecate:
{{yarn.scheduler.capacity.maximum-am-resource-percent}}
{{yarn.scheduler.capacity.queue-path.maximum-am-resource-percent}}

In addition, maximum-am-capacity for a queue should be inheritable: when an 
admin sets a max-am value on a parent queue, a leaf queue inherits it unless it 
sets its own.

Sounds like a plan?
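
To make the percent-vs-ratio mismatch concrete, here is a minimal sketch (a 
hypothetical helper, not the scheduler's actual code) of normalizing a 
\[0, 100\] percentage into the \[0, 1\] ratio the internal math expects:
{code}
// Illustrative only: validate a "percent"-named setting and convert it to the
// ratio used internally, so 10 means 10% rather than 10x the queue capacity.
public final class AmLimitSketch {
  private AmLimitSketch() {
  }

  static float percentToRatio(float configuredPercent) {
    if (configuredPercent < 0f || configuredPercent > 100f) {
      throw new IllegalArgumentException(
          "maximum AM capacity must be in [0, 100], got " + configuredPercent);
    }
    return configuredPercent / 100f;
  }

  public static void main(String[] args) {
    System.out.println(percentToRatio(10f));   // 0.1
    System.out.println(percentToRatio(100f));  // 1.0, i.e. the whole queue
  }
}
{code}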

 Capacity Scheduler max AM resource limit for queues is defined as percentage 
 but used as ratio
 --

 Key: YARN-3153
 URL: https://issues.apache.org/jira/browse/YARN-3153
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical

 The existing Capacity Scheduler can limit the maximum applications running 
 within a queue. The config is 
 yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used 
 as a ratio: the implementation assumes the input is in \[0,1\]. So a user can 
 currently set it as high as 100, which lets AMs use 100x the queue capacity. 
 We should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3183) Some classes define hashcode() but not equals()

2015-02-11 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3183:

Attachment: YARN-3183.patch

The patch adds {{equals}} methods that use the same variable that was used in 
{{hashCode}}.  It also removes the unnecessary {{equals}} method.
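
For context, a minimal sketch of the contract being enforced (a hypothetical 
event class, not one of the files listed in the description below): equals must 
compare exactly the fields that hashCode hashes.
{code}
// Illustrative only: an event keyed by a single id field.
public class SampleEvent {
  private final String id;

  public SampleEvent(String id) {
    this.id = id;
  }

  @Override
  public int hashCode() {
    // hashCode is based solely on the id...
    return id.hashCode();
  }

  @Override
  public boolean equals(Object obj) {
    // ...so equals compares the same field, keeping the two consistent.
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof SampleEvent)) {
      return false;
    }
    return id.equals(((SampleEvent) obj).id);
  }
}
{code}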

 Some classes define hashcode() but not equals()
 ---

 Key: YARN-3183
 URL: https://issues.apache.org/jira/browse/YARN-3183
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Minor
 Attachments: YARN-3183.patch


 These files all define {{hashCode}}, but don't define {{equals}}:
 {noformat}
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationAttemptFinishEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationAttemptStartEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationFinishEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationStartEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingContainerFinishEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingContainerStartEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AppAttemptFinishedEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AppAttemptRegisteredEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationFinishedEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerFinishedEvent.java
 {noformat}
 This one unnecessarily defines {{equals}}:
 {noformat}
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceRetentionSet.java
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-02-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317048#comment-14317048
 ] 

Wangda Tan commented on YARN-2495:
--

Hi [~cwelch],
Thanks for jumping in and providing your thoughts, and really sorry for the 
late response.
I think your biggest concern is about DECENTRALIZED_CONFIGURATION_ENABLED, so 
let me explain my thinking :)

IMHO, mixing decentralized and centralized configuration is dangerous and will 
cause non-deterministic results. You may want to merge them, such that some 
labels are set by the admin using RMAdminCLI and some others are set by the NM. 
But here is an example showing it is still non-deterministic even if we have 
+/- in the ResourceTracker protocol:
- Assume a node has labels x,y (reported +x,+y)
- RMAdmin removes y from the node (-y)
- The NM fails, then restarts and reports it has x,y (+x, +y). What should the 
labels on the node be?

I also don't like adding too many switches to the configuration, but it seems a 
good way to support both with deterministic behavior.

As for your other suggestions,
- Name changes is-are,
- Make RegisterNodeManagerRequest consistent with NodeHeartbeatRequest
I agree with both.

One more suggestion (as suggested by [~vinodkv]): when there is anything wrong 
with a node label reported by an NM, we should fail the NM (ask it to shut down 
and give it a proper diagnostic message). This is because if an NM reports a 
label that gets rejected, even if the RM tells the NM, the NM cannot handle it 
properly beyond printing some error messages (we don't have smart logic for that 
now). That leads to problems in debugging (an NM reported some label to the RM 
but the scheduler failed to allocate containers on that NM). To avoid this, a 
simple way is to shut down the NM so the admin can take a look at what happened.

Thoughts?
Wangda

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495_20141022.1.patch


 The target of this JIRA is to allow admins to specify labels on each NM. This 
 covers:
 - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
 using the script suggested by [~aw] (YARN-2729))
 - The NM will send labels to the RM via the ResourceTracker API
 - The RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2015-02-11 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-3181:
--

 Summary: FairScheduler: Fix up outdated findbugs issues
 Key: YARN-3181
 URL: https://issues.apache.org/jira/browse/YARN-3181
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


In FairScheduler, we have excluded some findbugs-reported errors. Some of them 
aren't applicable anymore, and there are a few that can be easily fixed without 
needing an exclusion. It would be nice to fix them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart

2015-02-11 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2079:
-
Attachment: YARN-2079.002.patch

Rebased patch for trunk.  [~djp] could you take a look?  It would be nice to 
get this into 2.7.

 Recover NonAggregatingLogHandler state upon nodemanager restart
 ---

 Key: YARN-2079
 URL: https://issues.apache.org/jira/browse/YARN-2079
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2079.002.patch, YARN-2079.patch


 The state of NonAggregatingLogHandler needs to be persisted so logs are 
 properly deleted across a nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart

2015-02-11 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2079:
-
Target Version/s: 2.7.0  (was: 2.6.0)

 Recover NonAggregatingLogHandler state upon nodemanager restart
 ---

 Key: YARN-2079
 URL: https://issues.apache.org/jira/browse/YARN-2079
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2079.002.patch, YARN-2079.patch


 The state of NonAggregatingLogHandler needs to be persisted so logs are 
 properly deleted across a nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2015-02-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3181:
---
Attachment: yarn-3181-1.patch

 FairScheduler: Fix up outdated findbugs issues
 --

 Key: YARN-3181
 URL: https://issues.apache.org/jira/browse/YARN-3181
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-3181-1.patch


 In FairScheduler, we have excluded some findbugs-reported errors. Some of 
 them aren't applicable anymore, and there are a few that can be easily fixed 
 without needing an exclusion. It would be nice to fix them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3038) [Aggregator wireup] Handle ATS writer failure scenarios

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3038:
--
Summary: [Aggregator wireup] Handle ATS writer failure scenarios  (was: 
handle ATS writer failure scenarios)

 [Aggregator wireup] Handle ATS writer failure scenarios
 ---

 Key: YARN-3038
 URL: https://issues.apache.org/jira/browse/YARN-3038
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, consider various ATS writer failure scenarios, and 
 implement proper handling.
 For example, ATS writers may fail and exit due to OOM. It should be retried a 
 certain number of times in that case. We also need to tie fatal ATS writer 
 failures (after exhausting all retries) to the application failure, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3125:
--
Summary: [Event producers] Change distributed shell to use new timeline 
service  (was: Change distributed shell to use new timeline service)

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 We can start by changing distributed shell to use the new timeline service 
 once the framework is complete; that way we can quickly verify that the next 
 gen is working end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3134:
--
Summary: [Storage implementation] Exploiting the option of using Phoenix to 
access HBase backend  (was: Exploiting the option of using Phoenix to access 
HBase backend)

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify our implementation of reading/writing data from/to HBase, and 
 makes it easy to build indexes and compose complex queries.
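 For a sense of what this buys us, here is a minimal read sketch through 
 Phoenix's standard JDBC API; the ZooKeeper quorum in the URL and the 
 table/column names are placeholders, not an agreed ATS v2 schema.
 {code}
 // Illustrative only: query an HBase-backed table through Phoenix via JDBC.
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.PreparedStatement;
 import java.sql.ResultSet;

 public class PhoenixReadSketch {
   public static void main(String[] args) throws Exception {
     // Phoenix connections use a JDBC URL of the form jdbc:phoenix:<zk-quorum>.
     try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
          PreparedStatement stmt = conn.prepareStatement(
              "SELECT entity_id, created_time FROM timeline_entity"
                  + " WHERE entity_type = ?")) {
       stmt.setString(1, "YARN_APPLICATION");
       try (ResultSet rs = stmt.executeQuery()) {
         while (rs.next()) {
           System.out.println(rs.getString(1) + " " + rs.getLong(2));
         }
       }
     }
   }
 }
 {code}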



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown

2015-02-11 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-3168:
---
Assignee: Gururaj Shetty

 Convert site documentation from apt to markdown
 ---

 Key: YARN-3168
 URL: https://issues.apache.org/jira/browse/YARN-3168
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3168-00.patch


 YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3179) Update use of Iterator to Iterable

2015-02-11 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-3179:


 Summary: Update use of Iterator to Iterable
 Key: YARN-3179
 URL: https://issues.apache.org/jira/browse/YARN-3179
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor


Found these using the IntelliJ Findbugs-IDEA plugin, which uses findbugs3.
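
For reference, the typical rewrite involved (shown on a hypothetical list, not 
the code findbugs flagged) replaces explicit Iterator handling with an enhanced 
for loop over the Iterable:
{code}
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterableSketch {
  public static void main(String[] args) {
    List<String> nodes = Arrays.asList("nm-1", "nm-2", "nm-3");

    // Before: explicit Iterator handling.
    Iterator<String> it = nodes.iterator();
    while (it.hasNext()) {
      System.out.println(it.next());
    }

    // After: the enhanced for loop works directly on any Iterable.
    for (String node : nodes) {
      System.out.println(node);
    }
  }
}
{code}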



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label

2015-02-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317126#comment-14317126
 ] 

Jian He commented on YARN-3124:
---

- Remove unused QueueCapacities#reinitialize
- AbstractCSQueue#setupCapacities - setupConfiguredCapacities
- {{so we shouldn't do this for reservation queue}} - mind clarifying more?
- Minor formatting:
{code}
super.setupQueueConfigs(clusterResource);
   StringBuilder aclsString = new StringBuilder();
  public synchronized void reinitialize(
  CSQueue newlyParsedQueue, Resource clusterResource)
  throws IOException {
{code}

 Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track 
 capacities-by-label
 

 Key: YARN-3124
 URL: https://issues.apache.org/jira/browse/YARN-3124
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, 
 YARN-3124.4.patch


 After YARN-3098, capacities-by-label (including 
 used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be 
 tracked in QueueCapacities.
 This patch aims to make all capacities-by-label in CS queues 
 tracked by QueueCapacities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3034:
--
Summary: [Aggregator wireup] Implement RM starting its ATS writer  (was: 
implement RM starting its ATS writer)

 [Aggregator wireup] Implement RM starting its ATS writer
 

 Key: YARN-3034
 URL: https://issues.apache.org/jira/browse/YARN-3034
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3034.20150205-1.patch


 Per design in YARN-2928, implement resource managers starting their own ATS 
 writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name

2015-02-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317162#comment-14317162
 ] 

Wangda Tan commented on YARN-3164:
--

In addition, [~bibinchundatt], could you replace the tabs in your patch with spaces?

 rmadmin command usage prints incorrect command name
 ---

 Key: YARN-3164
 URL: https://issues.apache.org/jira/browse/YARN-3164
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Attachments: YARN-3164.1.patch


 /hadoop/bin{color:red} ./yarn rmadmin -transitionToActive {color}
 transitionToActive: incorrect number of arguments
 Usage:{color:red}  HAAdmin  {color} [-transitionToActive serviceId 
 [--forceactive]]
 {color:red} ./yarn HAAdmin {color} 
 Error: Could not find or load main class HAAdmin
 Expected: it should be rmadmin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3041) [Data Model] create the ATS entity/event API

2015-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3041:
--
Summary: [Data Model] create the ATS entity/event API  (was: [API] create 
the ATS entity/event API)

 [Data Model] create the ATS entity/event API
 

 Key: YARN-3041
 URL: https://issues.apache.org/jira/browse/YARN-3041
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter
 Attachments: YARN-3041.preliminary.001.patch


 Per design in YARN-2928, create the ATS entity and events API.
 Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, 
 flow, flow run, YARN app, ...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

