[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642625#comment-14642625 ]

Hudson commented on YARN-3958:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #999 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/999/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
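For context, the kind of cross-check this test performs can be sketched as below. This is a minimal, hypothetical illustration rather than the actual TestYarnConfigurationFields code; the "yarn." prefix heuristic and the class name ConfigFieldCheck are assumptions.
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfigFieldCheck {
  public static void main(String[] args) throws Exception {
    // Load only the packaged defaults, without site overrides.
    Configuration defaults = new Configuration(false);
    defaults.addResource("yarn-default.xml");

    // Walk the public static String constants of YarnConfiguration; those that
    // hold property names should each have an entry in yarn-default.xml.
    for (Field f : YarnConfiguration.class.getFields()) {
      if (f.getType() != String.class || !Modifier.isStatic(f.getModifiers())) {
        continue;
      }
      String key = (String) f.get(null);
      if (key != null && key.startsWith("yarn.") && defaults.get(key) == null) {
        System.out.println("Missing from yarn-default.xml: " + key);
      }
    }
  }
}
{code}
Because the test now lives in hadoop-yarn-api alongside YarnConfiguration, any change to that class forces this check to run in the same module's test phase.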
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642615#comment-14642615 ]

Hudson commented on YARN-3958:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #269 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/269/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2856) Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642803#comment-14642803 ]

Sangjin Lee commented on YARN-2856:
--

The patch applies to 2.6.0 cleanly.

Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
------------------------------------------------------------------------------------------------------

Key: YARN-2856
URL: https://issues.apache.org/jira/browse/YARN-2856
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-2856.1.patch, YARN-2856.patch

It is observed that recovering an application whose attempt's final state is KILLED throws the below exception, and the application remains in ACCEPTED state forever.
{code}
2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
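To see why recovery dies here: YARN's StateMachineFactory throws InvalidStateTransitonException for any (state, event) pair that was never registered, so the shape of the fix is to register a transition for ATTEMPT_KILLED in the ACCEPTED state. The self-contained toy below illustrates the mechanism using the real org.apache.hadoop.yarn.state API; it is a sketch, not the RMAppImpl code from the patch.
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class MiniApp {
  enum State { ACCEPTED, KILLED }
  enum EventType { ATTEMPT_KILLED }

  private static final StateMachineFactory<MiniApp, State, EventType, EventType>
      FACTORY =
          new StateMachineFactory<MiniApp, State, EventType, EventType>(State.ACCEPTED)
              // Without this registration, delivering ATTEMPT_KILLED while in
              // ACCEPTED throws InvalidStateTransitonException -- the exact
              // failure recovery hits above. With it, the app moves to KILLED.
              .addTransition(State.ACCEPTED, State.KILLED, EventType.ATTEMPT_KILLED)
              .installTopology();

  private final StateMachine<State, EventType, EventType> stateMachine =
      FACTORY.make(this);

  public static void main(String[] args) {
    MiniApp app = new MiniApp();
    app.stateMachine.doTransition(EventType.ATTEMPT_KILLED, EventType.ATTEMPT_KILLED);
    System.out.println(app.stateMachine.getCurrentState()); // prints KILLED
  }
}
{code}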
[jira] [Updated] (YARN-2856) Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-2856:
--
Labels: 2.6.1-candidate (was: )

Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
------------------------------------------------------------------------------------------------------

Key: YARN-2856
URL: https://issues.apache.org/jira/browse/YARN-2856
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-2856.1.patch, YARN-2856.patch

It is observed that recovering an application whose attempt's final state is KILLED throws the below exception, and the application remains in ACCEPTED state forever.
{code}
2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues
[ https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642816#comment-14642816 ]

Sangjin Lee commented on YARN-3850:
--

The merge to 2.6.0 is straightforward.

NM fails to read files from full disks which can lead to container logs being lost and other issues
----------------------------------------------------------------------------------------------------

Key: YARN-3850
URL: https://issues.apache.org/jira/browse/YARN-3850
Project: Hadoop YARN
Issue Type: Bug
Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.1
Attachments: YARN-3850.01.patch, YARN-3850.02.patch

*Container logs* can be lost if a disk has become full (~90% full). When an application finishes, we upload logs after aggregation by calling {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks the eligible directories via {{LocalDirsHandlerService#getLogDirs}}, which in the disk-full case returns nothing. So none of the container logs are aggregated and uploaded. But on application finish, we also call {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the application directory, which contains the container logs, because it calls {{LocalDirsHandlerService#getLogDirsForCleanup}}, which returns the full disks as well. So we are left with neither the aggregated logs for the app nor the individual container logs.
In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks, so the NM will fail to serve up logs from full disks via its web interfaces.
# {{RecoveredContainerLaunch#locatePidFile}} also does not consider full disks, so it is possible that on container recovery the PID file is not found.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
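The core mismatch can be sketched as follows. Here {{dirsHandler}} stands for the NM's LocalDirsHandlerService instance, and the sketch uses only the two methods named in the description; the surrounding variable names are illustrative.
{code}
// Aggregation consults the healthy dirs, so a full disk contributes nothing:
List<String> dirsForUpload = dirsHandler.getLogDirs();            // full disks excluded
// Cleanup consults the cleanup list, which still includes full disks:
List<String> dirsForCleanup = dirsHandler.getLogDirsForCleanup(); // full disks included
// Net effect when a disk is full: uploadLogsForContainers() finds no logs to
// aggregate, while the post-aggregation cleanup still deletes the app's log dirs.
{code}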
[jira] [Commented] (YARN-3949) ensure timely flush of timeline writes
[ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642839#comment-14642839 ]

Sangjin Lee commented on YARN-3949:
--

Thanks folks for reviewing and committing the patch!

ensure timely flush of timeline writes
--------------------------------------

Key: YARN-3949
URL: https://issues.apache.org/jira/browse/YARN-3949
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Fix For: YARN-2928
Attachments: YARN-3949-YARN-2928.001.patch, YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.003.patch, YARN-3949-YARN-2928.004.patch, YARN-3949-YARN-2928.004.patch

Currently, flushing of timeline writes is not really handled. For example, {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch and write puts asynchronously. However, {{BufferedMutator}} may not flush them to HBase unless the internal buffer fills up. We need flush functionality to ensure that data is written in a reasonably timely manner, and to be able to ensure some critical writes are done synchronously (e.g. key lifecycle events).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
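The BufferedMutator behavior described above is the standard HBase client API: mutations sit in a client-side buffer until it fills, so a writer that wants timely visibility must call flush() itself (periodically or on critical events). A minimal sketch of the remedy; the table and column names here are illustrative, not the actual timeline schema.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelyWrite {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator =
             conn.getBufferedMutator(TableName.valueOf("timeline.entity"))) {
      Put put = new Put(Bytes.toBytes("rowKey"));
      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("created"),
          Bytes.toBytes(System.currentTimeMillis()));
      mutator.mutate(put); // buffered client-side; not yet visible to readers
      mutator.flush();     // force the buffered puts out now
    }
  }
}
{code}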
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642838#comment-14642838 ]

Rohith Sharma K S commented on YARN-3543:
--

[~xgong], would you have a look at the updated patch, please?

ApplicationReport should be able to tell whether the Application is AM managed or not.
---------------------------------------------------------------------------------------

Key: YARN-3543
URL: https://issues.apache.org/jira/browse/YARN-3543
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG

Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can only be done at the time the user submits the job. We should have access to this info from the ApplicationReport as well, so that we can check whether an app is AM managed or not at any time.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2905) AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted
[ https://issues.apache.org/jira/browse/YARN-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642804#comment-14642804 ]

Sangjin Lee commented on YARN-2905:
--

The patch applies to 2.6.0 cleanly.

AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted
-------------------------------------------------------------------------------------

Key: YARN-2905
URL: https://issues.apache.org/jira/browse/YARN-2905
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-2905.patch

If the AggregatedLogsBlock page tries to serve up a portion of a log file that has been corrupted (e.g.: like the case that was fixed by YARN-2724) then it can spin forever trying to seek to the targeted log segment.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642855#comment-14642855 ]

Hudson commented on YARN-3958:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #266 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/266/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3948:
--
Attachment: 0003-YARN-3948.patch

Uploading a patch after fixing test failures.

Display Application Priority in RM Web UI
-----------------------------------------

Key: YARN-3948
URL: https://issues.apache.org/jira/browse/YARN-3948
Project: Hadoop YARN
Issue Type: Sub-task
Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 0003-YARN-3948.patch, ApplicationPage.png, ClusterPage.png

Application Priority can be displayed in the RM Web UI Application page.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-3238:
--
Labels: 2.6.1-candidate (was: )

Connection timeouts to nodemanagers are retried at multiple levels
-------------------------------------------------------------------

Key: YARN-3238
URL: https://issues.apache.org/jira/browse/YARN-3238
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-3238.001.patch

The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642811#comment-14642811 ]

Hudson commented on YARN-3958:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #258 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/258/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-3222:
--
Labels: 2.6.1-candidate (was: )

RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
------------------------------------------------------------------------------------

Key: YARN-3222
URL: https://issues.apache.org/jira/browse/YARN-3222
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch

When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the scheduler via the events node_added, node_removed, or node_resource_update. These events should be delivered in sequential order, i.e. the node_added event followed by the node_resource_update event. But if the node is reconnected with a different HTTP port, the order of scheduler events is node_removed -- node_resource_update -- node_added, which causes the scheduler to not find the node, throw an NPE, and the RM to exit. The node_resource_update event should always be triggered via RMNodeEventType.RESOURCE_UPDATE.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642814#comment-14642814 ]

Sangjin Lee commented on YARN-3369:
--

The patch applies to 2.6.0 cleanly.

Missing NullPointer check in AppSchedulingInfo causes RM to die
----------------------------------------------------------------

Key: YARN-3369
URL: https://issues.apache.org/jira/browse/YARN-3369
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Giovanni Matteo Fumarola
Assignee: Brahma Reddy Battula
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-3369-003.patch, YARN-3369.2.patch, YARN-3369.patch

In AppSchedulingInfo.java the method checkForDeactivation() has these 2 consecutive lines:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
The first line calls getResourceRequest, and it can return null:
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check. If the pointer is null, the RM dies.
{quote}
2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
at java.lang.Thread.run(Thread.java:722)
{color:red}*2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color}
{quote}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
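The straightforward shape of the fix is a null check before the dereference. A sketch only; the committed patch may structure the check differently:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request != null && request.getNumContainers() > 0) {
  // ... existing deactivation logic ...
}
{code}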
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642805#comment-14642805 ]

Sangjin Lee commented on YARN-3238:
--

The patch applies to 2.6.0 cleanly.

Connection timeouts to nodemanagers are retried at multiple levels
-------------------------------------------------------------------

Key: YARN-3238
URL: https://issues.apache.org/jira/browse/YARN-3238
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-3238.001.patch

The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
Karthik Kambatla created YARN-3980:
--

Summary: Plumb resource-utilization info in node heartbeat through to the scheduler
Key: YARN-3980
URL: https://issues.apache.org/jira/browse/YARN-3980
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, scheduler
Affects Versions: 2.7.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642770#comment-14642770 ]

Hudson commented on YARN-3958:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2196 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2196/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642813#comment-14642813 ]

Sangjin Lee commented on YARN-3222:
--

The merge to 2.6.0 is straightforward.

RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
------------------------------------------------------------------------------------

Key: YARN-3222
URL: https://issues.apache.org/jira/browse/YARN-3222
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch

When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the scheduler via the events node_added, node_removed, or node_resource_update. These events should be delivered in sequential order, i.e. the node_added event followed by the node_resource_update event. But if the node is reconnected with a different HTTP port, the order of scheduler events is node_removed -- node_resource_update -- node_added, which causes the scheduler to not find the node, throw an NPE, and the RM to exit. The node_resource_update event should always be triggered via RMNodeEventType.RESOURCE_UPDATE.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642860#comment-14642860 ]

Sangjin Lee commented on YARN-3908:
--

Thanks for the update, [~vrushalic]. Are folks OK with this going in as is and making further changes to the event schema in a separate ticket? Let me know, and if everyone is fine, I'll merge this patch later today.

Bugs in HBaseTimelineWriterImpl
-------------------------------

Key: YARN-3908
URL: https://issues.apache.org/jira/browse/YARN-3908
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch

1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all.
2. The event#timestamp is also not persisted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642886#comment-14642886 ]

Hudson commented on YARN-3958:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2215 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2215/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643059#comment-14643059 ]

Sangjin Lee commented on YARN-3814:
--

Agreed. It would be the client's responsibility to encode them correctly. Also, the server (e.g. Jetty) may decode them properly so that the Hadoop code need not be concerned about this.

REST API implementation for getting raw entities in TimelineReader
-------------------------------------------------------------------

Key: YARN-3814
URL: https://issues.apache.org/jira/browse/YARN-3814
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643085#comment-14643085 ]

zhihai xu commented on YARN-3965:
--

thanks for the explanation, [~zhiguohong]!
bq. One option is to make nmStartupTime a non-static field of NMContext. But I doubt it is worth making a simple thing complicated. BTW, the startup timestamp of ResourceManager is also static.
Thanks for the information. Since the timestamp of ResourceManager is also static, I am OK to use static. In practice, there is only one active RM and many active NMs. Adding nmStartupTime to NMContext is not bad, as it can provide more information about the NM in the context. So either static or non-static is OK to me. Let's see what other people's opinions are. If we use static, can we add an API to get the timestamp for the NM, similar to ResourceManager.getClusterTimeStamp for the RM?
bq. It's final so you don't need to worry about that
Yes, you are right. I missed the final keyword.

Add startup timestamp for nodemanager
-------------------------------------

Key: YARN-3965
URL: https://issues.apache.org/jira/browse/YARN-3965
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
Attachments: YARN-3965-2.patch, YARN-3965.patch

We have a startup timestamp for the RM already, but not for the NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and then finds it hard to check whether all NMs have restarted. Actually, there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we had a startup timestamp for the NM, the operator could easily fetch it via the NM web service, find out which NMs didn't restart, and take manual action for them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
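For reference, the static approach under discussion would look roughly like the sketch below, mirroring ResourceManager.getClusterTimeStamp(); the getter name getNMStartupTime is an assumption, not the committed API.
{code}
// Inside NodeManager (sketch): captured once when the class loads, i.e. at
// process startup, and exposed via a static getter like the RM's cluster timestamp.
private static final long nmStartupTime = System.currentTimeMillis();

public static long getNMStartupTime() {
  return nmStartupTime;
}
{code}
The NM web service can then include this value in its info response, letting the operator compare it against the time the restart command was issued.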
[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643099#comment-14643099 ]

MENG DING commented on YARN-1644:
--

bq. NM re-registration can still happen between the time the increase action is accepted and the time it's added into increasedContainers. Even startContainer has the same problem: a newly started container may fall into this tiny window such that the RM won't recover it.
Yes, you are right that startContainer would have the same problem. So to make it clear, RM restart/NM re-registration can happen in the following scenarios:
* 1. Container resource increase is already completed. In this case, NM re-registration can send the correct (increased) container size (through the containerStatus object) for RM recovery.
* 2. The container to be increased has been added into increasedContainers, but the resource is not yet updated. In this case, NM re-registration can send the correct container size through both the containerStatus and increasedContainers objects for RM recovery.
* 3. The increase action is accepted, but the container to be increased has not been added into increasedContainers. In this case, the resource view between the NM and the RM diverges. The same issue applies to startContainers.
I don't have a solution for scenario 3 yet, but I think the chance of it happening is very small, especially with the {{blockNewContainerRequests}} and matching-RM-identifier logic in place. Maybe we can log a separate JIRA for scenario 3 and fix it for both container increase and container launch?

RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
------------------------------------------------------------------------------------------

Key: YARN-1644
URL: https://issues.apache.org/jira/browse/YARN-1644
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Wangda Tan
Assignee: MENG DING
Attachments: YARN-1644-YARN-1197.4.patch, YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643048#comment-14643048 ]

Li Lu commented on YARN-3908:
--

Hi [~sjlee0], I'm OK with checking this in and addressing the event schema problem in a separate JIRA.

Bugs in HBaseTimelineWriterImpl
-------------------------------

Key: YARN-3908
URL: https://issues.apache.org/jira/browse/YARN-3908
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch

1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all.
2. The event#timestamp is also not persisted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643155#comment-14643155 ]

Naganarasimha G R commented on YARN-3963:
--

Hi [~bibinchundatt], in distributed mode only the node-to-label mapping is done by the individual NMs, so there should not be any impact in distributed mode.

AddNodeLabel on duplicate label addition shows success
------------------------------------------------------

Key: YARN-3963
URL: https://issues.apache.org/jira/browse/YARN-3963
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch

Currently, as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}}, when we add the same node label again, no event is fired, so no update is done.
{noformat}
./yarn rmadmin -addToClusterNodeLabels x
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)"
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)"
{noformat}
All of these commands report success when applied again through the CLI.
{code}
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true]
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false]
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
{code}
Also, since changing exclusive=true to false is not supported, reporting success is misleading.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3981) support timeline clients not associated with an application
Sangjin Lee created YARN-3981:
--

Summary: support timeline clients not associated with an application
Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643141#comment-14643141 ]

Naganarasimha G R commented on YARN-3963:
--

Hi [~leftnoteasy], how will the user know, if he is not informed, that the last operation failed? Are there any limitations if we throw an exception? I feel the user needs to be informed; if not, he will assume that the last operation succeeded.

AddNodeLabel on duplicate label addition shows success
------------------------------------------------------

Key: YARN-3963
URL: https://issues.apache.org/jira/browse/YARN-3963
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch

Currently, as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}}, when we add the same node label again, no event is fired, so no update is done.
{noformat}
./yarn rmadmin -addToClusterNodeLabels x
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)"
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)"
{noformat}
All of these commands report success when applied again through the CLI.
{code}
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true]
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false]
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
{code}
Also, since changing exclusive=true to false is not supported, reporting success is misleading.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642952#comment-14642952 ]

Varun Saxena commented on YARN-3958:
--

Thanks [~ajisakaa] for the review and commit.

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643006#comment-14643006 ]

Sangjin Lee commented on YARN-3981:
--

Some of us had an offline discussion on this. There are some major challenges in supporting this in the v.2 design. First, obviously they may lack an application-specific context, as they can span multiple YARN apps. Second, even if we solved the problem of the context, these clients are likely off-cluster, and they need a way to write to the cluster. Ideas such as a separate dedicated timeline writer just for these have been discussed, but their scalability is problematic at best.

One idea that was suggested involves creating a specialized YARN application that can act as a proxy for these off-cluster clients. For example, suppose you started a tez client that can start multiple YARN apps. It can also start a special dedicated (flow-level) timeline client. This client would launch a special YARN app under the covers whose app master and its associated timeline writer can serve as the proxy for timeline data the client may write. When this special timeline client shuts down, it would tear down the associated YARN app also.

If we go this route, we would write the YARN app itself so that the app master listens for requests coming from the client and proxies them to the timeline writer. We would also write the timeline client piece so that it manages the YARN app as well as sending the write requests to the app master.

support timeline clients not associated with an application
------------------------------------------------------------

Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643105#comment-14643105 ]

Wangda Tan commented on YARN-3971:
--

[~bibinchundatt], thanks for working on this. I think one simpler solution is to leverage {{AbstractService#getServiceState}} and do the remove-label check only when the state == STARTED. Thoughts?

Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
---------------------------------------------------------------------------------------

Key: YARN-3971
URL: https://issues.apache.org/jira/browse/YARN-3971
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch

Steps to reproduce:
# Create labels x, y
# Delete labels x, y
# Create labels x, y and add the capacity scheduler xml for labels x and y too
# Restart RM
Both RMs will become standby, since the below exception is thrown on {{FileSystemNodeLabelsStore#recover}}:
{code}
2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
	at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
	at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
	at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
	at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
	at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
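Sketched, the suggestion is to gate the queue check on the real {{AbstractService#getServiceState()}} so that label removals replayed from the store during recovery skip it. The surrounding method body below is illustrative; the actual patch may differ.
{code}
// In RMNodeLabelsManager (sketch):
@Override
public void removeFromClusterNodeLabels(Collection<String> labelsToRemove)
    throws IOException {
  if (getServiceState() == Service.STATE.STARTED) {
    // Enforce "label still used by a queue" only during normal operation,
    // not while replaying the store contents at startup.
    checkRemoveFromClusterNodeLabelsOfQueue(labelsToRemove);
  }
  super.removeFromClusterNodeLabels(labelsToRemove);
}
{code}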
[jira] [Commented] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643055#comment-14643055 ]

Hadoop QA commented on YARN-3948:
--

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 20m 49s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 20s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 28s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests | 52m 28s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 112m 49s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747352/0003-YARN-3948.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2196e39 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8678/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8678/console |

This message was automatically generated.

Display Application Priority in RM Web UI
-----------------------------------------

Key: YARN-3948
URL: https://issues.apache.org/jira/browse/YARN-3948
Project: Hadoop YARN
Issue Type: Sub-task
Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 0003-YARN-3948.patch, ApplicationPage.png, ClusterPage.png

Application Priority can be displayed in the RM Web UI Application page.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen reassigned YARN-3981:
--
Assignee: Zhijie Shen

support timeline clients not associated with an application
------------------------------------------------------------

Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Zhijie Shen

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643056#comment-14643056 ]

Zhijie Shen commented on YARN-3981:
--

Thanks for filing the jira. I'm going to pick this up.

support timeline clients not associated with an application
------------------------------------------------------------

Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643116#comment-14643116 ]

Wangda Tan commented on YARN-3963:
--

I think we shouldn't throw an exception in this case. It would be better to print a WARN message at the client side when trying to add existing labels, but currently YARN lacks a channel to pass such diagnostic information back. I would prefer to print a WARN message at the service side and keep the behavior unchanged. Thoughts? [~sunilg], [~bibinchundatt].

AddNodeLabel on duplicate label addition shows success
------------------------------------------------------

Key: YARN-3963
URL: https://issues.apache.org/jira/browse/YARN-3963
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch

Currently, as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}}, when we add the same node label again, no event is fired, so no update is done.
{noformat}
./yarn rmadmin -addToClusterNodeLabels x
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)"
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)"
{noformat}
All of these commands report success when applied again through the CLI.
{code}
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true]
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false]
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
{code}
Also, since changing exclusive=true to false is not supported, reporting success is misleading.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
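A sketch of the service-side WARN being proposed; the placement and the variable names (labelsToAdd, existingLabels) are illustrative, not the actual CommonNodeLabelsManager fields.
{code}
// Inside addToClusterNodeLabels (sketch): warn on duplicates, keep behavior unchanged.
for (String label : labelsToAdd) {
  if (existingLabels.contains(label)) {
    LOG.warn("Node label '" + label + "' already exists with the same settings;"
        + " ignoring the duplicate addToClusterNodeLabels request.");
  }
}
{code}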
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643113#comment-14643113 ] Zhijie Shen commented on YARN-3908: --- [~vrushalic], thanks for fixing the problem. W.r.t. the column key, shall we use: {code} e!eventId?eventTimestamp?eventInfoKey : eventInfoValue {code} Imagine we have two KILL events: one on TS1 and the other on TS2. IMHO, we want to scan through the two events' columns one-by-one instead of in an interleaved manner. This will make it much easier for the reader to parse multiple events and encapsulate them one after the other. It will be more useful in the future if we want to retrieve just part of the events of a big job (e.g. within a given time window or the most recent events). Thoughts? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
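To make the proposed layout concrete, here is a minimal sketch of assembling and parsing such a column qualifier, assuming the {{!}} and {{?}} separators shown in the comment; the class and method names are illustrative, not the actual HBaseTimelineWriterImpl API:
{code}
// Illustrative only: assembles an event column qualifier in the
// e!eventId?eventTimestamp?eventInfoKey layout discussed above.
// Separators and names are assumptions taken from the comment.
public final class EventColumnKeySketch {
  private static final String PREFIX = "e!";
  private static final char SEPARATOR = '?';

  static String qualifier(String eventId, long eventTimestamp, String infoKey) {
    // Grouping by eventId first keeps all columns of one event adjacent,
    // so a reader can reassemble each event without interleaving.
    return PREFIX + eventId + SEPARATOR + eventTimestamp + SEPARATOR + infoKey;
  }

  static String[] parse(String qualifier) {
    // Strips the "e!" prefix and splits back into the three components.
    return qualifier.substring(PREFIX.length()).split("\\?", 3);
  }
}
{code}
A real implementation would encode the qualifier as bytes (with a fixed-width or inverted timestamp) so that HBase's lexicographic column ordering matches the intended scan order.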
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643110#comment-14643110 ] Anubhav Dhoot commented on YARN-3957: - Thanks [~kasha] for the review and commit FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is an NPE causing the web page at http://localhost:23188/cluster/scheduler to return a 500. This seems to be because YARN-2336 sets null for childQueues, and getChildQueues then hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643204#comment-14643204 ] Naganarasimha G R commented on YARN-3963: - This approach should be fine! AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643215#comment-14643215 ] Wangda Tan commented on YARN-3873: -- [~sunilg], thanks for working on this. I looked at the main logic of the patch; I haven't looked at the tests yet. Two major comments: 1) There's no need to deprecate getApplicationComparator; it's not a public API, so we should simply remove it. 2) Can we assume pendingApplication's ordering policy is equal to activeApplication's ordering policy? I think we can assume this at least for now. For example, fair/priority/fifo. This assumption avoids a separate configuration for pending-application-ordering-policy. And we don't need to create a getActiveIterator in OrderingPolicy; we can reuse getAssignmentIterator. Thoughts? pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed so that pendingApplications uses the OrderingPolicy configured at the queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
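As a rough sketch of the suggestion above, under the stated assumption that pending and active applications share the queue's single configured ordering policy; the interface below mirrors the method names mentioned in the comment and is not the committed YARN-3873 code:
{code}
// Sketch only: both pending and active applications are kept in
// instances of the queue's configured OrderingPolicy (FIFO/fair), so no
// separate pending-application-ordering-policy needs to be configured,
// and getAssignmentIterator is reused instead of a new getActiveIterator.
import java.util.Iterator;

class LeafQueueOrderingSketch<S> {
  interface OrderingPolicy<S> {
    void addSchedulableEntity(S s);
    boolean removeSchedulableEntity(S s);
    Iterator<S> getAssignmentIterator();
  }

  private final OrderingPolicy<S> pendingApps;
  private final OrderingPolicy<S> activeApps;

  LeafQueueOrderingSketch(OrderingPolicy<S> pending, OrderingPolicy<S> active) {
    this.pendingApps = pending;
    this.activeApps = active;
  }

  void activateApplication(S app) {
    // Moving an application from pending to active preserves whatever
    // order the shared policy dictates for both collections.
    if (pendingApps.removeSchedulableEntity(app)) {
      activeApps.addSchedulableEntity(app);
    }
  }

  Iterator<S> pendingInPolicyOrder() {
    return pendingApps.getAssignmentIterator();
  }
}
{code}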
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643261#comment-14643261 ] Naganarasimha G R commented on YARN-3963: - Also can we check for all labels and print one message ? AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}} when we add same nodelabel again event will not be fired so no updation is done. {noformat} ./yarn rmadmin –addToClusterNodeLabels x ./yarn rmadmin –addToClusterNodeLabels “x(exclusive=true)” ./yarn rmadmin –addToClusterNodeLabels “x(exclusive=false)” {noformat} All these commands will give success when applied again through CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabelsTARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabelsTARGET=AdminService RESULT=SUCCESS {code} Also since exclusive=true to false is not supported success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3971: --- Attachment: 0004-YARN-3971.patch [~leftnoteasy] Thank you for the review. I agree with you. Updated the patch as per your comments and also updated the test case. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch Steps to reproduce: # Create labels x,y # Delete labels x,y # Create labels x,y and add capacity scheduler xml entries for labels x and y too # Restart RM Both RMs become standby, since the exception below is thrown from {{FileSystemNodeLabelsStore#recover}} {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} --
This message was sent by Atlassian JIRA (v6.3.4#6332)
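The recovery-skip direction discussed in this JIRA could be sketched as follows; the boolean recovery flag and all names here are hypothetical, illustrating the fix described above rather than the actual 0004 patch:
{code}
// Hypothetical sketch: skip the queue-usage check when labels are
// removed while replaying the node-labels store during recovery.
import java.io.IOException;
import java.util.Collection;

class RMNodeLabelsManagerSketch {
  void removeFromClusterNodeLabels(Collection<String> labels, boolean isRecovery)
      throws IOException {
    if (!isRecovery) {
      // Only validate queue references for admin-initiated removals;
      // during FileSystemNodeLabelsStore#recover the edit log is simply
      // being replayed, so an intermediate remove must succeed.
      checkRemoveFromClusterNodeLabelsOfQueue(labels);
    }
    internalRemoveLabels(labels);
  }

  void checkRemoveFromClusterNodeLabelsOfQueue(Collection<String> labels)
      throws IOException { /* throws if any queue still uses a label */ }

  void internalRemoveLabels(Collection<String> labels) { /* update state */ }
}
{code}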
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643283#comment-14643283 ] Hadoop QA commented on YARN-3963: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 55s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 10m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 20s | Tests passed in hadoop-yarn-common. | | | | 49m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747398/0003-YARN-3963.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f36835f | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8679/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8679/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8679/console | This message was automatically generated. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done.
{noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643188#comment-14643188 ] Vrushali C commented on YARN-3908: -- Hi Zhijie, thanks, that is a good point. But if we put the event timestamp first, we have no way of querying for a particular event key unless we know the exact timestamp, and knowing the exact time is probably almost impossible. Imagine that there is another event that occurs between the two kill events, so it has a timestamp between kill1 and kill2. Now we still have to fetch all those and filter them out. So placing the timestamp first does not help in this case. But if we have the event key first, the columns will be placed together and the event timestamps will be stored in chronological order (using the Long.MAX_VALUE - ts value). So the first one fetched for the kill event would be the latest for that event key. thanks Vrushali Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
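A minimal sketch of the reversed-timestamp trick mentioned above ({{Long.MAX_VALUE - ts}}), which makes the most recent event sort first; the class and method names are illustrative, not the writer's actual API:
{code}
// Storing Long.MAX_VALUE - ts makes lexicographic/byte ordering of the
// column keys return the most recent event first when scanning.
public final class ReversedTimestampSketch {
  static long invert(long millis) {
    return Long.MAX_VALUE - millis;
  }

  static long restore(long inverted) {
    return Long.MAX_VALUE - inverted;
  }

  public static void main(String[] args) {
    long kill1 = 1_437_595_017_779L;  // earlier KILL event
    long kill2 = 1_437_595_026_431L;  // later KILL event
    // After inversion the later event sorts first, so a scan over the
    // event-id-grouped columns yields the latest occurrence first.
    assert invert(kill2) < invert(kill1);
  }
}
{code}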
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643267#comment-14643267 ] Zhijie Shen commented on YARN-3908: --- Okay, it's a fair point. It seems that the key design significantly depends on how we want to operate on the events. The current key design is most friendly for checking whether there exist events that match a given event ID and some given info key (and its value). But if you want to fetch everything that belongs to an event (our query needs to do this, as an event is implicitly an atomic unit for now), it seems inevitable to scan through all the columns that have the given event ID (correct me if I'm wrong :-). If so, there seems to be little gain from this key design, while it complicates the event encapsulation logic. And after rethinking the current query we need to support (YARN-3051), I want to amend my suggestion. It seems more reasonable to use {{e!eventTimestamp?eventId?eventInfoKey}}, such that we can natively scan through the events of one entity one-by-one and return them in chronological order. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
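For comparison, a sketch of the amended, timestamp-first layout {{e!eventTimestamp?eventId?eventInfoKey}}; again, the separators and names are taken from the comment text, not from a committed patch:
{code}
// Sketch of the timestamp-first qualifier proposed above. With the
// timestamp leading, a plain column scan of one entity row returns
// whole events grouped together and in time order.
public final class TimestampFirstEventKeySketch {
  static String qualifier(long eventTimestamp, String eventId, String infoKey) {
    // All columns of one (timestamp, eventId) pair stay adjacent, so the
    // reader can emit complete events one-by-one while scanning.
    return "e!" + eventTimestamp + '?' + eventId + '?' + infoKey;
  }
}
{code}
Note that a real implementation would encode the timestamp as fixed-width (and possibly inverted) bytes rather than a decimal string, so that HBase's byte ordering matches numeric/chronological ordering.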
[jira] [Updated] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3963: --- Attachment: 0003-YARN-3963.patch [~leftnoteasy] and [~Naganarasimha] Thanks for the review and comments. I have updated the patch as per the comments. The exception is now thrown only when attributes are changed. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643180#comment-14643180 ] Wangda Tan commented on YARN-3963: -- Thanks for the comment, [~Naganarasimha]. I think we can throw an exception only if the added node label has a different attribute. If the node label has the same name and attributes, it should simply be ignored. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
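The behavior being agreed on here can be sketched roughly as below: an exact duplicate is ignored (possibly with a WARN), while re-adding a label with a different exclusivity attribute fails. All class and field names are hypothetical:
{code}
// Hypothetical sketch of the duplicate-label check discussed above,
// not the actual CommonNodeLabelsManager code.
import java.io.IOException;
import java.util.Map;

class NodeLabelAddSketch {
  static final class Label {
    final String name;
    final boolean exclusive;
    Label(String name, boolean exclusive) {
      this.name = name;
      this.exclusive = exclusive;
    }
  }

  private final Map<String, Label> existing;

  NodeLabelAddSketch(Map<String, Label> existing) { this.existing = existing; }

  void addLabel(Label label) throws IOException {
    Label old = existing.get(label.name);
    if (old != null) {
      if (old.exclusive != label.exclusive) {
        // Attribute changed: exclusivity cannot be modified in place.
        throw new IOException("Exclusivity cannot be modified for an existing label: "
            + label.name + "(" + old.exclusive + ")");
      }
      // Same name and attributes: ignore (or log a WARN) instead of failing.
      return;
    }
    existing.put(label.name, label);
  }
}
{code}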
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643217#comment-14643217 ] Varun Vasudev commented on YARN-3852: - Committed to trunk and branch-2. Thanks [~ashahab]! Add docker container support to container-executor --- Key: YARN-3852 URL: https://issues.apache.org/jira/browse/YARN-3852 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Abin Shahab Fix For: 2.8.0 Attachments: YARN-3852-1.patch, YARN-3852-2.patch, YARN-3852-3.patch, YARN-3852.patch For security reasons, we need to ensure that access to the docker daemon and the ability to run docker containers is restricted to privileged users (i.e. users running applications should not have direct access to docker). In order to ensure the node manager can run docker commands, we need to add docker support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643225#comment-14643225 ] Hudson commented on YARN-3852: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8227 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8227/]) YARN-3852. Add docker container support to container-executor. Contributed by Abin Shahab. (vvasudev: rev f36835ff9b878fa20fe58a30f9d1e8c47702d6d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h Add docker container support to container-executor --- Key: YARN-3852 URL: https://issues.apache.org/jira/browse/YARN-3852 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Abin Shahab Fix For: 2.8.0 Attachments: YARN-3852-1.patch, YARN-3852-2.patch, YARN-3852-3.patch, YARN-3852.patch For security reasons, we need to ensure that access to the docker daemon and the ability to run docker containers is restricted to privileged users (i.e. users running applications should not have direct access to docker). In order to ensure the node manager can run docker commands, we need to add docker support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643246#comment-14643246 ] Naganarasimha G R commented on YARN-3963: - Hi [~bibinchundatt], for ??throw new IOException("Label=" + label.getName() + "(" + rmNodeLabel.getIsExclusive() + ") already added");?? the message would be better as {{throw new IOException("Exclusivity cannot be modified for an existing label : " + label.getName() + "(" + rmNodeLabel.getIsExclusive() + ")");}} In the test case, for ??Assert.fail("IOException not thrown should have on adding same labels");?? I think a better message would be {{"IOException is expected when exclusivity is modified"}} AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3963: --- Attachment: 0004-YARN-3963.patch [~Naganarasimha] Thank you for your review comments. [~leftnoteasy] Updated the patch handling all comments. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch, 0004-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643328#comment-14643328 ] Anubhav Dhoot commented on YARN-3736: - bq. It may be better to store ReservationId object rather than the name as we are parsing the name back in most places Since the reservationId is used as the key/path where the ReservationAllocationState is stored, we need to convert the reservationId to string format. We are not actually storing the reservationId as part of the protobuf payload anymore. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
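A simplified illustration of that point: the ReservationId round-trips through its string form as the store key/path instead of being duplicated in the protobuf payload. The path layout below is an assumption for illustration, not the actual RMStateStore schema:
{code}
// Sketch: derive the store path from ReservationId.toString() and parse
// it back during recovery. The path layout is illustrative only.
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ReservationId;

class ReservationStoreKeySketch {
  static String storePath(String planName, ReservationId id) {
    // e.g. reservation-system/<planName>/reservation_<clusterTs>_<seq>
    return "reservation-system/" + planName + "/" + id.toString();
  }

  static ReservationId idFromPath(String path) throws IOException {
    String name = path.substring(path.lastIndexOf('/') + 1);
    return ReservationId.parseReservationId(name);
  }
}
{code}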
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643431#comment-14643431 ] Hadoop QA commented on YARN-3971: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 26s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 19s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 31s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 94m 24s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747403/0004-YARN-3971.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f36835f | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8680/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8680/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8680/console | This message was automatically generated. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch Steps to reproduce: # Create labels x,y # Delete labels x,y # Create labels x,y and add capacity scheduler xml entries for labels x and y too # Restart RM Both RMs become standby, since the exception below is thrown from {{FileSystemNodeLabelsStore#recover}} {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label.
Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643446#comment-14643446 ] Vrushali C commented on YARN-3908: -- Yes, let's get the current patch in and continue the discussion on the event schema. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3736: Attachment: YARN-3736.002.patch Removed ReservationId from ReservationAllocationStateProto and updated LevelDBRMStateStore to use batch writes. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643314#comment-14643314 ] Anubhav Dhoot commented on YARN-3736: - Thanks [~subru] for the review. Rebased the patch and addressed all feedback except for ZKRMStateStore. For deleting a reservation, in the first step we remove the reservation and then check whether the plan state is empty because of this removal; if so, we remove the plan state as well. Because of these two steps we cannot use a single transaction for both. So we either use 2 different transactions or leave it as is. I have left it as is for now; let me know if you feel we still need to use 2 SafeTransactions or do something else. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana resolved YARN-3851. - Resolution: Fixed Release Note: Support for this was added as part of YARN-3853. It wasn't straightforward (or very useful) to split the patches up, so a single patch was submitted. Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g. docker). These are meant to be independent of executors themselves - a given executor (e.g. LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher-level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishore Chaliparambil updated YARN-2884: Attachment: YARN-2884-V2.patch Attached a new patch that addresses review comments from [~subru]. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle misbehaving AMs, and 3) mask access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
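To illustrate the idea (not the YARN-2884 patch itself), a minimal forwarding shim around ApplicationMasterProtocol shows where such a proxy could intercept AM traffic:
{code}
// Conceptual sketch only: a pass-through wrapper where throttling,
// distributed scheduling, or federation routing could be inserted.
import java.io.IOException;
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

class AllocateForwardingSketch {
  private final ApplicationMasterProtocol realRM;

  AllocateForwardingSketch(ApplicationMasterProtocol realRM) {
    this.realRM = realRM;
  }

  AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    // Intercept point: inspect or modify the ask list here (throttle a
    // misbehaving AM, satisfy some requests locally, pick an RM in a
    // federation) before forwarding to the central RM.
    return realRM.allocate(request);
  }
}
{code}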
[jira] [Commented] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic
[ https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643448#comment-14643448 ] Wangda Tan commented on YARN-3983: -- Started working on this JIRA, will upload a patch for review shortly. Make CapacityScheduler to easier extend application allocation logic Key: YARN-3983 URL: https://issues.apache.org/jira/browse/YARN-3983 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan While working on YARN-1651 (resource allocation for increasing container), I found it is very hard to extend the existing CapacityScheduler resource allocation logic to support different types of resource allocation. For example, there are a lot of differences between increasing a container and allocating a container: - Increasing a container doesn't need to check locality delay. - Increasing a container doesn't need to build/modify a resource request tree (ANY-RACK/HOST). - Increasing a container doesn't need to check allocation/reservation starvation (see {{shouldAllocOrReserveNewContainer}}). - After increasing a container is approved by the scheduler, it needs to update an existing container token instead of creating a new container. And there are lots of similarities when allocating different types of resources: - User-limit/queue-limit will be enforced for both of them. - Both of them need resource reservation logic (maybe continuous reservation looking is needed for both of them). The purpose of this JIRA is to make it easier to extend the CapacityScheduler resource allocation logic to support different types of resource allocation, to make common code reusable, and to improve code organization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
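A hypothetical sketch of the kind of extension point this JIRA argues for: shared limit checks in a common entry point, with the type-specific steps (locality delay, request tree, token handling) pushed behind an abstract method. All names here are illustrative, not the eventual YARN-3983 code:
{code}
// Sketch: common pre-checks for every allocation type, with the
// "new container" vs. "increase container" differences isolated in
// a subclass hook.
abstract class AbstractContainerAllocatorSketch {
  enum Outcome { ALLOCATED, RESERVED, SKIPPED }

  // Shared entry point: enforce user/queue limits, then delegate.
  final Outcome allocate(SchedulingRequest req) {
    if (!withinUserAndQueueLimits(req)) {
      return Outcome.SKIPPED;
    }
    return doAllocate(req);  // type-specific allocation step
  }

  abstract Outcome doAllocate(SchedulingRequest req);

  boolean withinUserAndQueueLimits(SchedulingRequest req) {
    return true;  // placeholder for the shared limit logic
  }

  static final class SchedulingRequest { /* ask details */ }
}
{code}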
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643470#comment-14643470 ] Hadoop QA commented on YARN-2884: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747414/YARN-2884-V2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3e6fce9 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8684/console | This message was automatically generated. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle misbehaving AMs, and 3) mask access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643347#comment-14643347 ] Junping Du commented on YARN-3045: -- bq. Well was aware that priority was not to differentiate the containers but for the events of it, but i thought you mentioned for the purpose of better querying rather than the purpose of writing it. Better querying is one purpose, but writing them under different policies could also be a consideration here. We may not be able to afford flushing every event in a large-scale cluster, so we may choose to ignore/cache some unimportant ones. bq. I have not gone through the writer code completely but is there any caching which you want to flush if the event priority is high? Also was thinking whether we need to change the Writer/Collector API to mention the criticality of the event being published? We already have a new flush() API for the writer that was checked in with YARN-3949. Please refer to some of the discussions there for details. You are right that we are lacking an API to respect this priority/policy in the whole data flow for writing. I will file another JIRA to track this. bq. So from NM side we want to publish events for ApplicationEntity and ContainerEntity, but based on the title of this jira i thought scope of this jira is to handle only ContainerEntities from NM side, is it better to handle events related Application entities specific to a given NM in another Jira? but i can try to ensure required foundation is done in NM side in this JIRA as part of your other comments, Thoughts? I am fine with separating events other than container events into a separate JIRA if it is really necessary. In the common case, the JIRA title shouldn't bind the implementation: at proposal time the goal is not as concrete as when the JIRA is being implemented, so we can fix/adjust it later. Anyway, I would support the scope (container events + foundation work) you proposed here if you are comfortable with it. bq. Also event has just id but NM related Application events will have the same event ID in different NM's so would it be something like INIT_APPLICATION_NODE_ID? That's a good question. My initial thinking is we could need something like a NodemanagerEntity to store application events, resource localization events, log aggregation handling events, configuration, etc. However, I would like to hear your and other folks' ideas on this as well. bq. +1 for this thought, had the same initial hitch as in future if we add more events than unnecessary create event and methods in publisher, but for the initial version thought will have approach similar to RM and ATSV1. But i feel better to handle now than refactor later on. But i can think of couple of approaches here. Yes, all three approaches seem to work here. IMO, the 2nd approach (hooking into the existing event dispatcher) looks simpler and more straightforward. bq. Was not clear about the comment, IIRC Zhijie in the meeting also mentioned that i am handling removing threaded model of publishing container metrics statistics as part of this jira. May be i am missing some other jira which you are already working on, may be can you enlighten me about it? I was thinking you were encapsulating metrics with TimelineEvent, but actually you are not. So no worries about my previous comments on this.
[Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
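The write-policy idea in the discussion above (flush critical lifecycle events through immediately, buffer or shed unimportant ones) could look roughly like this; it is entirely hypothetical, since a priority-aware collector/writer API did not exist at this point:
{code}
// Hypothetical sketch of priority-aware event publishing on the NM side.
import java.util.ArrayDeque;
import java.util.Queue;

class PriorityAwareEventBufferSketch<E> {
  private final Queue<E> buffer = new ArrayDeque<>();
  private final int maxBuffered;

  PriorityAwareEventBufferSketch(int maxBuffered) {
    this.maxBuffered = maxBuffered;
  }

  void publish(E event, boolean critical) {
    if (critical) {
      flushBuffered();
      write(event);  // write-through, followed by an explicit flush()
      flush();
      return;
    }
    if (buffer.size() >= maxBuffered) {
      buffer.poll();  // shed the oldest unimportant event under load
    }
    buffer.add(event);
  }

  private void flushBuffered() {
    while (!buffer.isEmpty()) {
      write(buffer.poll());
    }
  }

  void write(E event) { /* hand the event to the timeline writer */ }
  void flush() { /* writer.flush(), per the API added in YARN-3949 */ }
}
{code}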
[jira] [Updated] (YARN-3982) container-executor parsing of container-executor.cfg broken
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3982: Attachment: YARN-3982.001.patch Patch with fix attached. container-executor parsing of container-executor.cfg broken --- Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643388#comment-14643388 ] Li Lu commented on YARN-3816: - bq. I'm still very confused by the usage of the word aggregate. In this patch, aggregate really means accumulating values of a metric along the time dimension, which is completely different than the notion of aggregation we have used all along. The aggregation has always been about rolling up values from children to parents. I have a similar concern with regard to the dimensions of aggregation, too. If I understand the problem correctly, we have two dimensions in a flow/user-level aggregation: one dimension for all entities belonging to this flow/user, another for time. If we aggregate in the flow/user dimension, one typical problem we will hit is aligning times. Suppose entities E1 and E2 both belong to flow F1. In an aggregation, we would like to aggregate E1 and E2. However, if a metric M is a time series, how do we align the times in E1.M and E2.M? Normally the two time series may have slightly different sample times, so I believe we need to decide the semantics for this situation. [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application-level aggregation of Timeline data: - To present end users aggregated states for each application, including: resource (CPU, Memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Aggregation at other levels (Flow/User/Queue) can be more efficient when based on application-level aggregations rather than raw entity-level data, as many fewer rows need to be scanned (filtering out non-aggregated entities like events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
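The alignment question can be made concrete with a small sketch: bucket each entity's samples to a common interval before combining, so slightly different sample times from E1.M and E2.M land in the same bucket. The bucket width and sum semantics are assumptions, not anything decided on this JIRA:
{code}
// Sketch: align two metric time series by rounding sample times down to
// a shared bucket boundary, then combine values per bucket.
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class TimeSeriesAlignSketch {
  static NavigableMap<Long, Long> alignAndSum(
      Map<Long, Long> e1Metric, Map<Long, Long> e2Metric, long bucketMillis) {
    NavigableMap<Long, Long> out = new TreeMap<>();
    accumulate(out, e1Metric, bucketMillis);
    accumulate(out, e2Metric, bucketMillis);
    return out;
  }

  private static void accumulate(
      NavigableMap<Long, Long> out, Map<Long, Long> series, long bucketMillis) {
    for (Map.Entry<Long, Long> sample : series.entrySet()) {
      // Round each sample time down to its bucket boundary so slightly
      // different sample times from E1 and E2 fall into the same bucket.
      long bucket = (sample.getKey() / bucketMillis) * bucketMillis;
      out.merge(bucket, sample.getValue(), Long::sum);
    }
  }
}
{code}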
[jira] [Updated] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3982: Summary: container-executor parsing of container-executor.cfg broken in trunk and branch-2 (was: container-executor parsing of container-executor.cfg broken) container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643421#comment-14643421 ] Li Lu commented on YARN-3908: - Just a quick check about the current status of this JIRA. Are we still planning to merge it in ASAP, do we want to fix the row key of timeline events with one more draft, or do we plan to fully resolve the timeline event problems before we merge it in (if fixing the row key does not fully resolve them)? I'd like to know our plan for this JIRA so that I can fine-tune my patch for YARN-3904 accordingly. Thanks! Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
[ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643452#comment-14643452 ] Jason Lowe commented on YARN-3950: -- +1 lgtm. Will commit tomorrow if there are no objections. Add unique SHELL_ID environment variable to DistributedShell Key: YARN-3950 URL: https://issues.apache.org/jira/browse/YARN-3950 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-3950.001.patch, YARN-3950.002.patch As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027], it would be useful to have a monotonically increasing and independent ID of some kind that is unique per shell in the distributed shell program. We can do that by adding a SHELL_ID env var. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
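A minimal sketch of the SHELL_ID idea: a monotonically increasing counter in the DistributedShell AM, exported into each shell's launch environment so every shell instance gets a unique, stable ID. Names are illustrative, not the committed patch:
{code}
// Sketch only: each launched shell container observes a distinct SHELL_ID.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class ShellIdSketch {
  private final AtomicInteger shellIdCounter = new AtomicInteger(1);

  Map<String, String> buildShellEnv() {
    Map<String, String> env = new HashMap<>();
    // Monotonically increasing and independent of container IDs.
    env.put("SHELL_ID", String.valueOf(shellIdCounter.getAndIncrement()));
    return env;
  }
}
{code}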
[jira] [Created] (YARN-3982) container-executor parsing of container-executor.cfg broken
Varun Vasudev created YARN-3982: --- Summary: container-executor parsing of container-executor.cfg broken Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643382#comment-14643382 ] Allen Wittenauer commented on YARN-3851: What is a user supposed to do with that release note? Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g. docker). These are meant to be independent of executors themselves - a given executor (e.g. LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher-level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643396#comment-14643396 ] Sidharta Seethana commented on YARN-3851: - hi [~aw] , I apologize - that wasn't meant to be a release note, just a comment that the patch for this is included in YARN-3853 . I'll remove the release note. thanks, -Sidharta Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g docker). These are meant to be independent of executors themselves - a given executor (e.g LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3851: Release Note: (was: Support for this was added as part of YARN-3853 . It wasn't straightforward (or very useful) to split the patches up, so a single patch was submitted. ) Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g docker). These are meant to be independent of executors themselves - a given executor (e.g LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643400#comment-14643400 ] Hadoop QA commented on YARN-3963: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 56s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | | | 42m 20s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747408/0004-YARN-3963.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f36835f | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8681/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8681/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8681/console | This message was automatically generated. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch, 0004-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done.
{noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success even when applied again through the CLI: {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
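A minimal sketch of the fix direction the report implies: surface the duplicate instead of silently logging SUCCESS. The names below are illustrative, not taken from the attached patches.
{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hedged sketch: reject a duplicate label addition rather than reporting success.
class AddLabelSketch {
  private final Set<String> existingLabels = new HashSet<String>();

  void addToClusterNodeLabels(String label) throws IOException {
    // Set.add returns false when the label is already present
    if (!existingLabels.add(label)) {
      throw new IOException("Node label '" + label
          + "' already exists; exclusivity cannot be changed after creation");
    }
  }
}
{code}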
[jira] [Commented] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643398#comment-14643398 ] Sidharta Seethana commented on YARN-3851: - Moved note from the release note : Support for this was added as part of YARN-3853 . It wasn't straightforward (or very useful) to split the patches up, so a single patch was submitted. Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g docker). These are meant to be independent of executors themselves - a given executor (e.g LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic
Wangda Tan created YARN-3983: Summary: Make CapacityScheduler to easier extend application allocation logic Key: YARN-3983 URL: https://issues.apache.org/jira/browse/YARN-3983 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan While working on YARN-1651 (resource allocation for increasing container), I found it is very hard to extend the existing CapacityScheduler resource allocation logic to support different types of resource allocation. For example, there are a lot of differences between increasing a container and allocating a container: - Increasing a container doesn't need to check locality delay. - Increasing a container doesn't need to build/modify a resource request tree (ANY-RACK/HOST). - Increasing a container doesn't need to check allocation/reservation starvation (see {{shouldAllocOrReserveNewContainer}}). - After a container increase is approved by the scheduler, it needs to update an existing container token instead of creating a new container. And there are lots of similarities when allocating different types of resources: - User-limit/queue-limit will be enforced for both of them. - Both of them need resource reservation logic. (Maybe continuous reservation looking is needed for both of them.) The purpose of this JIRA is to make it easier to extend the CapacityScheduler resource allocation logic to support different types of resource allocation, to make common code reusable, and to improve code organization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
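An illustrative-only sketch of the kind of separation this JIRA argues for: a shared allocation skeleton with the type-specific steps overridden. None of these names come from an actual patch.
{code}
// Hedged sketch: common limit checks are shared, while locality-delay handling
// and the commit step (new token vs. updated token) vary per allocation type.
abstract class CSAllocatorSketch {
  final boolean allocate() {
    if (!checkUserAndQueueLimits()) {
      return false;                      // shared: user-limit / queue-limit
    }
    if (needsLocalityDelayCheck() && !localityDelaySatisfied()) {
      return false;                      // new containers only
    }
    commit();                            // create token vs. update token
    return true;
  }

  abstract boolean checkUserAndQueueLimits();
  abstract boolean needsLocalityDelayCheck();
  abstract boolean localityDelaySatisfied();
  abstract void commit();
}
{code}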
[jira] [Commented] (YARN-3853) Add docker container runtime support to LinuxContainterExecutor
[ https://issues.apache.org/jira/browse/YARN-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643444#comment-14643444 ] Hudson commented on YARN-3853: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8228 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8228/]) YARN-3853. Add docker container runtime support to LinuxContainterExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 3e6fce91a471b4a5099de109582e7c6417e8a822) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DelegatingLinuxContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommand.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/LinuxContainerRuntimeConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperationExecutor.java * 
hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/LinuxContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperationException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java *
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643476#comment-14643476 ] Zhijie Shen commented on YARN-3908: --- Sure, as most folks are comfortable with the latest patch, let's get this in. I'll file a separate jira to track the discussion about event column key. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643337#comment-14643337 ] Sangjin Lee commented on YARN-3816: --- Thanks [~djp] for updating the POC patch and providing answers to the questions I had. I've looked over the new patch and also gone through your answers. Some follow-up thoughts and observations are below. (1) I think there is some confusion on the types of metrics in relation to this. Here is how I look at the metric types. See if it squares with your understanding. There are basically *2 independent* dimensions of metric types: - single value vs. time series - counter vs. gauge Single value vs. time series purely concerns *storage*. It only determines whether only the latest value is stored or the entire time series values are stored (subject to TTL). On the other hand, the counter vs. gauge dimension deals with *what type of mathematical functions/operations* apply to them. Counters are metrics that are time-cumulative in their nature, and are always monotonically increasing with time (e.g. HDFS bytes written). Gauges can fluctuate up and down over time (e.g. CPU usage). The time integral that's being done in this patch applies only to gauges. It does not make sense for counters. These are two independent dimensions in principle. For example, a gauge can be a single value. A counter can be a time series. Regardless of whether they are always useful, they are possible in principle. I propose to introduce the second dimension to the metrics explicitly. This second dimension nearly maps to toAggregate (and/or the REP/SUM distinction) in your patch. But I think it's probably better to introduce the metric types explicitly as another enum or by subclassing {{TimelineMetric}}. Let me know what you think. (2) I'm still very confused by the usage of the word aggregate. In this patch, aggregate really means accumulating values of a metric along the time dimension, which is completely different than the notion of aggregation we have used all along. The aggregation has always been about rolling up values from children to parents. Can we choose a different word to describe this aspect of accumulating values along the time dimension, and avoid using aggregation for this? Accumulate? Cumulative? Any suggestion? On a related note, {quote} However, in practice, there are cases that some aggregated metrics has both properties, like area value here - we do need its cumulative values and also could be interested in getting values within a given time interval. Isn't it? {quote} My statement was that a time-integral (or accumulation along the time dimension) does not make sense for counters. For example, consider HDFS bytes written. The time accumulation is already built into it (see (1)). If you further accumulate this along the time dimension, it becomes quadratic (doubly integrated) in time. I don't see how that can be useful. Another way to see this is that a counter is basically a time integral of another gauge. For example, the HDFS bytes written counter (in the unit of bytes) is a time integral of HDFS bytes written per time (in the unit of bytes/sec). If I misunderstood what you meant, could you kindly clarify it? (3) {quote} No. Nothing get changed on the design since our last discussions. The average and max is also important but I just haven't get bandwidth to add in poc stage as adding existing things could be more straight-forward. I will add it later. 
{quote} The average/max we discussed in the offline discussion is actually very similar to the aggregated (accumulated) metrics here. The only difference is that the average is further divided by the duration. Otherwise, it's basically the same derived property. It would be good to do one or the other, but not both. I would suggest that we do only one of them. I think it would be OK to do this and not the average/max of the previous discussion. I'd like to hear what others think about this. (4) Can we introduce a configuration that disables this time accumulation feature? As we discussed, some may not want to have this feature enabled and are perfectly happy with simple aggregation (from children to parents). It would be good to isolate this part and be able to enable/disable it. [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application level aggregation of Timeline data: - To present end user
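A minimal sketch of the two independent dimensions described in point (1) of the comment above; the type and field names are hypothetical, not from any posted patch.
{code}
// Which math applies (counters are time-cumulative; gauges fluctuate) vs.
// what gets stored (latest value only vs. the full series, subject to TTL).
enum MetricKind { COUNTER, GAUGE }
enum MetricStorage { SINGLE_VALUE, TIME_SERIES }

class MetricTypeSketch {
  MetricKind kind;        // a time integral only makes sense for GAUGE
  MetricStorage storage;  // independent of kind: any combination is possible
}
{code}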
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643424#comment-14643424 ] Sangjin Lee commented on YARN-3908: --- +1 for committing this patch and having the event schema discussion in a different JIRA. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643463#comment-14643463 ] Vrushali C commented on YARN-3816: -- bq. If I understand the problem correctly, we have two dimensions in a flow/user level aggregation: one dimension for all entities belong to this flow/user, another dimension for time. Ah not quite. The time dimension goes with flow/user/queue. For example, we will aggregate user-level stats over a time period like daily or weekly. Similarly for flows: flows are aggregated over one day or one week in hRaven. Ditto for users and queues. So let's say, for simplicity, user1 ran a wordcount MapReduce job three times on Monday and a sleep job two times on Monday. Now the daily aggregation table for user1 will have the sum of each counter metric on that day, that is {code} M1 for user1 on Monday = M1 from wordcount.run1 on Monday + M1 from wordcount.run2 on Monday + M1 from wordcount.run3 on Monday + M1 from sleep.run1 on Monday + M1 from sleep.run2 on Monday {code} Now, for flows on Monday: {code} M1 for wordcount on Monday = M1 from wordcount.run1 on Monday + M1 from wordcount.run2 on Monday + M1 from wordcount.run3 on Monday M1 for sleep on Monday = M1 from sleep.run1 on Monday + M1 from sleep.run2 on Monday {code} For time series, we need to decide what aggregation means. One option is to normalize the values to minute-level granularity, i.e. add up the values per minute across runs. Anything that occurred within a minute is then assigned to the top of that minute: e.g. something happening at 2 min 10 seconds is treated as having occurred at 2 min. That way we can sum up across flows/users/runs, etc. [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application level aggregation of Timeline data: - To present end users aggregated states for each application, including: resource (CPU, Memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Other levels of aggregation (Flow/User/Queue) can be done more efficiently from application-level aggregations than from raw entity-level data, as far fewer rows need to be scanned (after filtering out non-aggregated entities, like events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
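A hedged sketch of the minute-level normalization described above: each timestamp is truncated to the top of its minute so values from different runs land in the same bucket and can be summed. Illustrative only.
{code}
class MinuteBucketSketch {
  private static final long MINUTE_MS = 60_000L;

  // e.g. 2 min 10 s -> 2 min
  static long toMinuteBucket(long timestampMs) {
    return (timestampMs / MINUTE_MS) * MINUTE_MS;
  }
}
{code}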
[jira] [Resolved] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3908. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: YARN-2928 Committed the patch to branch YARN-2928. Thanks for the patch, Vrushali and Sangjin, as well as other folks for contributing your thoughts. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Fix For: YARN-2928 Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643602#comment-14643602 ] Xuan Gong commented on YARN-3982: - This doc comment is confusing. {code} /** * Function to return an array of values for a key. * Value delimiter is assumed to be a '%'. */ char ** get_values(const char * key) { {code} Which value delimiter is really used here: '%' or ','? container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643492#comment-14643492 ] Hadoop QA commented on YARN-3982: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | yarn tests | 6m 5s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 21m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747417/YARN-3982.001.patch | | Optional Tests | javac unit | | git revision | trunk / 3e6fce9 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8683/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8683/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8683/console | This message was automatically generated. container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643606#comment-14643606 ] Subru Krishnan commented on YARN-3736: -- Thanks [~adhoot] for responding to my comments; what you say makes sense. The latest patch LGTM. Looks like there are minor test-patch issues (unused imports after the update, etc.). Can you address those? Also, can you open a JIRA for storing/updating reservation state in the RMStateStore from the Plan? Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.004.patch Uploading the 004 version of the patch. This patch addresses the following two major issues: # Rebuild the current Phoenix writer into an offline aggregation writer. Specifically, the writer writes info and metric data into the newly created Phoenix offline aggregation table. # Simplify the writer interface by using TimelineCollectorContext. In this way both normal writers and offline aggregation writers can use the same interface to write data. One thing pending discussion is the {{aggregation}} method. I feel this method is a little bit outdated. Could anyone remind me of the assumed use case for it? Is it intended for real-time aggregations only? Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
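An illustrative approximation of the refactoring described above: both normal and offline aggregation writers accept the same context object, which simply carries whatever contextual fields are available. The names approximate, but are not copied from, the patch.
{code}
// Hedged sketch: a shared context lets one writer interface serve both paths.
class CollectorContextSketch {
  String clusterId, userId, flowName, appId;
  Long flowRunId; // may be absent for offline aggregation writers
}

interface AggregationWriterSketch {
  void write(CollectorContextSketch ctx, Object entityData);
}
{code}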
[jira] [Created] (YARN-3984) Rethink event column key issue
Zhijie Shen created YARN-3984: - Summary: Rethink event column key issue Key: YARN-3984 URL: https://issues.apache.org/jira/browse/YARN-3984 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Fix For: YARN-2928 Currently, the event column key is event_id?info_key?timestamp, which is not so friendly for fetching all the events of an entity and sorting them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I'm opening this JIRA to continue the discussion, which started in the comments on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
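A small sketch contrasting the two key layouts under discussion; '!' is a stand-in separator (the separator byte appears as '?' in the JIRA text, likely an encoding artifact):
{code}
class EventKeySketch {
  static String currentKey(String eventId, String infoKey, long ts) {
    return eventId + "!" + infoKey + "!" + ts;  // groups rows by event id
  }

  static String proposedKey(String eventId, String infoKey, long ts) {
    return ts + "!" + eventId + "!" + infoKey;  // sorts rows chronologically
  }
}
{code}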
[jira] [Updated] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3982: Attachment: YARN-3982.002.patch bq. Which value delimiter is really used here: '%' or ','? Sorry about the goof-up in the comment. It should be ','. Uploaded a new patch with the fix. container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch, YARN-3982.002.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishore Chaliparambil updated YARN-2884: Attachment: YARN-2884-V3.patch Uploading a new version of the patch Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, YARN-2884-V3.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle misbehaving AMs, and 3) mask access to a federation of RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643504#comment-14643504 ] Li Lu commented on YARN-3816: - Thanks for the clarification [~vrushalic]! Yes, the problem is with time series metrics. I think your approach works here, but maybe we'd like to change the scale of the round-ups according to the scale of the aggregation? For example, if we aggregate the data for one whole day, we can merge the data within the same minute. If we aggregate the data over a week, maybe we can merge the data within the same hour? [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application level aggregation of Timeline data: - To present end users aggregated states for each application, including: resource (CPU, Memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Other levels of aggregation (Flow/User/Queue) can be done more efficiently from application-level aggregations than from raw entity-level data, as far fewer rows need to be scanned (after filtering out non-aggregated entities, like events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
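A hypothetical illustration of the suggestion above: derive the merge granularity from the aggregation window instead of hard-coding one minute. The names and thresholds are illustrative.
{code}
class GranularitySketch {
  private static final long HOUR_MS = 3_600_000L;
  private static final long DAY_MS = 24 * HOUR_MS;

  // daily aggregation -> minute buckets; longer windows -> hour buckets
  static long mergeGranularityMs(long aggregationWindowMs) {
    return aggregationWindowMs <= DAY_MS ? 60_000L : HOUR_MS;
  }
}
{code}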
[jira] [Commented] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643528#comment-14643528 ] Varun Vasudev commented on YARN-3982: - Test failure is unrelated to the patch. container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643565#comment-14643565 ] Hadoop QA commented on YARN-3736: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 52s | The applied patch generated 5 new checkstyle issues (total was 96, now 100). | | {color:green}+1{color} | whitespace | 1m 20s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 32s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 52m 42s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 92m 39s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747409/YARN-3736.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3e6fce9 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8682/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8682/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8682/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8682/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8682/console | This message was automatically generated. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations corresponding RLESpareseResourceAllocations to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643650#comment-14643650 ] Dheeren Beborrtha commented on YARN-2918: - This is a major issue and a big inconvenience. Can this be backported to Hadoop 2.6.0? Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Wangda Tan Fix For: 2.8.0, 2.7.1 Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if an admin sets up labels on queues ({{queue-path.accessible-node-labels = ...}}) and a label is not added to the RM, the queue's initialization will fail and the RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience; we should stop failing the RM so that the admin can configure queues/labels in the following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Today the admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
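For reference, the queue-label configuration referred to above lives in capacity-scheduler.xml. Assuming a queue a1 directly under root (the queue name is taken from the stack trace; the label value is illustrative), it looks like this:
{code}
<property>
  <name>yarn.scheduler.capacity.root.a1.accessible-node-labels</name>
  <value>x</value>
</property>
{code}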
[jira] [Commented] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643660#comment-14643660 ] Hudson commented on YARN-3846: -- FAILURE: Integrated in Hadoop-trunk-Commit #8229 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8229/]) YARN-3846. RM Web UI queue filter is not working for sub queue. Contributed by Mohammad Shahid Khan (jianhe: rev 3572ebd738aa5fa8b0906d75fb12cc6cbb991573) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt RM Web UI queue filter is not working - Key: YARN-3846 URL: https://issues.apache.org/jira/browse/YARN-3846 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.0.0, 2.8.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Labels: PatchAvailable Fix For: 2.8.0 Attachments: YARN-3846.patch, scheduler queue issue.png, scheduler queue positive behavior.png Clicking on the root queue shows all applications, but clicking on a leaf queue does not filter the applications for the clicked queue. The regular expression seems to be wrong: {code} q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';, {code} For example: 1. Suppose the queue name is b; the above expression will substr at index 1, since q.lastIndexOf(':') = -1 and -1 + 2 = 1, which is wrong; it should look at index 0. 2. If the queue name is ab.x, it will parse it to .x, but it should be x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
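A hedged sketch of the off-by-one described in the report, shown in Java for illustration (the real code builds a JavaScript filter string in CapacitySchedulerPage.java). The guard falls back to index 0 when no ':' is present:
{code}
class QueueFilterSketch {
  static String filterExpr(String q) {
    int sep = q.lastIndexOf(':');
    // unguarded: -1 + 2 == 1 would wrongly skip the first character of "b"
    int start = (sep >= 0) ? sep + 2 : 0;
    return "^" + q.substring(start) + "$";
  }
}
{code}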
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643666#comment-14643666 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 20s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 40m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747454/YARN-3904-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / df0ec47 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8687/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8687/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8687/console | This message was automatically generated. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643701#comment-14643701 ] Jian He commented on YARN-3887: --- Thanks Sunil! Some comments on the patch: - Do you plan to do the client-side changes as part of this JIRA? - RMAppUpdatePriorityEvent - RMApp may receive this event in many states other than the RUNNING state. In that case, the state machine will throw InvalidEventException. I think we do not need to send an event to RMApp; all it does is get the application submission context and set the priority. This can be done in ClientRMService. Similarly, the event to the state-store can be sent directly from ClientRMService. - CapacityScheduler#updateApplicationPriority does not need to be synchronized? Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3887.patch After YARN-2003, this adds support to change the priority of an application after submission. This ticket will handle the server-side implementation for the same. A new RMAppEvent will be created to handle this, and it will be common to all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643736#comment-14643736 ] Bibin A Chundatt commented on YARN-3971: The test case failure is not related to this patch. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch Steps to reproduce: # Create labels x,y # Delete labels x,y # Create labels x,y and add capacity scheduler xml for labels x and y too # Restart RM Both RMs will become standby, since the exception below is thrown from {{FileSystemNodeLabelsStore#recover}}: {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
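A minimal sketch of the direction in the JIRA title: bypass the queue-usage check when labels are removed as part of store recovery. The flag and method split are illustrative, not necessarily how the attached patches do it.
{code}
import java.io.IOException;
import java.util.Collection;

abstract class LabelRecoverySketch {
  void removeFromClusterNodeLabels(Collection<String> labels,
      boolean duringRecovery) throws IOException {
    if (!duringRecovery) {
      // only enforce "label in use by a queue" for live admin requests
      checkRemoveFromClusterNodeLabelsOfQueue(labels);
    }
    internalRemoveLabels(labels);
  }

  abstract void checkRemoveFromClusterNodeLabelsOfQueue(
      Collection<String> labels) throws IOException;

  abstract void internalRemoveLabels(Collection<String> labels);
}
{code}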
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643740#comment-14643740 ] Hadoop QA commented on YARN-3978: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 41s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 48s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | checkstyle | 2m 4s | The applied patch generated 2 new checkstyle issues (total was 39, now 41). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 6s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 52m 35s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 98m 24s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.TestClientRMService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747458/YARN-3978.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3e6fce9 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8688/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8688/console | This message was automatically generated. Configurably turn off the saving of container info in Generic AHS - Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-3978.001.patch Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C reassigned YARN-3984:
--------------------------------
Assignee: Vrushali C

Rethink event column key issue
------------------------------
Key: YARN-3984
URL: https://issues.apache.org/jira/browse/YARN-3984
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Fix For: YARN-2928

Currently, the event column key is event_id?info_key?timestamp, which is not friendly for fetching all the events of an entity and sorting them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I am opening this jira to continue the discussion, which started in the comments on YARN-3908.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643749#comment-14643749 ] Hadoop QA commented on YARN-3736:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 3s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 1m 42s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 1m 30s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 52m 24s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | | 92m 9s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747459/YARN-3736.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3572ebd |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8689/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8689/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8689/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8689/console |

This message was automatically generated.

Add RMStateStore apis to store and load accepted reservations for failover
---------------------------------------------------------------------------
Key: YARN-3736
URL: https://issues.apache.org/jira/browse/YARN-3736
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch, YARN-3736.003.patch

We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and the corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
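[Editor's note: to show the shape of what the YARN-3736 description asks for, here is a hedged Java sketch of store/load entry points. The method names and the byte[]-of-serialized-protobuf representation are assumptions for illustration and may not match the committed API.]
{code}
// Hypothetical API sketch; names and types are illustrative only.
import java.util.HashMap;
import java.util.Map;

public abstract class ReservationStateStoreSketch {

  /** Persist one accepted reservation of a plan, keyed by reservation id. */
  public abstract void storeReservationAllocation(
      String planName, String reservationId, byte[] serializedAllocation)
      throws Exception;

  /** Remove a reservation, e.g. when it is deleted or expires. */
  public abstract void removeReservationAllocation(
      String planName, String reservationId) throws Exception;

  /**
   * On RM failover, load plan name -> (reservation id -> serialized
   * allocation) so the in-memory Plan and RLESparseResourceAllocation
   * structures can be rebuilt from the store.
   */
  public Map<String, Map<String, byte[]>> loadReservationState()
      throws Exception {
    return new HashMap<>();
  }
}
{code}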
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643758#comment-14643758 ] Vrushali C commented on YARN-3984:
----------------------------------
I can take this up. Please feel free to reassign it, or if someone else wants it, let me know on the jira and we can redistribute it.

To add to my previous comment, let's take an example. Say the event id is KILLED and it occurs 3 times for whatever reason. Now let's say:
- at ts1, for key DIAGNOSTICS, the value is xyz
- at ts1, for key SOMETHING ELSE, the value is something
- at ts2, for key DIAGNOSTICS, the value is abc
- at ts3, for key DIAGNOSTICS, the value is pqr
- at ts3, for key SOMETHING ELSE, the value is something even more

where ts1 < ts2 < ts3, so ts3 is the most recent timestamp. Now, which of these queries is the most commonly required?
1. For this application, what is the diagnostic message for the most recent KILLED event? Or all of the diagnostics in the KILLED event?
2. For this application, what are the most recent key(s) in the KILLED event?
3. For this application, what are the keys (and values) that occurred between ts2 and ts3 for the KILLED event?

If we think #2 and #3 are the most commonly run queries, then we can go with timestamp before key. If we think #1 is the most commonly run query, then we can go with key before timestamp. If we choose timestamp before key, then we can never pull back the value for a given event and key without fetching all keys in that event for all timestamps. If we choose key before timestamp, we can't easily pull back the most recently occurring key within an event. In either case, we can't know which event was the most recent in the application. For example, the INITED event record will be stored before the KILLED event record, since "I" < "K" and HBase sorts keys lexicographically. So if we are interested in knowing which event itself occurred most recently, we need to fetch all events (along with event keys and timestamps), sort by timestamp, and then return the most recent event.

Rethink event column key issue
------------------------------
Key: YARN-3984
URL: https://issues.apache.org/jira/browse/YARN-3984
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Fix For: YARN-2928

Currently, the event column key is event_id?info_key?timestamp, which is not friendly for fetching all the events of an entity and sorting them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I am opening this jira to continue the discussion, which started in the comments on YARN-3908.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
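[Editor's note: the trade-off discussed above can be replayed with a toy model. A TreeMap stands in for HBase's lexicographically sorted row keys, '!' stands in for the real separator (which the archive has garbled to '?'), and the inverted-timestamp encoding is an assumption about how "newest first" ordering is commonly achieved, not necessarily what YARN-3984 settled on.]
{code}
// Toy model of the two candidate row-key layouts; assumptions as noted above.
import java.util.TreeMap;

public class EventKeyOrderSketch {

  // Inverted, zero-padded timestamp: larger (newer) ts sorts earlier.
  static String ts(long t) {
    return String.format("%019d", Long.MAX_VALUE - t);
  }

  public static void main(String[] args) {
    // Layout A: event_id!info_key!timestamp
    TreeMap<String, String> keyFirst = new TreeMap<>();
    keyFirst.put("INITED!DIAGNOSTICS!" + ts(0), "started");
    keyFirst.put("KILLED!DIAGNOSTICS!" + ts(1), "xyz");
    keyFirst.put("KILLED!DIAGNOSTICS!" + ts(2), "abc");
    keyFirst.put("KILLED!DIAGNOSTICS!" + ts(3), "pqr");
    // Query #1 is a cheap prefix scan; the first hit is the newest value.
    System.out.println(
        keyFirst.subMap("KILLED!DIAGNOSTICS!", "KILLED!DIAGNOSTICS!~"));
    // But "which event happened last?" needs a full scan: INITED sorts
    // before KILLED purely because "I" < "K".
    System.out.println(keyFirst.firstKey());

    // Layout B: timestamp!event_id!info_key
    TreeMap<String, String> timeFirst = new TreeMap<>();
    timeFirst.put(ts(0) + "!INITED!DIAGNOSTICS", "started");
    timeFirst.put(ts(1) + "!KILLED!DIAGNOSTICS", "xyz");
    timeFirst.put(ts(2) + "!KILLED!DIAGNOSTICS", "abc");
    timeFirst.put(ts(3) + "!KILLED!DIAGNOSTICS", "pqr");
    // The newest event overall is simply the first row...
    System.out.println(timeFirst.firstEntry());
    // ...but fetching DIAGNOSTICS of KILLED must touch every timestamp.
  }
}
{code}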
[jira] [Commented] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643632#comment-14643632 ] Jian He commented on YARN-3846:
-------------------------------
Patch looks good to me, committing.

RM Web UI queue filter is not working
-------------------------------------
Key: YARN-3846
URL: https://issues.apache.org/jira/browse/YARN-3846
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.0.0, 2.8.0
Reporter: Mohammad Shahid Khan
Assignee: Mohammad Shahid Khan
Labels: PatchAvailable
Attachments: YARN-3846.patch, scheduler queue issue.png, scheduler queue positive behavior.png

Clicking on the root queue shows all applications, but clicking on a leaf queue does not filter the applications down to the clicked queue. The regular expression seems to be wrong:
{code}
q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';
{code}
For example:
1. Suppose the queue name is "b". The above expression will take the substring starting at index 1, because q.lastIndexOf(':') = -1 and -1 + 2 = 1, which is wrong; it should start at index 0.
2. If the queue name is "ab.x", it parses it to ".x", but it should be "x".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
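[Editor's note: the snippet quoted in the report is JavaScript from the RM web UI; the Java sketch below replays the same indexing logic so the off-by-index behavior is easy to reproduce. The "fixed" variant (anchoring on the last '.' segment) is one plausible correction, not necessarily the committed patch.]
{code}
// Reproduction of the indexing bug; the fix shown is illustrative only.
public class QueueFilterRegexSketch {

  // Buggy logic from the description: when ':' is absent, lastIndexOf
  // returns -1 and -1 + 2 = 1, so the first character is silently dropped.
  static String buggy(String q) {
    return "^" + q.substring(q.lastIndexOf(':') + 2) + "$";
  }

  // One plausible fix: anchor on the last queue-path segment after '.'.
  static String fixed(String q) {
    return "^" + q.substring(q.lastIndexOf('.') + 1) + "$";
  }

  public static void main(String[] args) {
    System.out.println(buggy("b"));    // "^$"    -- queue name lost entirely
    System.out.println(buggy("ab.x")); // "^b.x$" -- leading char dropped
    System.out.println(fixed("b"));    // "^b$"
    System.out.println(fixed("ab.x")); // "^x$"
  }
}
{code}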
[jira] [Updated] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2918:
-----------------------------
Labels: 2.6.1-candidate (was: )

Don't fail RM if queue's configured labels are not existed in cluster-node-labels
----------------------------------------------------------------------------------
Key: YARN-2918
URL: https://issues.apache.org/jira/browse/YARN-2918
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: Wangda Tan
Labels: 2.6.1-candidate
Fix For: 2.8.0, 2.7.1
Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch

Currently, if the admin sets up labels on queues ({{queue-path.accessible-node-labels = ...}}) and a label has not been added to the RM, queue initialization fails and the RM fails too:
{noformat}
2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
...
Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
{noformat}
This is not a good user experience; we should stop failing the RM so that the admin can configure queues/labels in the following steps:
- Configure queue (with label)
- Start RM
- Add labels to RM
- Submit applications

Now the admin has to:
- Configure queue (without label)
- Start RM
- Add labels to RM
- Refresh queue's config (with label)
- Submit applications

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
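[Editor's note: the change being argued for is small in shape: turn the hard failure in the label check into a warning so startup order stops mattering. A minimal Java sketch of that idea follows; only the old throwing behavior is taken from the stack trace above, while the relaxed variant and method shape are assumptions for illustration.]
{code}
// Illustration only: relaxes the check that the stack trace shows throwing.
import java.io.IOException;
import java.util.Set;

public class QueueLabelCheckSketch {

  static void checkQueueLabels(Set<String> queueLabels,
      Set<String> clusterNodeLabels) throws IOException {
    for (String label : queueLabels) {
      if (!clusterNodeLabels.contains(label)) {
        // Old behavior (SchedulerUtils.checkIfLabelInClusterNodeLabels):
        //   throw new IOException(
        //       "NodeLabelManager doesn't include label = " + label);
        // which aborts queue init and takes the whole RM down.
        System.err.println("WARN: queue label '" + label
            + "' is not in cluster-node-labels yet; continuing startup"
            + " so the admin can add it afterwards.");
      }
    }
  }
}
{code}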
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643727#comment-14643727 ] Hadoop QA commented on YARN-2884:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 21m 22s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 25s | The applied patch generated 2 new checkstyle issues (total was 237, now 238). |
| {color:red}-1{color} | checkstyle | 3m 11s | The applied patch generated 2 new checkstyle issues (total was 0, now 2). |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 51s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests | 6m 19s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 52m 22s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | | 113m 9s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747453/YARN-2884-V3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3e6fce9 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8686/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8686/console |

This message was automatically generated.

Proxying all AM-RM communications
---------------------------------
Key: YARN-2884
URL: https://issues.apache.org/jira/browse/YARN-2884
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager, resourcemanager
Reporter: Carlo Curino
Assignee: Kishore Chaliparambil
Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, YARN-2884-V3.patch

We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643729#comment-14643729 ] Vrushali C commented on YARN-3816:
----------------------------------
Yes, the granularity of aggregation could drive the rollup granularity as well. Or we could use hourly timestamps for daily as well as weekly aggregations. We have the app-level timeseries in more detail, and in the UI design it should be possible to navigate from flow/user/queue to the individual apps in that aggregation to see detailed timeseries trends.

[Aggregation] App-level Aggregation for YARN system metrics
------------------------------------------------------------
Key: YARN-3816
URL: https://issues.apache.org/jira/browse/YARN-3816
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch

We need application-level aggregation of Timeline data:
- To present end users aggregated state for each application, including resource (CPU, memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done.
- Framework-specific metrics, e.g. HDFS_BYTES_READ, should also be aggregated to show framework-level detail.
- Aggregation at other levels (Flow/User/Queue) can be done more efficiently on top of application-level aggregations rather than on raw entity-level data, since far fewer rows need to be scanned (filtering out non-aggregated entities such as events and configurations).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
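[Editor's note: read concretely, app-level aggregation as described in the first bullet is a fold of per-container metrics into one record per application, which the flow/user/queue rollups then consume. A minimal Java sketch under that assumption, using plain maps in place of real timeline entities and illustrative metric names:]
{code}
// Toy aggregation: sum each metric across all containers of one app.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppAggregationSketch {

  // Each input map holds one container's metrics, e.g. CPU, MEMORY,
  // HDFS_BYTES_READ; the output is the app-level record that higher-level
  // (flow/user/queue) rollups can scan instead of raw entity rows.
  public static Map<String, Long> aggregate(
      List<Map<String, Long>> containerMetrics) {
    Map<String, Long> appLevel = new HashMap<>();
    for (Map<String, Long> container : containerMetrics) {
      for (Map.Entry<String, Long> m : container.entrySet()) {
        appLevel.merge(m.getKey(), m.getValue(), Long::sum);
      }
    }
    return appLevel;
  }
}
{code}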