[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560663#comment-14560663 ] Hong Zhiguo commented on YARN-3678: --- First, stopping a container happens frequently. Second, the pid recycling doesn't need to complete a whole round in 250ms; it only needs one or more rounds during the container lifetime. If we have 100 stop-container events on one node per day, we have 100/32768, about a 0.3% chance per node per day. That's not very low, especially when we have 5000 nodes. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
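A rough back-of-the-envelope check of the rate claimed above, assuming the default pid_max of 32768 and that each stop-container event has an independent 1/32768 chance of landing on a recycled pid:
{noformat}
per node, per day:   100 stops x 1/32768                 ~ 0.3%
5000-node cluster:   1 - (1 - 0.003)^5000 ~ 1 - e^-15    (near-certain)
                     i.e. roughly 15 expected collisions per day
{noformat}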
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560607#comment-14560607 ] Varun Vasudev commented on YARN-3678: - [~zhiguohong] thanks for the detailed explanation! When you say your fix reduced the rate to nearly zero, do you know why the accidental kill continued to happen? DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3719) Improve Solaris support in YARN
[ https://issues.apache.org/jira/browse/YARN-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560760#comment-14560760 ] Alan Burlison commented on YARN-3719: - It appears that Solaris will have the same setsid-related issue as BSD: YARN-3066 Hadoop leaves orphaned tasks running after job is killed Improve Solaris support in YARN --- Key: YARN-3719 URL: https://issues.apache.org/jira/browse/YARN-3719 Project: Hadoop YARN Issue Type: New Feature Components: build Affects Versions: 2.7.0 Environment: Solaris x86, Solaris sparc Reporter: Alan Burlison At present the YARN native components aren't fully supported on Solaris primarily due to differences between Linux and Solaris. This top-level task will be used to group together both existing and new issues related to this work. A second goal is to improve YARN performance and functionality on Solaris wherever possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560808#comment-14560808 ] Hudson commented on YARN-3686: -- FAILURE: Integrated in Hadoop-Yarn-trunk #940 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/940/]) YARN-3686. CapacityScheduler should trim default_node_label_expression. (Sunil G via wangda) (wangda: rev cdbd66be111c93c85a409d47284e588c453ecae9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Fix For: 2.7.1 Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch, 0003-YARN-3686.patch, 0004-YARN-3686.patch We should trim default_node_label_expression for a queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
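A minimal sketch of the normalization this fix calls for, with hypothetical names (the real change lives in CapacitySchedulerConfiguration and the PBImpl classes listed above):
{code}
// Hypothetical helper mirroring the fix's idea: normalize a configured
// default node label expression so " labelX " behaves like "labelX".
class NodeLabelTrimSketch {
  static String normalize(String configured) {
    return configured == null ? null : configured.trim();
  }

  public static void main(String[] args) {
    System.out.println("[" + normalize(" labelX ") + "]"); // prints [labelX]
  }
}
{code}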
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560812#comment-14560812 ] Hudson commented on YARN-3632: -- FAILURE: Integrated in Hadoop-Yarn-trunk #940 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/940/]) YARN-3632. Ordering policy should be allowed to reorder an application when demand changes. Contributed by Craig Welch (jianhe: rev 10732d515f62258309f98e4d7d23249f80b1847d)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FairOrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java
Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the FairOrderingPolicy when sizeBasedWeight is true). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
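The core mechanics behind such a reorder, sketched with hypothetical names (not the actual OrderingPolicy API): a comparator-backed collection like the one AbstractComparatorOrderingPolicy maintains must remove and re-add an entry around any change to its comparison key, or the set's ordering invariant silently breaks.
{code}
import java.util.Comparator;
import java.util.TreeSet;

// Illustrative sketch only; names are hypothetical.
class DemandOrderingSketch {
  static class App {
    final String id;
    long demand;
    App(String id, long demand) { this.id = id; this.demand = demand; }
  }

  private final TreeSet<App> order = new TreeSet<>(
      Comparator.comparingLong((App a) -> a.demand).thenComparing((App a) -> a.id));

  void add(App a) { order.add(a); }

  void updateDemand(App a, long newDemand) {
    order.remove(a);      // remove under the old comparison key
    a.demand = newDemand; // mutate only while the entry is out of the set
    order.add(a);         // re-insert under the new key
  }

  public static void main(String[] args) {
    DemandOrderingSketch s = new DemandOrderingSketch();
    App a = new App("app-1", 10);
    App b = new App("app-2", 20);
    s.add(a);
    s.add(b);
    s.updateDemand(a, 30);                  // app-1 now sorts after app-2
    System.out.println(s.order.first().id); // prints app-2
  }
}
{code}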
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560810#comment-14560810 ] Hudson commented on YARN-160: - FAILURE: Integrated in Hadoop-Yarn-trunk #940 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/940/]) YARN-160. Enhanced NodeManager to automatically obtain cpu/memory values from underlying OS when configured to do so. Contributed by Varun Vasudev. (vinodkv: rev 500a1d9c76ec612b4e737888f4be79951c11591d)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java
* hadoop-tools/hadoop-gridmix/src/test/java/org/apache/hadoop/mapred/gridmix/DummyResourceCalculatorPlugin.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java
nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: YARN-160.005.patch, YARN-160.006.patch, YARN-160.007.patch, YARN-160.008.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values come from the NM's config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo).
As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (an amount of mem/cpu not to be available as a YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
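For illustration, a self-contained sketch (not the actual plugin code) of reading the node's physical memory from /proc/meminfo, the Linux source the description points at:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class MemInfoSketch {
  // Returns the MemTotal value in kB, or -1 if the field is absent.
  static long physicalMemoryKb() throws IOException {
    for (String line : Files.readAllLines(Paths.get("/proc/meminfo"))) {
      if (line.startsWith("MemTotal:")) {
        // Format: "MemTotal:       16314828 kB"
        String[] parts = line.split("\\s+");
        return Long.parseLong(parts[1]);
      }
    }
    return -1;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("MemTotal (kB): " + physicalMemoryKb());
  }
}
{code}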
[jira] [Updated] (YARN-3714) AM proxy filter can not get proper default proxy address if RM-HA is enabled
[ https://issues.apache.org/jira/browse/YARN-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-3714: --- Attachment: YARN-3714.001.patch The attached patch makes WebAppUtils#getProxyHostsAndPortsForAmFilter get the RM webapp addresses from {{yarn.resourcemanager.hostname._rm-id_}} and the default port number if {{yarn.resourcemanager.webapp.(https.)address._rm-id_}} are not set. AM proxy filter can not get proper default proxy address if RM-HA is enabled Key: YARN-3714 URL: https://issues.apache.org/jira/browse/YARN-3714 Project: Hadoop YARN Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-3714.001.patch The default proxy address cannot be obtained without setting {{yarn.resourcemanager.webapp.address._rm-id_}} and/or {{yarn.resourcemanager.webapp.https.address._rm-id_}} explicitly if RM-HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
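A hedged sketch of the fallback the patch describes (hypothetical helper, not the actual WebAppUtils code): if yarn.resourcemanager.webapp.address.<rm-id> is unset, derive the address from yarn.resourcemanager.hostname.<rm-id> plus the default webapp port.
{code}
import java.util.HashMap;
import java.util.Map;

class RmWebAppAddressSketch {
  static final int DEFAULT_RM_WEBAPP_PORT = 8088; // HTTP default

  static String webAppAddressForRm(Map<String, String> conf, String rmId) {
    String addr = conf.get("yarn.resourcemanager.webapp.address." + rmId);
    if (addr != null) {
      return addr; // explicit setting wins
    }
    // Fall back to the per-RM hostname plus the default port.
    return conf.get("yarn.resourcemanager.hostname." + rmId)
        + ":" + DEFAULT_RM_WEBAPP_PORT;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("yarn.resourcemanager.hostname.rm1", "rm1.example.com");
    System.out.println(webAppAddressForRm(conf, "rm1")); // rm1.example.com:8088
  }
}
{code}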
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560735#comment-14560735 ] Varun Saxena commented on YARN-3678: Yeah, that's why I said that if we can increase the value of {{pid_max}} on a 64-bit machine to the highest value it can take, i.e. 2^22, that should mitigate the risk of this happening. But anyway, as I mentioned above, we can fix this. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
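An illustrative check of the mitigation discussed above: on 64-bit Linux, kernel.pid_max can be raised to 2^22 = 4194304 (e.g. via "sysctl -w kernel.pid_max=4194304"), which widens the pid space and makes recycling correspondingly rarer. This sketch just reads the live value:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class PidMaxSketch {
  public static void main(String[] args) throws IOException {
    long pidMax = Long.parseLong(
        Files.readAllLines(Paths.get("/proc/sys/kernel/pid_max")).get(0).trim());
    System.out.println("kernel.pid_max = " + pidMax
        + (pidMax >= (1L << 22) ? " (already at/above 2^22)" : ""));
  }
}
{code}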
[jira] [Commented] (YARN-3714) AM proxy filter can not get proper default proxy address if RM-HA is enabled
[ https://issues.apache.org/jira/browse/YARN-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560658#comment-14560658 ] Hadoop QA commented on YARN-3714: -
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 15s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-web-proxy. |
| | | 39m 53s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12735545/YARN-3714.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cdbd66b |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8097/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8097/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-web-proxy test log | https://builds.apache.org/job/PreCommit-YARN-Build/8097/artifact/patchprocess/testrun_hadoop-yarn-server-web-proxy.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8097/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8097/console |
This message was automatically generated. AM proxy filter can not get proper default proxy address if RM-HA is enabled Key: YARN-3714 URL: https://issues.apache.org/jira/browse/YARN-3714 Project: Hadoop YARN Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-3714.001.patch The default proxy address cannot be obtained without setting {{yarn.resourcemanager.webapp.address._rm-id_}} and/or {{yarn.resourcemanager.webapp.https.address._rm-id_}} explicitly if RM-HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: YARN-41-8.patch The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, YARN-41-8.patch, YARN-41.patch Instead of waiting for the NM expiry, the RM should remove and handle the NM, which is shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560776#comment-14560776 ] Weiwei Yang commented on YARN-1042: --- I am thinking about the following approach and would appreciate suggestions : ) In the ApplicationSubmissionContext class, add a new argument to indicate the container allocation rule in terms of affinity/anti-affinity. The RM will follow the given rules to allocate containers for this application. The argument is an instance of a (new) class ContainerAllocationRule; this class defines several types of allocation rules (sketched after this mail), such as:
* AFFINITY_REQUIRED: containers MUST be allocated on the same host/rack
* AFFINITY_PREFERRED: prefer to allocate containers on the same host/rack if possible
* ANTI_AFFINITY_REQUIRED: containers MUST be allocated on different hosts/racks
* ANTI_AFFINITY_PREFERRED: prefer to allocate containers on different hosts/racks if possible
Each of these rules will have a handler on the RM side to add some control on container allocation. When a client submits an application with a certain ContainerAllocationRule to the RM, this information will be added into ApplicationAttemptId (because the allocation rule is defined per application). When the RM uses a registered scheduler to allocate containers, it can retrieve the rule from ApplicationAttemptId and call the particular handler during allocation. The code can be added into SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens so as to avoid modifying all schedulers. add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Arun C Murthy Attachments: YARN-1042-demo.patch container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up on the same failure zones. Similarly, you may want to specify affinity to the same host or rack without specifying which specific host/rack. Example: bringing up a small giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
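A minimal sketch of the proposed (not yet existing) ContainerAllocationRule type, following the four rule names listed in the comment above:
{code}
// Hypothetical enum for the proposal above; none of these names exist
// in YARN yet.
public enum ContainerAllocationRule {
  AFFINITY_REQUIRED,       // containers MUST be allocated on the same host/rack
  AFFINITY_PREFERRED,      // prefer the same host/rack if possible
  ANTI_AFFINITY_REQUIRED,  // containers MUST be allocated on different hosts/racks
  ANTI_AFFINITY_PREFERRED  // prefer different hosts/racks if possible
}
{code}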
[jira] [Commented] (YARN-3712) ContainersLauncher: handle event CLEANUP_CONTAINER asynchronously
[ https://issues.apache.org/jira/browse/YARN-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560679#comment-14560679 ] Jun Gong commented on YARN-3712: [~vinodkv] Our case: the NM receives a SHUTDOWN event and starts to clean up containers. If we do it synchronously and cleaning up takes a fairly long time, some containers might not be killed and cleaned up; then the corresponding container-launching thread (ContainersLauncher #..) will not exit until the container finishes. It results in a problem like YARN-3585: NM hang. ContainersLauncher: handle event CLEANUP_CONTAINER asynchronously - Key: YARN-3712 URL: https://issues.apache.org/jira/browse/YARN-3712 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3712.01.patch, YARN-3712.02.patch It will save some time to handle the CLEANUP_CONTAINER event asynchronously. This improvement will be useful for cases where cleaning up a container costs a fairly long time (e.g. in our case we run Docker containers on the NM, and it takes over 1 second to clean up one docker container) and there are many containers to clean up (e.g. the NM needs to clean up all running containers at NM shutdown). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
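A minimal sketch of the improvement's idea, with hypothetical types (not the actual ContainersLauncher code): hand each CLEANUP_CONTAINER event to a pool instead of blocking the event-handling thread, so one slow cleanup (e.g. a ~1s "docker rm") cannot stall the remaining containers at shutdown.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncCleanupSketch {
  private final ExecutorService cleanupPool = Executors.newFixedThreadPool(4);

  void onCleanupContainerEvent(Runnable cleanup) {
    cleanupPool.submit(cleanup); // returns immediately; cleanup runs async
  }

  void shutdown() {
    cleanupPool.shutdown(); // let queued cleanups drain
  }
}
{code}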
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560762#comment-14560762 ] Alan Burlison commented on YARN-3066: - As Linux, OSX, Solaris and BSD all support the setsid(2) syscall and it's part of POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/toc.htm), isn't a better solution just to wrap setsid() + exec() in a little bit of JNI? That would avoid the need to install external executables. Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning a user task, the node manager checks for the setsid(1) utility and spawns the task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable ? "exec setsid" : "exec"; FreeBSD, unlike Linux, does not have the setsid(1) utility, so a plain exec is used to spawn the user task. If that task spawns other external programs (a common case if the task program is a shell script) and the user kills the job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn the task process via plain exec? This guarantees orphaned processes when a job is prematurely killed. 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during the configure stage and put a @SETSID@ macro into the java file to use the correct name. I propose to make the Shell.isSetsidAvailable test more strict and fail to start if it is not found: at least we will know about the problem at start rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
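A hedged sketch of the stricter check the reporter proposes (hypothetical helper, not Hadoop's Shell class): look for setsid, fall back to FreeBSD's third-party ssid, and fail fast instead of silently exec'ing without a session leader.
{code}
import java.io.File;

class SetsidCheckSketch {
  static String findSessionWrapper() {
    for (String candidate : new String[] {"setsid", "ssid"}) {
      for (String dir : System.getenv("PATH").split(File.pathSeparator)) {
        File f = new File(dir, candidate);
        if (f.isFile() && f.canExecute()) {
          return f.getAbsolutePath(); // first usable wrapper wins
        }
      }
    }
    throw new IllegalStateException(
        "no setsid/ssid found; refusing to start (would orphan child tasks)");
  }

  public static void main(String[] args) {
    System.out.println("session wrapper: " + findSessionWrapper());
  }
}
{code}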
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560774#comment-14560774 ] Rohith commented on YARN-3535: -- Thanks [~peng.zhang] for working on this issue. Some comments:
# I think the method {{recoverResourceRequestForContainer}} should be synchronized; any thoughts?
# Why do we require the {{RMContextImpl.java}} changes? I think we can avoid them; they are not necessarily required.
Tests:
# Any specific reason for changing {{TestAMRestart.java}}?
# IIUC, this issue can occur in all the schedulers, given the AM-RM heartbeat interval is less than the NM-RM heartbeat interval. So can it include a functional test case applicable to both CS and FS? Maybe you can add a test in the class extending {{ParameterizedSchedulerTestBase}}, i.e. TestAbstractYarnScheduler.
ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Labels: BB2015-05-TBR Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During a rolling update of the NM, the AM's start of a container on the NM failed, and then the job hung there. AM logs attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560824#comment-14560824 ] Hudson commented on YARN-3686: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/210/]) YARN-3686. CapacityScheduler should trim default_node_label_expression. (Sunil G via wangda) (wangda: rev cdbd66be111c93c85a409d47284e588c453ecae9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Fix For: 2.7.1 Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch, 0003-YARN-3686.patch, 0004-YARN-3686.patch We should trim default_node_label_expression for a queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3712) ContainersLauncher: handle event CLEANUP_CONTAINER asynchronously
[ https://issues.apache.org/jira/browse/YARN-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560632#comment-14560632 ] Jun Gong commented on YARN-3712: [~sidharta-s] [~ashahab] Thanks for the suggestion. I am referring to both cleaning up the docker image and the container instance. To add a feature that restarts stopped containers, we modified DockerContainerExecutor and separated docker run ... --rm into docker run -d and docker rm $CONTAINER_NAME. docker rm takes over 1 second. ContainersLauncher: handle event CLEANUP_CONTAINER asynchronously - Key: YARN-3712 URL: https://issues.apache.org/jira/browse/YARN-3712 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3712.01.patch, YARN-3712.02.patch It will save some time to handle the CLEANUP_CONTAINER event asynchronously. This improvement will be useful for cases where cleaning up a container costs a fairly long time (e.g. in our case we run Docker containers on the NM, and it takes over 1 second to clean up one docker container) and there are many containers to clean up (e.g. the NM needs to clean up all running containers at NM shutdown). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3627) Preemption not triggered in Fair scheduler when maxResources is set on parent queue
[ https://issues.apache.org/jira/browse/YARN-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt resolved YARN-3627. Resolution: Not A Problem Closing this issue as per the comments. Preemption not triggered in Fair scheduler when maxResources is set on parent queue --- Key: YARN-3627 URL: https://issues.apache.org/jira/browse/YARN-3627 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, scheduler Environment: Suse 11 SP3, 2 NM Reporter: Bibin A Chundatt Consider the below scenario of fair configuration:
Root (10Gb cluster resource)
--Q1 (maxResources 4gb)
  Q1.1 (maxResources 4gb)
  Q1.2 (maxResources 4gb)
--Q2 (maxResources 6GB)
No applications are running in Q2. Submit one application to Q1.1 with 50 maps; 4Gb gets allocated to Q1.1. Now submit an application to Q1.2; it will always be starving for memory. Preemption will never get triggered since yarn.scheduler.fair.preemption.cluster-utilization-threshold = .8 and the cluster utilization is below .8.
*Fairscheduler.java*
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAllocatedVirtualCores()
            / clusterResource.getVirtualCores()));
  }
  return false;
}
{code}
Are we supposed to configure maxResources of 0mb and 0 cores in a running cluster so that all queues can always take the full cluster resources if available?? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
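A worked check of the scenario's numbers, assuming the 10Gb cluster and 4Gb allocation above and that vcore utilization is no higher than memory utilization:
{noformat}
utilization = max(allocatedMB / clusterMB, allocatedVCores / clusterVCores)
            = max(4Gb / 10Gb, ...) = 0.4
0.4 < 0.8 (cluster-utilization-threshold)  =>  shouldAttemptPreemption() returns false
{noformat}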
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560651#comment-14560651 ] Naganarasimha G R commented on YARN-3678: - Hi [~vvasudev] [~zhiguohong], for us it happened in a secure setup, and one key point is that the NM user and the user of the container are the same. But irrespective of this, it could have killed any other process [container] of the same/another app running on the same node, submitted by the same user. One suggestion (a crude fix; not sure how to get it working on other OSes): can we grep for the containerID to confirm it's the same process we are targeting, and then kill it? DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560676#comment-14560676 ] Varun Vasudev commented on YARN-3678: - [~zhiguohong] - sorry, my question was: after applying your fix, the problem should have gone away. However, you said "With this fix, the accident rate is reduced from several times per day to nearly zero." Do you know why it still happened? DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560630#comment-14560630 ] Varun Saxena commented on YARN-3678: Secure. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560643#comment-14560643 ] Varun Saxena commented on YARN-3678: As [~zhiguohong] mentioned, even in our case the same user is used for the NM and the app-submitter. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560748#comment-14560748 ] Hong Zhiguo commented on YARN-3678: --- The event sequence: call SEND SIGTERM -> pid recycle -> call SEND SIGKILL -> check process live time (based on current time). The time between [call SEND SIGTERM] and [call SEND SIGKILL] is 250ms. The time between [pid recycle] and [check process live time] may be shorter or longer than 250ms. When it's longer than 250ms, there's a chance we make a false-positive judgement. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
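A simplified sketch of the delayed-kill pattern under discussion (hypothetical SignalSender; the real code path goes through the container executor): SIGTERM now, SIGKILL 250ms later. Nothing revalidates the pid inside that window, so if the pid is recycled there, the SIGKILL lands on an unrelated process.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DelayedKillSketch {
  interface SignalSender { void signal(int pid, String signal); }

  static void terminateThenKill(SignalSender sender, int pid,
      ScheduledExecutorService scheduler) {
    sender.signal(pid, "SIGTERM");
    scheduler.schedule(
        () -> sender.signal(pid, "SIGKILL"), // pid may no longer be ours
        250, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    terminateThenKill((pid, sig) -> System.out.println(sig + " -> " + pid),
        12345, ses);
    Thread.sleep(300); // let the delayed SIGKILL fire in this demo
    ses.shutdown();
  }
}
{code}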
[jira] [Commented] (YARN-3718) hadoop-yarn-server-nodemanager's use of Linux Cgroups is non-portable
[ https://issues.apache.org/jira/browse/YARN-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560766#comment-14560766 ] Alan Burlison commented on YARN-3718: - As far as I can tell, the solution on BSD is just to disable all resource management features at compile-time. Whilst that approach should probably be taken initially on Solaris, if it makes sense to use RM features on Linux it almost certainly does on Solaris as well. Doing that requires taking a close look at how the Linux Cgroup features are currently used and, if necessary, abstracting that functionality so it can be implemented using both Linux and Solaris RM functionality. hadoop-yarn-server-nodemanager's use of Linux Cgroups is non-portable - Key: YARN-3718 URL: https://issues.apache.org/jira/browse/YARN-3718 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.7.0 Environment: BSD OSX Solaris Windows Linux Reporter: Alan Burlison hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c makes use of the Linux-only Cgroups feature (http://en.wikipedia.org/wiki/Cgroups) when Hadoop is built on Linux, but there is no corresponding functionality for non-Linux platforms. Other platforms provide similar functionality, e.g. Solaris has an extensive range of resource management features (http://docs.oracle.com/cd/E23824_01/html/821-1460/index.html). Work is needed to abstract the resource management features of YARN so that the same facilities for resource management can be provided on all platforms that provide the requisite functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560615#comment-14560615 ] Varun Saxena commented on YARN-3678: I think if we increase the value of {{pid_max}}, the issue is unlikely to occur. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do clean-up. The PID file still exists and will trigger signalContainer once, which kills the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly; what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3627) Preemption not triggered in Fair scheduler when maxResources is set on parent queue
[ https://issues.apache.org/jira/browse/YARN-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560636#comment-14560636 ] Bibin A Chundatt commented on YARN-3627: [~sunilg], thank you for looking into the issue. Preemption not triggered in Fair scheduler when maxResources is set on parent queue --- Key: YARN-3627 URL: https://issues.apache.org/jira/browse/YARN-3627 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, scheduler Environment: Suse 11 SP3, 2 NM Reporter: Bibin A Chundatt Consider the below scenario of fair configuration:
Root (10Gb cluster resource)
--Q1 (maxResources 4gb)
  Q1.1 (maxResources 4gb)
  Q1.2 (maxResources 4gb)
--Q2 (maxResources 6GB)
No applications are running in Q2. Submit one application to Q1.1 with 50 maps; 4Gb gets allocated to Q1.1. Now submit an application to Q1.2; it will always be starving for memory. Preemption will never get triggered since yarn.scheduler.fair.preemption.cluster-utilization-threshold = .8 and the cluster utilization is below .8.
*Fairscheduler.java*
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAllocatedVirtualCores()
            / clusterResource.getVirtualCores()));
  }
  return false;
}
{code}
Are we supposed to configure maxResources of 0mb and 0 cores in a running cluster so that all queues can always take the full cluster resources if available?? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560817#comment-14560817 ] Hadoop QA commented on YARN-41: ---
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 46s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 12 new or modified test files. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 24s | The applied patch generated 1 new checkstyle issues (total was 14, now 12). |
| {color:green}+1{color} | whitespace | 0m 34s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 5m 9s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests | 6m 8s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 50m 11s | Tests passed in hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests | 1m 53s | Tests passed in hadoop-yarn-server-tests. |
| | | 102m 10s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bb18163 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8098/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8098/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8098/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8098/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8098/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/8098/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8098/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8098/console |
This message was automatically generated. The RM should handle the graceful shutdown of the NM.
- Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, YARN-41-8.patch, YARN-41.patch Instead of waiting for the NM expiry, the RM should remove and handle the NM, which is shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560948#comment-14560948 ] Jason Lowe commented on YARN-3585: -- Do you have the shutdown logs from the NM that hung? It seems very likely that somehow we did not close the leveldb state store cleanly, if you're seeing a leveldb non-daemon thread holding up the JVM shutdown. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled -- Key: YARN-3585 URL: https://issues.apache.org/jira/browse/YARN-3585 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Priority: Critical With NM recovery enabled, after decommission the nodemanager log shows it stopped, but the process cannot end. Non-daemon threads:
{noformat}
DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on condition [0x]
leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable [0x]
VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable
Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 nid=0x29ed runnable
Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 nid=0x29ee runnable
Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 nid=0x29ef runnable
Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 nid=0x29f0 runnable
Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 nid=0x29f1 runnable
Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 nid=0x29f2 runnable
Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 nid=0x29f3 runnable
Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 nid=0x29f4 runnable
Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 runnable
Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 nid=0x29f5 runnable
Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 nid=0x29f6 runnable
VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting on condition
{noformat}
and the JNI leveldb thread stack:
{noformat}
Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
#0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
#2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0
#3 0x003d830e811d in clone () from /lib64/libc.so.6
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
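To make the failure mode above concrete, a tiny standalone demonstration (an analogy, not NM code): one non-daemon thread, standing in here for the leveldb background thread, keeps the JVM alive after main() returns.
{code}
class NonDaemonHangSketch {
  public static void main(String[] args) {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE); // parked forever, like an unclosed store
      } catch (InterruptedException ignored) { }
    }, "leveldb-stand-in");
    // t.setDaemon(true); // uncommenting this would let the JVM exit
    t.start();
    System.out.println("main() done, but the JVM will not exit");
  }
}
{code}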
[jira] [Commented] (YARN-3690) 'mvn site' fails on JDK8
[ https://issues.apache.org/jira/browse/YARN-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560976#comment-14560976 ] Brahma Reddy Battula commented on YARN-3690: [~ajisakaa] Attached the patch. Kindly review. 'mvn site' fails on JDK8 Key: YARN-3690 URL: https://issues.apache.org/jira/browse/YARN-3690 Project: Hadoop YARN Issue Type: Bug Components: api, site Environment: CentOS 7.0, Oracle JDK 8u45. Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3690-patch 'mvn site' failed with the following error:
{noformat}
[ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated
[ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN })
[ERROR] ^
[ERROR] java.lang.AssertionError
[ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126)
[ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45)
[ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161)
[ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
[ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
[ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
[ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
[ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
[ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
[ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
[ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
[ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
[ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
[ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205)
[ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64)
[ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54)
[ERROR] javadoc: error - fatal error
[ERROR]
[ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir.
[ERROR] - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3690) 'mvn site' fails on JDK8
[ https://issues.apache.org/jira/browse/YARN-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3690: --- Attachment: YARN-3690-patch 'mvn site' fails on JDK8 Key: YARN-3690 URL: https://issues.apache.org/jira/browse/YARN-3690 Project: Hadoop YARN Issue Type: Bug Components: api, site Environment: CentOS 7.0, Oracle JDK 8u45. Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3690-patch 'mvn site' failed with the following error:
{noformat}
[ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated
[ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN })
[ERROR] ^
[ERROR] java.lang.AssertionError
[ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126)
[ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45)
[ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161)
[ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
[ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
[ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
[ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
[ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
[ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
[ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
[ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
[ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
[ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
[ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205)
[ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64)
[ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54)
[ERROR] javadoc: error - fatal error
[ERROR]
[ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir.
[ERROR] - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3690) 'mvn site' fails on JDK8
[ https://issues.apache.org/jira/browse/YARN-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3690: ---
Component/s: (was: documentation)
             site
             api
'mvn site' fails on JDK8 Key: YARN-3690 URL: https://issues.apache.org/jira/browse/YARN-3690 Project: Hadoop YARN Issue Type: Bug Components: api, site Environment: CentOS 7.0, Oracle JDK 8u45. Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula 'mvn site' failed with the following error:
{noformat}
[ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated
[ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN })
[ERROR] ^
[ERROR] java.lang.AssertionError
[ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126)
[ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45)
[ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161)
[ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
[ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
[ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
[ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
[ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
[ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
[ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
[ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
[ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
[ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
[ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205)
[ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64)
[ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54)
[ERROR] javadoc: error - fatal error
[ERROR]
[ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir.
[ERROR] - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560853#comment-14560853 ] Devaraj K commented on YARN-41: ---
{code:xml}
-1 checkstyle 2m 24s The applied patch generated 1 new checkstyle issues (total was 14, now 12).
{code}
{code:xml}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java:0: Missing package-info.java file.
{code}
This checkstyle issue doesn't seem to be directly related to UnRegisterNodeManagerResponse.java. I added another class, UnRegisterNodeManagerRequest.java, in the same package, which doesn't show any checkstyle issue, and locally I don't get any checkstyle error for this class either. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, YARN-41-8.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
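For reference, the checkstyle rule flagged above is normally silenced by adding a package-info.java to the package in question; a minimal sketch (the javadoc text here is illustrative, not taken from the actual patch):
{code:java}
// hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/package-info.java
/**
 * Protocol record classes exchanged between the NodeManager and the
 * ResourceManager.
 */
package org.apache.hadoop.yarn.server.api.protocolrecords;
{code}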
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560945#comment-14560945 ] Hudson commented on YARN-3632: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/208/]) YARN-3632. Ordering policy should be allowed to reorder an application when demand changes. Contributed by Craig Welch (jianhe: rev 10732d515f62258309f98e4d7d23249f80b1847d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FairOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison, this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560959#comment-14560959 ] Mit Desai commented on YARN-3679: - [~zjshen]/[~jeagles] did you guys get a chance to take a look at this? Add documentation for timeline server filter ordering - Key: YARN-3679 URL: https://issues.apache.org/jira/browse/YARN-3679 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3679.patch Currently the auth filter is before static user filter by default. After YARN-3624, the filter order is no longer reversed. So the pseudo auth's allowing anonymous config is useless with both filters loaded in the new order, because static user will be created before presenting it to auth filter. The user can remove static user filter from the config to get anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560941#comment-14560941 ] Hudson commented on YARN-3686: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/208/]) YARN-3686. CapacityScheduler should trim default_node_label_expression. (Sunil G via wangda) (wangda: rev cdbd66be111c93c85a409d47284e588c453ecae9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java * hadoop-yarn-project/CHANGES.txt CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Fix For: 2.7.1 Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch, 0003-YARN-3686.patch, 0004-YARN-3686.patch We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560943#comment-14560943 ] Hudson commented on YARN-160: - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #208 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/208/]) YARN-160. Enhanced NodeManager to automatically obtain cpu/memory values from underlying OS when configured to do so. Contributed by Varun Vasudev. (vinodkv: rev 500a1d9c76ec612b4e737888f4be79951c11591d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java * hadoop-tools/hadoop-gridmix/src/test/java/org/apache/hadoop/mapred/gridmix/DummyResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: YARN-160.005.patch, YARN-160.006.patch, YARN-160.007.patch, YARN-160.008.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). 
As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560965#comment-14560965 ] Rohith commented on YARN-3585: -- I have attached the NM logs and thread dump in YARN-3640. Could you get them from YARN-3640? NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled -- Key: YARN-3585 URL: https://issues.apache.org/jira/browse/YARN-3585 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Priority: Critical With NM recovery enabled, after decommission, nodemanager log show stop but process cannot end. non daemon thread:
{noformat}
DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on condition [0x]
leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable [0x]
VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable
Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 nid=0x29ed runnable
Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 nid=0x29ee runnable
Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 nid=0x29ef runnable
Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 nid=0x29f0 runnable
Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 nid=0x29f1 runnable
Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 nid=0x29f2 runnable
Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 nid=0x29f3 runnable
Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 nid=0x29f4 runnable
Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 runnable
Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 nid=0x29f5 runnable
Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 nid=0x29f6 runnable
VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting on condition
{noformat}
and jni leveldb thread stack
{noformat}
Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
#0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
#2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0
#3 0x003d830e811d in clone () from /lib64/libc.so.6
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561014#comment-14561014 ] Jason Lowe commented on YARN-3585: -- Ah, my apologies. I didn't realize it is failing with the exact same logs, even after YARN-3641. Could you instrument the state store code with logging to verify the leveldb database is indeed being closed even when it hangs? I'm trying to determine whether this is a bug in Hadoop code or in the leveldb code. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled -- Key: YARN-3585 URL: https://issues.apache.org/jira/browse/YARN-3585 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Priority: Critical With NM recovery enabled, after decommission, nodemanager log show stop but process cannot end. non daemon thread:
{noformat}
DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on condition [0x]
leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable [0x]
VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable
Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 nid=0x29ed runnable
Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 nid=0x29ee runnable
Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 nid=0x29ef runnable
Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 nid=0x29f0 runnable
Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 nid=0x29f1 runnable
Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 nid=0x29f2 runnable
Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 nid=0x29f3 runnable
Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 nid=0x29f4 runnable
Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 runnable
Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 nid=0x29f5 runnable
Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 nid=0x29f6 runnable
VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting on condition
{noformat}
and jni leveldb thread stack
{noformat}
Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
#0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
#2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0
#3 0x003d830e811d in clone () from /lib64/libc.so.6
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
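A minimal sketch of the kind of instrumentation being requested, assuming the NM state store closes its leveldb handle in a closeStorage()-style shutdown method (the method and field names here are approximations of the actual code, not verbatim):
{code:java}
// Hypothetical logging around the leveldb close to confirm whether
// close() is reached and whether it ever returns when the NM hangs.
protected void closeStorage() throws IOException {
  if (db == null) {
    LOG.info("NM state store: leveldb handle already null at close");
    return;
  }
  LOG.info("NM state store: closing leveldb database");
  long start = Time.monotonicNow();
  db.close(); // JNI call into native leveldb
  LOG.info("NM state store: closed leveldb database in "
      + (Time.monotonicNow() - start) + " ms");
}
{code}
If the first message appears but the second never does, the hang is inside the leveldb JNI close; if both appear, the non-exiting process points back at Hadoop-side threads.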
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560891#comment-14560891 ] Hadoop QA commented on YARN-3678: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12735592/YARN-3678.patch | | Optional Tests | | | git revision | trunk / bb18163 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8099/console | This message was automatically generated. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Attachments: YARN-3678.patch Suppose one container finished, then it will do clean up, the PID file still exist and will trigger once singalContainer, this will kill the process with the pid in PID file, but as container already finished, so this PID may be occupied by other process, this may cause serious issue. As I know, my NM was killed unexpectedly, what I described can be the cause. Even rarely occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-3678: - Attachment: YARN-3678.patch DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Attachments: YARN-3678.patch Suppose one container finished, then it will do clean up, the PID file still exist and will trigger once singalContainer, this will kill the process with the pid in PID file, but as container already finished, so this PID may be occupied by other process, this may cause serious issue. As I know, my NM was killed unexpectedly, what I described can be the cause. Even rarely occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3051: --- Attachment: YARN-3051-YARN-2928.003.patch [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561391#comment-14561391 ] Joep Rottinghuis commented on YARN-3706: Initially I was stuck on YARN-3721, but I have my environment setup properly now. I'll work on sanitizing the patch and upload a new version. I don't expect the overall structure and approach to significantly change. The updated patch will have deletions and renames from existing classes included (and may therefore be somewhat harder to read). Generalize native HBase writer for additional tables Key: YARN-3706 URL: https://issues.apache.org/jira/browse/YARN-3706 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Joep Rottinghuis Assignee: Joep Rottinghuis Priority: Minor Attachments: YARN-3706-YARN-2928.001.patch When reviewing YARN-3411 we noticed that we could change the class hierarchy a little in order to accommodate additional tables easily. In order to get ready for benchmark testing we left the original layout in place, as performance would not be impacted by the code hierarchy. Here is a separate jira to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561393#comment-14561393 ] Hadoop QA commented on YARN-3051: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12735644/YARN-3051-YARN-2928.003.patch | | Optional Tests | shellcheck javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / e19566a | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8100/console | This message was automatically generated. [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3569) YarnClient.getAllQueues returns a list of queues that do not display running apps.
[ https://issues.apache.org/jira/browse/YARN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561433#comment-14561433 ] Jian He commented on YARN-3569: --- [~spandan], what is your use case? If you want to get applications for the given queues, you can use the API below in YarnClient.
{code}
public abstract List<ApplicationReport> getApplications(Set<String> queues, Set<String> users, Set<String> applicationTypes, EnumSet<YarnApplicationState> applicationStates)
{code}
YarnClient.getAllQueues returns a list of queues that do not display running apps. -- Key: YARN-3569 URL: https://issues.apache.org/jira/browse/YARN-3569 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.8.0 Reporter: Spandan Dutta Assignee: Spandan Dutta Attachments: YARN-3569.patch YarnClient.getAllQueues() returns a list of queues. If we pick a queue from this list and call getApplications on it, we always get an empty list even-though applications are running on that queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
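A minimal usage sketch of that API, assuming an already-started YarnClient; the queue name is illustrative, and null is assumed to mean "no filter" for the arguments we don't care about:
{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class QueueAppsExample {
  // Lists RUNNING applications submitted to the given queue; users and
  // applicationTypes are left null to skip filtering on them.
  static void printRunningApps(YarnClient client, String queue)
      throws YarnException, IOException {
    List<ApplicationReport> apps = client.getApplications(
        Collections.singleton(queue), null, null,
        EnumSet.of(YarnApplicationState.RUNNING));
    for (ApplicationReport app : apps) {
      System.out.println(app.getApplicationId() + " -> " + app.getName());
    }
  }
}
{code}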
[jira] [Updated] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3581: Attachment: YARN-3581.20150528-1.patch Hi [~wangda], I have updated the patch with your earlier review comments fixed, but please check my previous comment, and also confirm whether you require a patch for the 2.7.1 branch (per our offline discussion, you wanted the fix in 2.7.1 so that it can be used there and later removed in 2.8). Deprecate -directlyAccessNodeLabelStore in RMAdminCLI - Key: YARN-3581 URL: https://issues.apache.org/jira/browse/YARN-3581 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch In 2.6.0, we added an option called -directlyAccessNodeLabelStore to make RM can start with label-configured queue settings. After YARN-2918, we don't need this option any more, admin can configure queue setting, start RM and configure node label via RMAdminCLI without any error. In addition, this option is very restrictive, first it needs to run on the same node where RM is running if admin configured to store labels in local disk. Second, when admin run the option when RM is running, multiple process write to a same file can happen, this could make node label store becomes invalid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3652) A SchedulerMetrics may be need for evaluating the scheduler's performance
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561210#comment-14561210 ] Varun Vasudev commented on YARN-3652: - My apologies for the delay [~xinxianyin]. We do need a SchedulerMetrics class. The general idea is that SchedulerHealth should pick up values from the SchedulerMetrics class, but the SchedulerMetrics class should ideally provide more information. As an example, SchedulerHealth cares about the number of reserved containers, which the SchedulerMetrics class should provide. Ideally, though, the SchedulerMetrics class would also give me some extra information such as the mean, the distribution, and the variance of the number of reserved containers. I think purely for the purposes of YARN-3630, you should modify the SchedulerHealth class to expose the number of waiting events, but we can independently work on a SchedulerMetrics class as well. A SchedulerMetrics may be need for evaluating the scheduler's performance - Key: YARN-3652 URL: https://issues.apache.org/jira/browse/YARN-3652 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Xianyin Xin As discussed in YARN-3630, a {{SchedulerMetrics}} may be need for evaluating the scheduler's performance. The performance indexes includes #events waiting for being handled by scheduler, the throughput, the scheduling delay and/or other indicators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
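To make the mean/variance idea concrete, here is one hypothetical shape such a class could take (none of these names exist in YARN today); it tracks the reserved-container count with Welford's online algorithm, so the extra statistics cost O(1) per sample:
{code:java}
// Hypothetical SchedulerMetrics sketch, not an existing YARN class.
public class SchedulerMetricsSketch {
  private long samples;  // number of recorded samples
  private double mean;   // running mean of reserved containers
  private double m2;     // running sum of squared deviations (Welford)

  public synchronized void recordReservedContainers(int reserved) {
    samples++;
    double delta = reserved - mean;
    mean += delta / samples;
    m2 += delta * (reserved - mean);
  }

  public synchronized double getReservedContainersMean() {
    return mean;
  }

  public synchronized double getReservedContainersVariance() {
    // Sample variance; 0 until we have at least two samples.
    return samples > 1 ? m2 / (samples - 1) : 0.0;
  }
}
{code}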
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561137#comment-14561137 ] Hudson commented on YARN-3686: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2156 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2156/]) YARN-3686. CapacityScheduler should trim default_node_label_expression. (Sunil G via wangda) (wangda: rev cdbd66be111c93c85a409d47284e588c453ecae9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Fix For: 2.7.1 Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch, 0003-YARN-3686.patch, 0004-YARN-3686.patch We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle
[ https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561190#comment-14561190 ] Sangjin Lee commented on YARN-3721: --- [~gtCarrera9], as you point out, Failure to find org.apache.hadoop:hadoop-yarn-server-timelineservice:jar:3.0.0-SNAPSHOT is caused by the cycle in the dependencies. I just want to see whether excluding mini-cluster from the hbase-testing-util module is the correct fix. For that to be the case, none of our unit tests should depend on the mini-cluster module. A follow-up question is, if that is the case, then what do we need from hbase-testing-util? It seems like HBaseTestingUtility (used in TestHBaseTimelineWriterImpl) is provided by hbase-server:test (which is pulled in indirectly by the phoenix dependency?). Then what are we getting from hbase-testing-util that we need? [~swagle]? If we can isolate the thing we need from hbase-testing-util dependencies, then we could possibly remove hbase-testing-util from the dependencies and use that instead. I'm wondering out loud. I suppose it all depends on what we actually use from hbase-testing-util and its dependencies. [~vrushalic], could you take a look at the unit test failure Li mentioned? Is that independent of this issue? Thanks! build is broken on YARN-2928 branch due to possible dependency cycle Key: YARN-3721 URL: https://issues.apache.org/jira/browse/YARN-3721 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Priority: Blocker Attachments: YARN-3721-YARN-2928.001.patch The build is broken on the YARN-2928 branch at the hadoop-yarn-server-timelineservice module. It's been broken for a while, but we didn't notice it because the build happens to work despite this if the maven local cache is not cleared. To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven local cache and build it. Almost certainly it was introduced by YARN-3529. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3723) Need to clearly document primaryFilter and otherInfo value type
Zhijie Shen created YARN-3723: - Summary: Need to clearly document primaryFilter and otherInfo value type Key: YARN-3723 URL: https://issues.apache.org/jira/browse/YARN-3723 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561134#comment-14561134 ] MENG DING commented on YARN-1197: - Correcting a typo in my previous post; it should be: bq. As an example, if a container is currently using 2G, and AM asks to increase its resource to 4G, and then asks again to increase to 6G, but AM doesn't actually use any of the token to increase the resource on NM. In this case, with the current design, RM can only revert the resource allocation back to 4G after expiration, not 2G. I forgot to discuss another important piece. We probably should not use the existing ResourceCalculator to compare two resource capabilities in this project, because: - The DefaultResourceCalculator only compares memory, which won't work if we want to only change CPU cores. - The DominantResourceCalculator may end up comparing different dimensions between two Resources, which doesn't make sense in our project. The way to compare two resources in this project should be straightforward, as follows. Let me know if you think otherwise. - For an increase request, no dimension in the target resource can be smaller than the corresponding dimension in the current resource, and at least one dimension in the target resource must be larger than the corresponding dimension in the current resource. - For a decrease request, no dimension in the target resource can be larger than the corresponding dimension in the current resource, and at least one dimension in the target resource must be smaller than the corresponding dimension in the current resource. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
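A minimal sketch of the comparison rule described in the comment above, written against the two dimensions of the 2.x Resource record (memory and vcores); the class and method names are illustrative, not part of any proposed patch:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

public final class ResourceChangeValidator {
  // Valid increase: no dimension shrinks and at least one grows.
  public static boolean isValidIncrease(Resource current, Resource target) {
    boolean noneSmaller = target.getMemory() >= current.getMemory()
        && target.getVirtualCores() >= current.getVirtualCores();
    boolean anyLarger = target.getMemory() > current.getMemory()
        || target.getVirtualCores() > current.getVirtualCores();
    return noneSmaller && anyLarger;
  }

  // Valid decrease: the mirror image of the increase rule.
  public static boolean isValidDecrease(Resource current, Resource target) {
    boolean noneLarger = target.getMemory() <= current.getMemory()
        && target.getVirtualCores() <= current.getVirtualCores();
    boolean anySmaller = target.getMemory() < current.getMemory()
        || target.getVirtualCores() < current.getVirtualCores();
    return noneLarger && anySmaller;
  }

  private ResourceChangeValidator() {}
}
{code}
Note that under this rule a request that grows one dimension while shrinking another is neither a valid increase nor a valid decrease, which matches the comment's point that cross-dimension comparisons don't make sense here.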
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561141#comment-14561141 ] Hudson commented on YARN-3632: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2156 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2156/]) YARN-3632. Ordering policy should be allowed to reorder an application when demand changes. Contributed by Craig Welch (jianhe: rev 10732d515f62258309f98e4d7d23249f80b1847d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FairOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison, this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561139#comment-14561139 ] Hudson commented on YARN-160: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2156 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2156/]) YARN-160. Enhanced NodeManager to automatically obtain cpu/memory values from underlying OS when configured to do so. Contributed by Varun Vasudev. (vinodkv: rev 500a1d9c76ec612b4e737888f4be79951c11591d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-tools/hadoop-gridmix/src/test/java/org/apache/hadoop/mapred/gridmix/DummyResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: YARN-160.005.patch, YARN-160.006.patch, YARN-160.007.patch, YARN-160.008.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). 
As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle
[ https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561218#comment-14561218 ] Vrushali C commented on YARN-3721: -- bq. Vrushali C, could you take a look at the unit test failure Li mentioned? Is that independent of this issue? Thanks! Yes, looking into this now. build is broken on YARN-2928 branch due to possible dependency cycle Key: YARN-3721 URL: https://issues.apache.org/jira/browse/YARN-3721 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Priority: Blocker Attachments: YARN-3721-YARN-2928.001.patch The build is broken on the YARN-2928 branch at the hadoop-yarn-server-timelineservice module. It's been broken for a while, but we didn't notice it because the build happens to work despite this if the maven local cache is not cleared. To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven local cache and build it. Almost certainly it was introduced by YARN-3529. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561059#comment-14561059 ] Hudson commented on YARN-160: - FAILURE: Integrated in Hadoop-Hdfs-trunk #2138 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2138/]) YARN-160. Enhanced NodeManager to automatically obtain cpu/memory values from underlying OS when configured to do so. Contributed by Varun Vasudev. (vinodkv: rev 500a1d9c76ec612b4e737888f4be79951c11591d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-tools/hadoop-gridmix/src/test/java/org/apache/hadoop/mapred/gridmix/DummyResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: YARN-160.005.patch, YARN-160.006.patch, YARN-160.007.patch, YARN-160.008.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). 
As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561061#comment-14561061 ] Hudson commented on YARN-3632: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2138 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2138/]) YARN-3632. Ordering policy should be allowed to reorder an application when demand changes. Contributed by Craig Welch (jianhe: rev 10732d515f62258309f98e4d7d23249f80b1847d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FairOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison, this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3558: --- Attachment: rm.log Amlog.txt [~sunilg], attaching the RM log and AM log. Additional containers getting reserved from RM in case of Fair scheduler Key: YARN-3558 URL: https://issues.apache.org/jira/browse/YARN-3558 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.7.0 Environment: OS: Suse 11 SP3 Setup: 2 RM 2 NM Scheduler: Fair scheduler Reporter: Bibin A Chundatt Attachments: Amlog.txt, rm.log Submit PI job with 16 maps. Total containers expected: 16 maps + 1 reduce + 1 AM. Total containers reserved by RM is 21. The following containers are not being used for execution: container_1430213948957_0001_01_20 container_1430213948957_0001_01_19 RM container reservations and states:
{code}
Processing container_1430213948957_0001_01_01 of type START
Processing container_1430213948957_0001_01_01 of type ACQUIRED
Processing container_1430213948957_0001_01_01 of type LAUNCHED
Processing container_1430213948957_0001_01_02 of type START
Processing container_1430213948957_0001_01_03 of type START
Processing container_1430213948957_0001_01_02 of type ACQUIRED
Processing container_1430213948957_0001_01_03 of type ACQUIRED
Processing container_1430213948957_0001_01_04 of type START
Processing container_1430213948957_0001_01_05 of type START
Processing container_1430213948957_0001_01_04 of type ACQUIRED
Processing container_1430213948957_0001_01_05 of type ACQUIRED
Processing container_1430213948957_0001_01_02 of type LAUNCHED
Processing container_1430213948957_0001_01_04 of type LAUNCHED
Processing container_1430213948957_0001_01_06 of type RESERVED
Processing container_1430213948957_0001_01_03 of type LAUNCHED
Processing container_1430213948957_0001_01_05 of type LAUNCHED
Processing container_1430213948957_0001_01_07 of type START
Processing container_1430213948957_0001_01_07 of type ACQUIRED
Processing container_1430213948957_0001_01_07 of type LAUNCHED
Processing container_1430213948957_0001_01_08 of type RESERVED
Processing container_1430213948957_0001_01_02 of type FINISHED
Processing container_1430213948957_0001_01_06 of type START
Processing container_1430213948957_0001_01_06 of type ACQUIRED
Processing container_1430213948957_0001_01_06 of type LAUNCHED
Processing container_1430213948957_0001_01_04 of type FINISHED
Processing container_1430213948957_0001_01_09 of type START
Processing container_1430213948957_0001_01_09 of type ACQUIRED
Processing container_1430213948957_0001_01_09 of type LAUNCHED
Processing container_1430213948957_0001_01_10 of type RESERVED
Processing container_1430213948957_0001_01_03 of type FINISHED
Processing container_1430213948957_0001_01_08 of type START
Processing container_1430213948957_0001_01_08 of type ACQUIRED
Processing container_1430213948957_0001_01_08 of type LAUNCHED
Processing container_1430213948957_0001_01_05 of type FINISHED
Processing container_1430213948957_0001_01_11 of type START
Processing container_1430213948957_0001_01_11 of type ACQUIRED
Processing container_1430213948957_0001_01_11 of type LAUNCHED
Processing container_1430213948957_0001_01_07 of type FINISHED
Processing container_1430213948957_0001_01_12 of type START
Processing container_1430213948957_0001_01_12 of type ACQUIRED
Processing container_1430213948957_0001_01_12 of type LAUNCHED
Processing container_1430213948957_0001_01_13 of type RESERVED
Processing container_1430213948957_0001_01_06 of type FINISHED
Processing container_1430213948957_0001_01_10 of type START
Processing container_1430213948957_0001_01_10 of type ACQUIRED
Processing container_1430213948957_0001_01_10 of type LAUNCHED
Processing container_1430213948957_0001_01_09 of type FINISHED
Processing container_1430213948957_0001_01_14 of type START
Processing container_1430213948957_0001_01_14 of type ACQUIRED
Processing container_1430213948957_0001_01_14 of type LAUNCHED
Processing container_1430213948957_0001_01_15 of type RESERVED
Processing container_1430213948957_0001_01_08 of type FINISHED
Processing container_1430213948957_0001_01_13 of type START
Processing container_1430213948957_0001_01_16 of type RESERVED
Processing container_1430213948957_0001_01_13 of type ACQUIRED
Processing container_1430213948957_0001_01_13 of type LAUNCHED
Processing
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561057#comment-14561057 ] Hudson commented on YARN-3686: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2138 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2138/]) YARN-3686. CapacityScheduler should trim default_node_label_expression. (Sunil G via wangda) (wangda: rev cdbd66be111c93c85a409d47284e588c453ecae9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Fix For: 2.7.1 Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch, 0003-YARN-3686.patch, 0004-YARN-3686.patch We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561118#comment-14561118 ] Allen Wittenauer commented on YARN-3066: bq. As Linux, OSX, Solaris and BSD all support the setsid(2) syscall and it's part of POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/toc.htm), isn't a better solution just to wrap setsid() + exec() in a little bit of JNI? That would avoid the need to install external executables. That would break platforms that don't have a working libhadoop (which are plentiful). However, there could be a test here that says if libhadoop is available, use it. Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning user task, node manager checks for setsid(1) utility and spawns task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable ? "exec setsid" : "exec"; FreeBSD, unlike Linux, does not have setsid(1) utility. So plain exec is used to spawn user task. If that task spawns other external programs (this is common case if a task program is a shell script) and user kills job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn task process via exec: this is the guarantee to have orphaned processes when job is prematurely killed. 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during configure stage and put @SETSID@ macros into java file to use the correct name. I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is not found: at least we will know about the problem at start rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561077#comment-14561077 ] Hudson commented on YARN-3632: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #198 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/198/]) YARN-3632. Ordering policy should be allowed to reorder an application when demand changes. Contributed by Craig Welch (jianhe: rev 10732d515f62258309f98e4d7d23249f80b1847d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FairOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison, this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
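For readers skimming the commit above: the core mechanic is that comparator-backed policies keep applications in an ordered set, so any attribute the comparator reads (such as demand, for FairOrderingPolicy with sizeBasedWeight) can only change safely via a remove/update/re-insert cycle. A schematic sketch with a stand-in type, not the real SchedulableEntity code:
{code}
import java.util.Comparator;
import java.util.TreeSet;

// Schematic sketch of reordering on demand change. App is a stand-in
// for the real SchedulableEntity; the comparator is illustrative.
class DemandAwareOrdering {
  static final class App {
    final String id;
    long demand; // pending resources; read by the comparator
    App(String id, long demand) { this.id = id; this.demand = demand; }
  }

  private final TreeSet<App> order = new TreeSet<>(
      Comparator.comparingLong((App a) -> a.demand).reversed()
          .thenComparing(a -> a.id));

  void onDemandChanged(App app, long newDemand) {
    // Mutating the sort key while the entry sits in the TreeSet would
    // corrupt the ordering, so remove first, then re-insert.
    order.remove(app);
    app.demand = newDemand;
    order.add(app);
  }
}
{code}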
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561073#comment-14561073 ] Hudson commented on YARN-3686: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #198 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/198/]) YARN-3686. CapacityScheduler should trim default_node_label_expression. (Sunil G via wangda) (wangda: rev cdbd66be111c93c85a409d47284e588c453ecae9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Fix For: 2.7.1 Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch, 0003-YARN-3686.patch, 0004-YARN-3686.patch We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561075#comment-14561075 ] Hudson commented on YARN-160: - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #198 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/198/]) YARN-160. Enhanced NodeManager to automatically obtain cpu/memory values from underlying OS when configured to do so. Contributed by Varun Vasudev. (vinodkv: rev 500a1d9c76ec612b4e737888f4be79951c11591d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java * hadoop-tools/hadoop-gridmix/src/test/java/org/apache/hadoop/mapred/gridmix/DummyResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: YARN-160.005.patch, YARN-160.006.patch, YARN-160.007.patch, YARN-160.008.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). 
As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be made available as a YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
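A minimal sketch of the interface shape described above, with invented names (the committed code uses ResourceCalculatorPlugin and NodeManagerHardwareUtils, which differ in detail):
{code}
// Sketch only: probe the OS for totals, then subtract a configured
// offset reserved for the OS and non-YARN daemons. Names invented.
interface HardwareCapabilities {
  long totalPhysicalMemoryMB(); // e.g. parsed from /proc/meminfo on Linux
  int totalVcores();            // e.g. counted from /proc/cpuinfo on Linux

  default long yarnMemoryMB(long reservedForSystemMB) {
    return Math.max(0, totalPhysicalMemoryMB() - reservedForSystemMB);
  }

  default int yarnVcores(int reservedForSystemVcores) {
    return Math.max(0, totalVcores() - reservedForSystemVcores);
  }
}
{code}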
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561098#comment-14561098 ] MENG DING commented on YARN-1197: - Thanks [~vinodkv] and [~leftnoteasy] for the great comments! *To [~vinodkv]:* bq. Expanding containers at ACQUIRED state sounds useful in theory. But agree with you that we can punt it for later. Thanks for the confirmation :-) bq. To your example of concurrent increase/decrease sizing requests from AM, shall we simply say that only one change-in-progress is allowed for any given container? Actually we really wanted to be able to achieve this, but with the current asymmetric logic of increasing resource via the RM and decreasing resource via the NM, it doesn't seem to be possible :-( The reason is that:
* The increase action starts with the AM requesting the increase from the RM, being granted a resource increase token, then initiating the increase action on the NM, until finally the NM confirms the increase with the RM.
* Once an increase token has been granted to the AM, and before it expires (10 minutes by default), if the AM does not initiate the increase action on the NM, *the NM will have no idea that an increase is already in progress*.
* If, at this moment, the AM initiates a resource decrease action on the NM, the NM will go ahead and honor it.
So in effect, there can be concurrent decrease/increase actions going on, and there doesn't seem to be a way to block this. bq. If we do the above, this will also simplify most of the code, as we will simply have the notion of a Change, instead of an explicit increase/decrease everywhere. For e.g., we will just have a ContainerResourceChangeExpirer. I believe the ContainerResourceChangeExpirer only applies to the container resource increase action. The container decrease action goes directly through the NM, so it does not need expiration logic. bq. There will be races with container-states toggling from RUNNING to finished states, depending on when AM requests a size-change and when NMs report that a container finished. We can simply say that the state at the ResourceManager wins. Agreed. bq. Didn't understand why we need this RM-NM confirmation. The token from RM to AM to NM should be enough for NM to update its view, right? This is for the same reasons listed above. bq. Instead of adding new records for ContainerResourceIncrease / decrease in AllocationResponse, should we add a new field in the API record itself stating if it is a New/Increased/Decreased container? If we move to a single change model, it's likely we will not even need this. I am open to this suggestion. We could add a field in the existing *ContainerProto* to indicate whether this Container is a new/increased/decreased container. The only thing I am not sure about is whether we can still change the AllocateResponseProto now that ContainerResourceIncrease/Decrease is already in trunk. bq. Any obviously invalid change-requests should be rejected right-away. For e.g, an increase to more than cluster's max container size. It seemed like you were suggesting that we ignore the invalid requests. Agreed that any invalid increase requests from AM to RM, and invalid decrease requests from AM to NM, should be directly rejected. The 'ignore' case I was referring to is in the context of NodeUpdate from NM to RM. bq. Nit: In the design doc, the high-level flow for container-increase point #7 incorrectly talks about decrease instead of increase. Yes, this is a mistake, and I will correct it. bq. I propose we do this in a branch Definitely.
There is already a YARN-1197 branch, and we can simply work in that branch. *To [~leftnoteasy]:* bq. Actually the approach in design doc is this (Meng plz let me know if I misunderstood). In scheduler's implementation, it allows only one pending change request for same container, later change-request will either overwrite prior one or be rejected. The current design only allows one increase request in the whole system, which is guaranteed by the ContainerResourceIncreaseExpirer object. However, as explained above, we cannot block a decrease action while an increase action is still in progress. bq. 1) For the protocols between servers/AMs, mostly the same as the previous doc, the biggest change I can see is the ContainerResourceChangeProto in NodeHeartbeatResponseProto, which makes sense to me. Yes, the ContainerResourceChangeProto is the biggest change. Glad that you agree with this new protocol :-) bq. 2) For the client side change: 2.2.1, +1 to option 3. Great. I will remove option 1 and option 2 from the design doc. bq. 3) For 2.3.3.2 scheduling part, {{The scheduling of an outstanding resource increase request to a container will be skipped if there are either:}}. Both of the two may not be needed, since the AM can ask for more resource while a container increase is in progress (e.g. the container increased to 4G, and the AM wants it to be 6G before notifying the NM).
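To summarize the one-change-in-progress point above in code form: the bookkeeping can only live on the RM, because after the RM grants an increase token the NM learns nothing until the AM presents that token. A hypothetical sketch of such RM-side tracking (all names invented):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of "at most one pending change per container"
// bookkeeping on the RM side. As discussed above, an NM-side guard
// cannot work: the NM is unaware of a granted-but-unused increase
// token, so it would happily accept a concurrent decrease.
class PendingChangeTracker {
  enum Change { INCREASE, DECREASE }

  private final ConcurrentMap<String, Change> pending =
      new ConcurrentHashMap<>();

  /** Returns true if no other change was in flight for this container. */
  boolean tryStart(String containerId, Change change) {
    return pending.putIfAbsent(containerId, change) == null;
  }

  void finish(String containerId) {
    pending.remove(containerId);
  }
}
{code}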
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561130#comment-14561130 ] Alan Burlison commented on YARN-3066: - Yes, that's a good point about not every platform having libhadoop. Solaris for example has the syscall but not the executable, so in that case it's a better solution to use the syscall but that's not always going to be the case. Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning user task, node manager checks for setsid(1) utility and spawns task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable? exec setsid : exec; FreeBSD, unlike Linux, does not have setsid(1) utility. So plain exec is used to spawn user task. If that task spawns other external programs (this is common case if a task program is a shell script) and user kills job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn task process via exec: this is the guarantee to have orphaned processes when job is prematurely killed. 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during configure stage and put @SETSID@ macros into java file to use the correct name. I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is not found: at least we will know about the problem at start rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561256#comment-14561256 ] Rohith commented on YARN-3585: -- bq. Could you instrument logs in the state store code to verify the leveldb database is indeed being closed even when it hangs? Sorry, I did not get exactly what logs I should add, and where. Do you mean I should add a log after {{NMLeveldbStateStoreService#closeStorage()}} is called? NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled -- Key: YARN-3585 URL: https://issues.apache.org/jira/browse/YARN-3585 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Priority: Critical With NM recovery enabled, after decommission, nodemanager log show stop but process cannot end. non daemon thread:
{noformat}
"DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on condition [0x]
"leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable [0x]
"VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable
"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 nid=0x29ed runnable
"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 nid=0x29ee runnable
"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 nid=0x29ef runnable
"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 nid=0x29f0 runnable
"Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 nid=0x29f1 runnable
"Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 nid=0x29f2 runnable
"Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 nid=0x29f3 runnable
"Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 nid=0x29f4 runnable
"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 runnable
"Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 nid=0x29f5 runnable
"Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 nid=0x29f6 runnable
"VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting on condition
{noformat}
and jni leveldb thread stack
{noformat}
Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
#0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
#2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0
#3 0x003d830e811d in clone () from /lib64/libc.so.6
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561234#comment-14561234 ] Dmitry Sivachenko commented on YARN-3066: - Solaris can use the same ssid program (it is just a simple wrapper for the setsid() syscall). I just proposed the simplest fix for that problem. A JNI wrapper sounds like a better approach. What I want to see in any case is a loud error message when the setsid binary (or the setsid() syscall, if we go the JNI way) is unavailable. Right now it pretends to work, and I spent some time digging out what was going wrong and why I saw a lot of orphans. Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning user task, node manager checks for setsid(1) utility and spawns task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable? exec setsid : exec; FreeBSD, unlike Linux, does not have setsid(1) utility. So plain exec is used to spawn user task. If that task spawns other external programs (this is common case if a task program is a shell script) and user kills job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn task process via exec: this is the guarantee to have orphaned processes when job is prematurely killed. 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during configure stage and put @SETSID@ macros into java file to use the correct name. I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is not found: at least we will know about the problem at start rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3724) Native compilation on Solaris fails on Yarn due to use of FTS
[ https://issues.apache.org/jira/browse/YARN-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison moved HADOOP-11952 to YARN-3724: -- Assignee: (was: Alan Burlison) Target Version/s: (was: 2.8.0) Key: YARN-3724 (was: HADOOP-11952) Project: Hadoop YARN (was: Hadoop Common) Native compilation on Solaris fails on Yarn due to use of FTS - Key: YARN-3724 URL: https://issues.apache.org/jira/browse/YARN-3724 Project: Hadoop YARN Issue Type: Bug Environment: Solaris 11.2 Reporter: Malcolm Kavalsky Original Estimate: 24h Remaining Estimate: 24h Compiling the Yarn Node Manager results in fts not found. On Solaris we have an alternative ftw with similar functionality. This is isolated to a single file container-executor.c Note that this will just fix the compilation error. A more serious issue is that Solaris does not support cgroups as Linux does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3066: External issue ID: Bug 21156330 - Solaris should provide a setsid(1) command to run a command in a new session 21156330 is the Solaris bug which covers adding a setsid command-line utility to Solaris Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning user task, node manager checks for setsid(1) utility and spawns task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable? exec setsid : exec; FreeBSD, unlike Linux, does not have setsid(1) utility. So plain exec is used to spawn user task. If that task spawns other external programs (this is common case if a task program is a shell script) and user kills job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn task process via exec: this is the guarantee to have orphaned processes when job is prematurely killed. 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during configure stage and put @SETSID@ macros into java file to use the correct name. I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is not found: at least we will know about the problem at start rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3724) Native compilation on Solaris fails on Yarn due to use of FTS
[ https://issues.apache.org/jira/browse/YARN-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3724: Issue Type: Sub-task (was: Bug) Parent: YARN-3719 Native compilation on Solaris fails on Yarn due to use of FTS - Key: YARN-3724 URL: https://issues.apache.org/jira/browse/YARN-3724 Project: Hadoop YARN Issue Type: Sub-task Environment: Solaris 11.2 Reporter: Malcolm Kavalsky Original Estimate: 24h Remaining Estimate: 24h Compiling the Yarn Node Manager results in fts not found. On Solaris we have an alternative ftw with similar functionality. This is isolated to a single file container-executor.c Note that this will just fix the compilation error. A more serious issue is that Solaris does not support cgroups as Linux does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561268#comment-14561268 ] Zhijie Shen commented on YARN-3700: --- Almost good to me, two nits: 1. getAllApplications -> getApplications? 2. Can we use -1 or 0 instead of Long.MAX_VALUE to indicate appsNum not provided?
{code}
appsNum == Long.MAX_VALUE ? this.maxLoadedApplications : appsNum
{code}
ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3700: Attachment: YARN-3700.4.patch ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561576#comment-14561576 ] Xuan Gong commented on YARN-3700: - bq. getAllApplications -> getApplications? Done bq. Can we use -1 or 0 instead of Long.MAX_VALUE to indicate appsNum not provided? We cannot: the default value of GetApplicationsRequest#getLimit() is Long.MAX_VALUE. Also, tested the patch locally:
* Set yarn.timeline-service.generic-application-history.max-applications as 1
* Run two MR pi examples
* Go to http://localhost:8188/applicationhistory/apps and http://localhost:8188/ws/v1/applicationhistory/apps. Both of them show only one application, which is the latest application
* http://localhost:8188/applicationhistory/apps?apps.num=2 and http://localhost:8188/ws/v1/applicationhistory/apps?limit=2. Both of them show two applications
ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
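Spelling out the sentinel argument from the exchange above: since GetApplicationsRequest#getLimit() already defaults to Long.MAX_VALUE, that value is the only marker that reliably means "caller did not set a limit". A sketch of the resulting check (class and variable names illustrative):
{code}
// Sketch of the sentinel logic discussed above. Long.MAX_VALUE is the
// existing default of GetApplicationsRequest#getLimit(), so -1 or 0
// cannot serve as a "not provided" marker without breaking callers.
final class AppsLimit {
  static long effectiveLimit(long requestedLimit, long maxLoadedApplications) {
    return requestedLimit == Long.MAX_VALUE
        ? maxLoadedApplications // no explicit limit: apply the configured cap
        : requestedLimit;       // caller asked for a specific number
  }
}
{code}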
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561573#comment-14561573 ] Wangda Tan commented on YARN-1197: -- [~mding]. For the comparison of resources, I think for both increase/decrease, it should be >= or <= (respectively) for all dimensions. But if the resource calculator is the default one, increasing v-cores makes no sense. So I think the ResourceCalculator has to be used, but we also need to check all individual dimensions. So the logic will be:
{code}
if (increase):
  delta = target - now
  if delta.mem < 0 || delta.vcore < 0:
    throw exception
  if resourceCalculator.lessOrEqualThan(delta, 0):
    throw exception
  // .. move forward
{code}
Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
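Rendered as plain Java against the Resource records, the same check might look like the following sketch (not the committed validation; the class name is invented, and a real version would also consult the configured ResourceCalculator as Wangda suggests):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch of the per-dimension check above: for an increase, no
// dimension may shrink, and the delta must be non-trivial overall.
// Illustrative only, not the committed validation logic.
final class ResizeValidator {
  static void validateIncrease(Resource now, Resource target) {
    int deltaMem = target.getMemory() - now.getMemory();
    int deltaVcores = target.getVirtualCores() - now.getVirtualCores();
    if (deltaMem < 0 || deltaVcores < 0) {
      throw new IllegalArgumentException(
          "increase must not shrink any dimension");
    }
    if (deltaMem == 0 && deltaVcores == 0) {
      throw new IllegalArgumentException("no-op change request");
    }
  }
}
{code}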
[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561590#comment-14561590 ] Wangda Tan commented on YARN-3581: -- [~Naganarasimha], Thanks for the update, the latest patch looks good. And I think it's better to add this to 2.7.1 as well, to keep people from using the option. We will not remove these options in 2.8, but we should let people know about the risk. Wangda Deprecate -directlyAccessNodeLabelStore in RMAdminCLI - Key: YARN-3581 URL: https://issues.apache.org/jira/browse/YARN-3581 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch In 2.6.0, we added an option called -directlyAccessNodeLabelStore to make RM can start with label-configured queue settings. After YARN-2918, we don't need this option any more, admin can configure queue setting, start RM and configure node label via RMAdminCLI without any error. In addition, this option is very restrictive, first it needs to run on the same node where RM is running if admin configured to store labels in local disk. Second, when admin run the option when RM is running, multiple process write to a same file can happen, this could make node label store becomes invalid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3581: - Target Version/s: 2.8.0, 2.7.1 (was: 2.8.0) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI - Key: YARN-3581 URL: https://issues.apache.org/jira/browse/YARN-3581 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch In 2.6.0, we added an option called -directlyAccessNodeLabelStore to make RM can start with label-configured queue settings. After YARN-2918, we don't need this option any more, admin can configure queue setting, start RM and configure node label via RMAdminCLI without any error. In addition, this option is very restrictive, first it needs to run on the same node where RM is running if admin configured to store labels in local disk. Second, when admin run the option when RM is running, multiple process write to a same file can happen, this could make node label store becomes invalid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561644#comment-14561644 ] Hadoop QA commented on YARN-3581: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 29s | The applied patch generated 6 new checkstyle issues (total was 40, now 42). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 52s | Tests passed in hadoop-yarn-client. | | | | 42m 29s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12735661/YARN-3581.20150528-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c46d4ba | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8102/artifact/patchprocess/diffcheckstylehadoop-yarn-client.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8102/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8102/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8102/console | This message was automatically generated. Deprecate -directlyAccessNodeLabelStore in RMAdminCLI - Key: YARN-3581 URL: https://issues.apache.org/jira/browse/YARN-3581 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch In 2.6.0, we added an option called -directlyAccessNodeLabelStore to make RM can start with label-configured queue settings. After YARN-2918, we don't need this option any more, admin can configure queue setting, start RM and configure node label via RMAdminCLI without any error. In addition, this option is very restrictive, first it needs to run on the same node where RM is running if admin configured to store labels in local disk. Second, when admin run the option when RM is running, multiple process write to a same file can happen, this could make node label store becomes invalid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3723) Need to clearly document primaryFilter and otherInfo value type
[ https://issues.apache.org/jira/browse/YARN-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3723: -- Attachment: YARN-3723.1.patch Add some description about the value type as well as fix a minor format issue in the document. Need to clearly document primaryFilter and otherInfo value type --- Key: YARN-3723 URL: https://issues.apache.org/jira/browse/YARN-3723 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Attachments: YARN-3723.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561641#comment-14561641 ] Varun Saxena commented on YARN-3051: In the API designed in the patch, there are a few things I wanted to discuss.
# We can either return a single timeline entity for a flow ID (having aggregated metric values) or multiple entities indicating multiple flow runs for a flow ID. I have included an API for the former as of now. I think there can be use cases for both though. [~vrushalic], did hRaven have the facility for both kinds of queries? I mean, is there a known use case?
# Do we plan to include additional info in the user table which can be used for filtering user level entities? Could not think of any use case, but just for flexibility I have added filters in the API {{getUserEntities}}.
# I have included an API to query flow information based on the appid. As of now I return the flow to which the app belongs (including multiple runs) instead of the flow run it belongs to. Which is the more viable scenario? Or do we need to support both?
# In the HBase schema design, there are 2 flow summary tables, aggregated daily and weekly respectively. So to limit the number of metric records or to see metrics in a specific time window, I have added metric start and metric end timestamps in the API design. But if metrics are aggregated daily and weekly, we won't be able to get something like the value of a specific metric for a flow from, say, Thursday 4 pm to Friday 9 am. [~vrushalic], can you confirm? If this is so, a timestamp doesn't make much sense. Dates can be specified instead.
# Will there be queue table(s) in addition to user table(s)? If yes, how will queue data be aggregated? Based on entity type? I may need an additional API for queues then.
# The doubt I have regarding flow version will anyway be addressed by YARN-3699
[Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
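To anchor the discussion above, the two flow-query shapes in point 1 could surface as separate reader methods; a purely hypothetical interface sketch (every name and type here is invented, filters elided):
{code}
import java.util.List;

// Hypothetical reader shapes for the two query styles debated above:
// an aggregated per-flow summary vs. the individual runs of a flow.
class FlowSummary { long runCount; long firstRunTime; long lastRunTime; long megabyteMillis; }
class FlowRun { String runId; long startTime; String version; }

interface FlowReader {
  /** One record per flow, metrics aggregated across its runs. */
  FlowSummary getFlowSummary(String cluster, String user, String flowId,
      long dayStartMs, long dayEndMs);

  /** Individual runs of a flow, newest first, capped by limit. */
  List<FlowRun> getFlowRuns(String cluster, String user, String flowId,
      int limit, String flowVersion);

  /** Resolve an application id to the single flow run that owns it. */
  FlowRun getFlowRunForApp(String cluster, String appId);
}
{code}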
[jira] [Commented] (YARN-3569) YarnClient.getAllQueues returns a list of queues that do not display running apps.
[ https://issues.apache.org/jira/browse/YARN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561580#comment-14561580 ] Hadoop QA commented on YARN-3569: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 6m 57s | Tests failed in hadoop-yarn-client. | | | | 46m 51s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.client.TestResourceTrackerOnHA | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731637/YARN-3569.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c46d4ba | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8101/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8101/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8101/console | This message was automatically generated. YarnClient.getAllQueues returns a list of queues that do not display running apps. -- Key: YARN-3569 URL: https://issues.apache.org/jira/browse/YARN-3569 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.8.0 Reporter: Spandan Dutta Assignee: Spandan Dutta Attachments: YARN-3569.patch YarnClient.getAllQueues() returns a list of queues. If we pick a queue from this list and call getApplications on it, we always get an empty list even-though applications are running on that queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561613#comment-14561613 ] Jason Lowe commented on YARN-3585: -- Yes, the idea is to show whether we successfully closed the database or not when the problem occurs. Sorry I wasn't clear on that. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled -- Key: YARN-3585 URL: https://issues.apache.org/jira/browse/YARN-3585 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Priority: Critical With NM recovery enabled, after decommission, nodemanager log show stop but process cannot end. non daemon thread:
{noformat}
"DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on condition [0x]
"leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable [0x]
"VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable
"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 nid=0x29ed runnable
"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 nid=0x29ee runnable
"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 nid=0x29ef runnable
"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 nid=0x29f0 runnable
"Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 nid=0x29f1 runnable
"Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 nid=0x29f2 runnable
"Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 nid=0x29f3 runnable
"Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 nid=0x29f4 runnable
"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 runnable
"Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 nid=0x29f5 runnable
"Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 nid=0x29f6 runnable
"VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting on condition
{noformat}
and jni leveldb thread stack
{noformat}
Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
#0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
#2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0
#3 0x003d830e811d in clone () from /lib64/libc.so.6
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
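Concretely, the instrumentation asked for above amounts to log lines on both sides of the close; a sketch of the kind of thing meant (placement, logger, and helper are illustrative, not the actual NMLeveldbStateStoreService code):
{code}
import java.io.Closeable;
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of the requested instrumentation: log before and after closing
// the leveldb store so a hung shutdown reveals whether close() returned.
final class StateStoreCloseTracer {
  private static final Logger LOG =
      LoggerFactory.getLogger(StateStoreCloseTracer.class);

  static void closeWithTrace(Closeable db) throws IOException {
    LOG.info("Closing NM state store database");
    if (db != null) {
      db.close();
    }
    LOG.info("NM state store database closed");
  }
}
{code}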
[jira] [Updated] (YARN-3603) Application Attempts page confusing
[ https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3603: -- Attachment: 0001-YARN-3603.patch Uploading an initial version of the patch.
* Container ID is shown only for Running containers in the App Attempt page. Change the column name to Running Container ID.
* AM Container shows the container link when the attempt is running, else shows the container ID in plain text. Here we can change the label to AM Container Link when the AM is running, and AM Container ID when the AM is finished or killed.
* AM Container logs are shown in the App page but not in the app attempt page. An entry is added for the same, as AM Container Logs.
Application Attempts page confusing --- Key: YARN-3603 URL: https://issues.apache.org/jira/browse/YARN-3603 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Thomas Graves Assignee: Sunil G Attachments: 0001-YARN-3603.patch The application attempts page (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01) is a bit confusing on what is going on. I think the table of containers there is for only Running containers and when the app is completed or killed it's empty. The table should have a label on it stating so. Also the AM Container field is a link when running but not when it's killed. That might be confusing. There is no link to the logs in this page but there is in the app attempt table when looking at http://rm:8088/cluster/app/application_1431101480046_0003 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561659#comment-14561659 ] Varun Saxena commented on YARN-3411: In ATSv1, we consider the timestamp when entity is added to backend store in addition to entity creation time. This is used while filtering out entities during querying. I cannot see this being captured specifically in this patch. It can be easily added to Column Family info. [~zjshen], [~sjlee0], do we need to add this info ? Zhijie, for this, any specific use case you know of in ATSv1 ? [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Fix For: YARN-2928 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
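If the write-time timestamp were captured, one low-cost option in a native HBase schema is to stamp the cells with the insertion time while keeping creation time as ordinary column data; a sketch under that assumption (family and qualifier names invented, HBase 1.x client API assumed):
{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: record "time written" as the explicit HBase cell
// timestamp at insertion, keeping the entity's creation time as an
// ordinary column value. Column names here are invented.
final class EntityPuts {
  static Put entityPut(byte[] rowKey, long createdTimeMs,
      byte[] entityBytes) {
    long insertedTimeMs = System.currentTimeMillis();
    Put put = new Put(rowKey);
    put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("created"),
        insertedTimeMs, Bytes.toBytes(createdTimeMs));
    put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("entity"),
        insertedTimeMs, entityBytes);
    return put;
  }
}
{code}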
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561770#comment-14561770 ] Vrushali C commented on YARN-3051: -- Hi Varun, Good points.. My answers inline. bq. We can either return a single timeline entity for a flow ID (having aggregated metric values) or multiple entities indicating multiple flow runs for a flow ID. I have included an API for the former as of now. I think there can be use cases for both though. Vrushali C, did hRaven have the facility for both kinds of queries? I mean, is there a known use case? Yes, there are use cases for both. hRaven has APIs for both types of calls; they are named differently though. The /flow endpoint in hRaven will return multiple flow runs (limited by filters). The /summary will return aggregated values for all the runs of that flow in that time range filter. Let me give an example (a hadoop sleep job for simplicity). Say user janedoe runs a hadoop sleep job 3 times today and has run it 5 times yesterday and say 6 times on one day about a month back. Now, we may want to see two different things:
#1 summarized stats for flow “Sleep job” invoked between last 2 days: It would say this flow was run 8 times, first was at timestamp X, last run was at timestamp Y, it took up a total of N megabytemillis, had a total of M containers across all runs, etc etc. It tells us how much of the cluster capacity a particular flow from a particular user is taking up.
#2 List of flow runs: Will show us details about each flow run. If we say limit = 3 in the query parameters, it would return the latest 3 runs of this flow. If we say limit = 100, it would return all the runs in this particular case (including the ones from a month back). If we pass in flowVersion=XXYYZZ, then it would return the list of flows that match this version.
For the initial development, I think we may want to work on #2 first (return the list of flow runs). The summary API will need aggregated tables which we can add later on; we could file a jira for that, my 2c. bq. Do we plan to include additional info in the user table which can be used for filtering user level entities? Could not think of any use case but just for flexibility I have added filters in the API getUserEntities. I haven’t looked at the code in detail, but as such, for user level entities, we would want a time range, a limit on the number of records returned, a flow name filter, and a cluster name filter. bq. I have included an API to query flow information based on the appid. As of now I return the flow to which the app belongs (including multiple runs) instead of the flow run it belongs to. Which is the more viable scenario? Or do we need to support both? An app id can belong to exactly one flow run. App id is the hadoop yarn application id, which should be unique on the cluster. Given an app id, we should be able to look up the exact flow run and return just that. The equivalent API in hRaven is /jobFlow. bq. But if metrics are aggregated daily and weekly, we won't be able to get something like the value of a specific metric for a flow from say Thursday 4 pm to Friday 9 am. Vrushali C, can you confirm? If this is so, a timestamp doesn't make much sense. Dates can be specified instead. The thinking is to split the querying across tables. We would query both the daily summary table for the complete day details and the regular flow tables for the details like those of Thursday 4 pm to Friday 9 am. But this does mean aggregating on the query side. So, I think, for starters, we could start off by allowing Date boundaries. We can enhance the API to accept finer timestamps later. bq. Will there be queue table(s) in addition to user table(s)? If yes, how will queue data be aggregated? Based on entity type? I may need an additional API for queues then. Yes, we would need a queue based aggregation table. Right now, those details are to be worked out. So perhaps we can leave aside the queue based APIs (or file a different jira to handle queue based APIs). Hope this helps. I can give you more examples if you would like to get more details or have any other questions. I will also look at the patch this week. Also, we should ensure we use the same classes/methods for key-related (flow keys, row keys) construction and parsing across reader APIs and writer APIs, else they will diverge. thanks Vrushali [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments:
[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object
[ https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561792#comment-14561792 ] Hudson commented on YARN-3647: -- FAILURE: Integrated in Hadoop-trunk-Commit #7909 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7909/]) YARN-3647. RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object. (Sunil G via wangda) (wangda: rev ec0a852a37d5c91a62d3d0ff3ddbd9d58235b312) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object --- Key: YARN-3647 URL: https://issues.apache.org/jira/browse/YARN-3647 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch After YARN-3579, RMWebServices apis can use the updated version of apis in CommonNodeLabelsManager which gives full NodeLabel object instead of creating NodeLabel object from plain label name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty
Zhijie Shen created YARN-3725: - Summary: App submission via REST API is broken in secure mode due to Timeline DT service address is empty Key: YARN-3725 URL: https://issues.apache.org/jira/browse/YARN-3725 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker YARN-2971 changes TimelineClient to use the service address from the Timeline DT to renew the DT instead of the configured address. This breaks the procedure of submitting a YARN app via the REST API in secure mode. The problem is that the service address is set by the client instead of the server in Java code. The REST API response is an encoded token String, so it is inconvenient to deserialize it, set the service address, and serialize it again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
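The inconvenience described above is the round trip a REST client would need: decode the token string, set the service, re-encode. A sketch using the standard Token helpers (the identifier type parameter is left generic; error handling elided):
{code}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Sketch of the round trip called inconvenient above: deserialize the
// URL-safe token string from the REST response, patch in the service
// address on the client side, and serialize it again.
final class TimelineDtPatcher {
  static String withService(String encodedToken, String host, int port)
      throws IOException {
    Token<TokenIdentifier> token = new Token<TokenIdentifier>();
    token.decodeFromUrlString(encodedToken);       // decode
    token.setService(new Text(host + ":" + port)); // client sets service
    return token.encodeToUrlString();              // re-encode
  }
}
{code}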
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561710#comment-14561710 ] MENG DING commented on YARN-1197: - [~leftnoteasy] Makes sense to me. Will update the doc to include this. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resources of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561790#comment-14561790 ] Hudson commented on YARN-3626: -- FAILURE: Integrated in Hadoop-trunk-Commit #7909 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7909/]) YARN-3626. On Windows localized resources are not moved to the front of the classpath when they should be. Contributed by Craig Welch. (cnauroth: rev 4102e5882e17b75507ae5cf8b8979485b3e24cbc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.7.1 Attachments: YARN-3626.0.patch, YARN-3626.11.patch, YARN-3626.14.patch, YARN-3626.15.patch, YARN-3626.16.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch In response to the mapreduce.job.user.classpath.first setting, the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that, localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty
[ https://issues.apache.org/jira/browse/YARN-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561795#comment-14561795 ] Zhijie Shen commented on YARN-3725: --- I'm proposing to do the following: 1. Short term fix for 2.7.1: check whether the service address in the timeline DT is empty. If it is, we fall back to using the configured service address. This will make app submission via the REST API work in secure mode without additional DT processing work, unless users really want to renew the DT from somewhere other than the configured address. That shouldn't be common, as we usually set up only one timeline server per YARN cluster. 2. Long term fix: we can do something similar to HDFS-6904. Let the client pass in the service address, and set the token's service address on the server side before serializing it into a string. And this problem is not limited to the ATS: the RM REST API doesn't set the service address for the RM DT either. It's better to seek a common solution. For example, we could fix DelegationTokenAuthenticationHandler to make all use cases of the hadoop http auth component set the service address properly. One step further, even RPC protocols may have a similar problem. For example, if we work with ApplicationClientProtocol directly, we should get an RM DT without a service address (correct me if I'm wrong). Thoughts? App submission via REST API is broken in secure mode due to Timeline DT service address is empty Key: YARN-3725 URL: https://issues.apache.org/jira/browse/YARN-3725 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker YARN-2971 changes TimelineClient to use the service address from the Timeline DT to renew the DT instead of the configured address. This breaks the procedure of submitting a YARN app via the REST API in secure mode. The problem is that the service address is set by the client instead of the server in Java code. The REST API response is an encoded token String, so it is inconvenient to deserialize it, set the service address, and serialize it again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
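A minimal sketch of the short-term fallback in option 1 above (the class and method are illustrative, not the attached patch):
{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: if the DT carries no service address (e.g. it came back
// through the REST API), fall back to the configured timeline address.
class TimelineTokenAddressFallback {
  static <T extends TokenIdentifier> InetSocketAddress getServiceAddress(
      Token<T> timelineDT, Configuration conf) {
    Text service = timelineDT.getService();
    if (service == null || service.toString().isEmpty()) {
      return conf.getSocketAddr(
          YarnConfiguration.TIMELINE_SERVICE_ADDRESS,
          YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ADDRESS,
          YarnConfiguration.DEFAULT_TIMELINE_SERVICE_PORT);
    }
    return SecurityUtil.getTokenServiceAddr(timelineDT);
  }
}
{code}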
[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561651#comment-14561651 ] Zhijie Shen commented on YARN-3700: --- +1, the last patch LGTM. Will commit it. ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have a large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561682#comment-14561682 ] Zhijie Shen commented on YARN-3411: --- Yeah, in v1 there's a starttime for each entity, which indicates when the entity started to exist. This value is used in multiple places. For example, when we query entities, the matched entities are sorted by this timestamp before being returned. Also, in v1 the retention granularity is at the entity level: we check whether the starttime of an entity is beyond the TTL and, if so, discard the entity and its events. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Fix For: YARN-2928 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
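Stated as code, the v1 entity-level retention rule described above boils down to a per-entity age check (a toy sketch; the names are purely illustrative, not the actual store code):
{code}
class EntityRetention {
  // An entity whose starttime has aged past the TTL is discarded,
  // together with its events.
  static boolean isExpired(long entityStartTimeMillis, long ttlMillis) {
    return System.currentTimeMillis() - entityStartTimeMillis > ttlMillis;
  }
}
{code}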
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561739#comment-14561739 ] Varun Saxena commented on YARN-3411: [~zjshen], I was actually talking about the store insertion time, not the entity start time. If you look at {{LevelDbTimelineStore#checkStartTimeInDb}}, you will find that a store insert time (taken as the current system time) is also written, in addition to the entity start time. Please note that the store insert time and the entity start time are not the same. In ATSv1, we could specify a timestamp in a query to ignore entities that were inserted into the store after it; this is done by matching against the store insert time (which is not the same as the entity start time). So, for backward compatibility's sake, do we need to support this? If yes, I don't see it being captured by the writer implementations as of now. If there is no use case for it, though, we can drop it in ATSv2. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Fix For: YARN-2928 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
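To make the distinction in this comment concrete, here is a tiny illustrative sketch (not LevelDbTimelineStore code) of how the two timestamps play different roles:
{code}
// Illustrative only: the store insert time is stamped when the entity is
// written, while the entity start time is supplied by the entity itself;
// the query-time cutoff matches against the former.
class InsertTimeFilter {
  // Recorded once at write time, alongside the entity start time.
  static long stampInsertTime() {
    return System.currentTimeMillis();
  }

  // Entities inserted into the store after the query timestamp are ignored.
  static boolean visibleTo(long queryTimestamp, long storeInsertTime) {
    return storeInsertTime <= queryTimestamp;
  }
}
{code}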
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560470#comment-14560470 ] Rohith commented on YARN-3585: -- I tested locally with the YARN-3641 fix; the issue still exists. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled -- Key: YARN-3585 URL: https://issues.apache.org/jira/browse/YARN-3585 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Priority: Critical With NM recovery enabled, after decommission, the nodemanager log shows it stopping but the process cannot exit. Non-daemon threads:
{noformat}
"DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on condition [0x]
"leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable [0x]
"VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable
"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 nid=0x29ed runnable
"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 nid=0x29ee runnable
"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 nid=0x29ef runnable
"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 nid=0x29f0 runnable
"Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 nid=0x29f1 runnable
"Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 nid=0x29f2 runnable
"Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 nid=0x29f3 runnable
"Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 nid=0x29f4 runnable
"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 runnable
"Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 nid=0x29f5 runnable
"Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 nid=0x29f6 runnable
"VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting on condition
{noformat}
and the JNI leveldb thread stack:
{noformat}
Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
#0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
#2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0
#3 0x003d830e811d in clone () from /lib64/libc.so.6
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3714) AM proxy filter can not get proper default proxy address if RM-HA is enabled
[ https://issues.apache.org/jira/browse/YARN-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560500#comment-14560500 ] Masatake Iwasaki commented on YARN-3714: In non-HA settings, if users do not explicitly set {{yarn.resourcemanager.webapp.address}} in the configuration, {{WebAppUtils#getResolvedRMWebAppURLWithoutScheme}} returns an RM webapp address based on the value of {{yarn.resourcemanager.hostname}}, via the default value set by yarn-default.xml:
{noformat}
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
{noformat}
As a result, WebAppUtils#getProxyHostsAndPortsForAmFilter can return a proper proxy address. This does not apply to {{yarn.resourcemanager.hostname._rm-id_}} in HA mode. AM proxy filter can not get proper default proxy address if RM-HA is enabled Key: YARN-3714 URL: https://issues.apache.org/jira/browse/YARN-3714 Project: Hadoop YARN Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor The default proxy address cannot be obtained without setting {{yarn.resourcemanager.webapp.address._rm-id_}} and/or {{yarn.resourcemanager.webapp.https.address._rm-id_}} explicitly if RM-HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
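The comment above pins down why HA mode misses the default: the :8088 fallback is wired to the plain key in yarn-default.xml, not to the _rm-id_-suffixed keys. A hedged sketch of the kind of fallback that would mirror it in HA mode (class and method names are hypothetical, not the committed fix):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical helper: when yarn.resourcemanager.webapp.address.<rm-id> is
// unset, derive it from yarn.resourcemanager.hostname.<rm-id>, mirroring
// what yarn-default.xml does for the non-HA key.
class HaWebAppAddressFallback {
  static String getWebAppAddressForRM(Configuration conf, String rmId) {
    String addr = conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS + "." + rmId);
    if (addr == null) {
      String host = conf.get(YarnConfiguration.RM_HOSTNAME + "." + rmId);
      if (host != null) {
        addr = host + ":" + YarnConfiguration.DEFAULT_RM_WEBAPP_PORT;
      }
    }
    return addr;
  }
}
{code}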
[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560518#comment-14560518 ] Varun Saxena commented on YARN-3489: [~leftnoteasy], sorry, I missed your comment... Will have a look. RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489-branch-2.7.02.patch, YARN-3489-branch-2.7.03.patch, YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added, we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g. a large cluster with lots of varied locality in the requests), then it will get the queue info for each request. Since we build the queue info each time, this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
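The shape of the improvement the issue asks for, as a sketch (signatures are illustrative, not the committed patch): fetch the queue info once, outside the loop, and reuse it for every request.
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

class ValidateOnce {
  static void validateResourceRequests(YarnScheduler scheduler,
      String queueName, List<ResourceRequest> requests,
      Resource maximumResource) throws IOException {
    // Build the queue info once; the queue doesn't change between requests.
    QueueInfo queueInfo = scheduler.getQueueInfo(queueName, false, false);
    for (ResourceRequest request : requests) {
      validateResourceRequest(request, maximumResource, queueInfo);
    }
  }

  private static void validateResourceRequest(ResourceRequest request,
      Resource maximumResource, QueueInfo queueInfo) {
    // Stand-in for SchedulerUtils.validateResourceRequest taking a
    // pre-fetched QueueInfo; the real signature may differ. Per-request
    // checks (label validity, capability bounds, ...) would go here.
  }
}
{code}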
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560485#comment-14560485 ] Varun Vasudev commented on YARN-3678: - Is this in secure or non-secure mode? DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose a container finishes and then does cleanup; the PID file still exists and will trigger signalContainer once. This kills the process with the pid from the PID file, but as the container has already finished, that PID may now be occupied by another process, which can cause serious issues. As far as I know, my NM was killed unexpectedly, and what I described could be the cause, even though it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562083#comment-14562083 ] Robert Kanter commented on YARN-3528: - [~brahmareddy] are you still planning on working on this? Tests with 12345 as hard-coded port break jenkins - Key: YARN-3528 URL: https://issues.apache.org/jira/browse/YARN-3528 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Environment: ASF Jenkins Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Blocker Labels: test A lot of the YARN tests have hard-coded the port 12345 for their services to come up on. This makes it impossible for scheduled or precommit tests to run consistently on the ASF jenkins hosts. Instead the tests fail regularly and appear to get ignored completely. A quick grep for 12345 shows up many places in the test suite where this practice has developed. * All {{BaseContainerManagerTest}} subclasses * {{TestNodeManagerShutdown}} * {{TestContainerManager}} + others This needs to be addressed through port scanning and dynamic port allocation. Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
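As a concrete illustration of the dynamic allocation the issue asks for (a generic sketch, not tied to any particular Hadoop test utility):
{code}
import java.io.IOException;
import java.net.ServerSocket;

// Generic sketch: let the OS pick an ephemeral port, then hand that port
// to the service under test instead of hard-coding 12345.
class FreePortFinder {
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
{code}
Note that a small race remains between closing the probe socket and the service binding the port, so where the service supports it, binding directly to port 0 and reading back the assigned address is preferable.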
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562180#comment-14562180 ] gu-chi commented on YARN-3678: -- I opened a pull request for this: https://github.com/apache/hadoop/pull/20/ DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose a container finishes and then does cleanup; the PID file still exists and will trigger signalContainer once. This kills the process with the pid from the PID file, but as the container has already finished, that PID may now be occupied by another process, which can cause serious issues. As far as I know, my NM was killed unexpectedly, and what I described could be the cause, even though it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
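For illustration only (this is not the linked pull request), one possible style of guard re-checks on Linux that the pid recorded in the PID file still looks like the container's process before signaling it, so a recycled pid belonging to an unrelated process is left alone. The helper below is hypothetical:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical guard, Linux-only: before signaling the pid read from the
// PID file, check that its /proc cmdline still mentions the container id.
// A recycled pid belonging to another process will fail this check.
class PidReuseGuard {
  static boolean looksLikeContainerProcess(String pid, String containerId)
      throws IOException {
    Path cmdline = Paths.get("/proc", pid, "cmdline");
    if (!Files.exists(cmdline)) {
      return false; // the process is already gone
    }
    String cmd = new String(Files.readAllBytes(cmdline)).replace('\0', ' ');
    return cmd.contains(containerId);
  }
}
{code}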
[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.
[ https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3727: Attachment: YARN-3727.000.patch For better error recovery, check if the directory exists before using it for localization. -- Key: YARN-3727 URL: https://issues.apache.org/jira/browse/YARN-3727 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3727.000.patch For better error recovery, check if the directory exists before using it for localization. We saw the following localization failure happen due to existing cache directories:
{code}
2015-05-11 18:59:59,756 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, null }, Rename cannot overwrite non empty destination directory //8/yarn/nm/usercache//filecache/21637
2015-05-11 18:59:59,756 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs:///X/libjars/1234.jar(-//8/yarn/nm/usercache//filecache/21637/1234.jar) transitioned from DOWNLOADING to FAILED
{code}
The real cause of this failure may be a disk failure, a LevelDB operation failure in {{startResourceLocalization}}/{{finishResourceLocalization}}, or something else. I wonder whether we can add error recovery code to avoid the localization failure by not using the existing cache directories for localization. The exception happened at {{files.rename(dst_work, destDirPath, Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after the exception, the existing cache directory used by {{LocalizedResource}} will be deleted:
{code}
try {
  ...
  files.rename(dst_work, destDirPath, Rename.OVERWRITE);
} catch (Exception e) {
  try {
    files.delete(destDirPath, true);
  } catch (IOException ignore) {
  }
  throw e;
} finally {
{code}
Since the conflicting local directory will be deleted after the localization failure, I think it would be better to check whether the directory exists before using it for localization, to avoid the failure in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
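A minimal sketch of the check this issue proposes (not the attached YARN-3727.000.patch): treat an already-existing destination directory as unusable and pick another, instead of letting the later rename fail.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

// Sketch only: a leftover directory at the candidate path means a previous
// localization did not clean up (disk failure, state-store mismatch, ...),
// so it should not be reused as a rename destination.
class LocalDirCheck {
  static boolean isUsableForLocalization(FileContext lfs, Path destDirPath)
      throws IOException {
    return !lfs.util().exists(destDirPath);
  }
}
{code}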
[jira] [Commented] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty
[ https://issues.apache.org/jira/browse/YARN-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562314#comment-14562314 ] Hadoop QA commented on YARN-3725: - \\ \\
| (/) *{color:green}+1 overall{color}* | \\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 18s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 3m 3s | Tests passed in hadoop-yarn-server-applicationhistoryservice. |
| | | 42m 50s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12735786/YARN-3725.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5450413 |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8110/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8110/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8110/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8110/console |
This message was automatically generated. App submission via REST API is broken in secure mode due to Timeline DT service address is empty Key: YARN-3725 URL: https://issues.apache.org/jira/browse/YARN-3725 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3725.1.patch YARN-2971 changes TimelineClient to use the service address from the Timeline DT to renew the DT instead of the configured address. This breaks the procedure of submitting a YARN app via the REST API in secure mode. The problem is that the service address is set by the client instead of the server in Java code. The REST API response is an encoded token String, so it is inconvenient to deserialize it, set the service address, and serialize it again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3726) Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data
Vrushali C created YARN-3726: Summary: Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data Key: YARN-3726 URL: https://issues.apache.org/jira/browse/YARN-3726 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vrushali C Assignee: Vrushali C There is a very fascinating bug that was introduced by the test data in the metrics time series check in the unit test in TestHBaseTimelineWriterImpl in YARN-3411. The unit test failure seen is:
{code}
Error Message

expected:<1> but was:<6>

Stacktrace

java.lang.AssertionError: expected:<1> but was:<6>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.checkMetricsTimeseries(TestHBaseTimelineWriterImpl.java:219)
	at org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.testWriteEntityToHBase(TestHBaseTimelineWriterImpl.java:204)
{code}
The test data had 6 timestamps that belonged to 22nd April 2015. When the patch in YARN-3411 was submitted and tested by Hadoop QA on May 19th, the unit test was working fine. Fast forward a few more days and the test started failing. There has been no relevant code change or package version change in the interim. The change that is triggering the unit test failure is the passage of time. The reason for the test failure is that the metrics time series data lives in a column family which has a TTL set to 30 days. Metrics time series data was written to the mini HBase cluster with cell timestamps set to April 22nd. Based on the column family configuration, HBase started deleting the data that was older than 30 days, and the test started failing. The last value is retained, hence one value is fetched from HBase. Will submit a patch with the test case fixed shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
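One obvious fix direction, sketched below (the exact offsets are illustrative, not the promised patch): derive the test metric timestamps from the current time so they can never age past the column family's 30-day TTL, no matter when the test runs.
{code}
class MetricTestTimestamps {
  // Timestamps anchored to "now" stay well inside the 30-day TTL window.
  static long[] recentTimestamps() {
    long now = System.currentTimeMillis();
    return new long[] {
        now, now - 1000L, now - 2000L, now - 3000L, now - 4000L, now - 5000L
    };
  }
}
{code}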