[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-810: Issue Type: Improvement (was: Bug) Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Assignee: Sandy Ryza Attachments: YARN-810.patch, YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens even though the only guarantee that YARN/CGroups makes is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html Testing: I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN. First, you can see that CFS is in use in the CGroup, based on the file names: {noformat} [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ total 0 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us 100000 [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us -1 {noformat} Oddly, it appears that cfs_period_us is set to .1s, not 1s. We can place hard CPU limits on processes. I have process 4370 running YARN container container_1371141151815_0003_01_03 on a host. By default, it's running at ~300% cpu usage. {noformat} CPU 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ... {noformat} When I set the CFS quota: {noformat} echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us CPU 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ...
{noformat} It drops to 1% usage, and you can see the box has room to spare: {noformat} Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, 0.0%st {noformat} Turning the quota back to -1: {noformat} echo -1 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us {noformat} Burns the cores again: {noformat} Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, 0.0%st CPU 4370 criccomi 20 0 1157m 563m 14m S 253.9 0.8 89:32.31 ... {noformat} On my dev box, I was testing CGroups by running a python process eight times to burn through all the cores, since CGroups was behaving as described above (giving extra CPU to the process, even with a cpu.shares limit). Toggling cfs_quota_us seems to enforce a hard limit. Implementation: What do you guys think about introducing a variable to YarnConfiguration: bq.
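The proposed YarnConfiguration key is cut off above, but the mechanism itself can be sketched. The following is a minimal, hypothetical Java sketch (class and method names are invented; this is not the attached YARN-810 patch) of how a CGroups resources handler could translate a container's vcore allocation into a CFS ceiling by writing cpu.cfs_period_us and cpu.cfs_quota_us:
{code}
// Hypothetical sketch only: names are invented and this is not the YARN-810 patch.
// It shows how a CGroups handler could derive a CFS ceiling from a container's
// vcore request and write it into the container's cgroup directory.
import java.io.FileWriter;
import java.io.IOException;

public class CfsCeilingSketch {
  // 100000 us (0.1s), matching the cpu.cfs_period_us value observed above.
  private static final int CFS_PERIOD_US = 100000;

  /**
   * Caps the container at (containerVCores / vcoresPerPcore) physical cores,
   * e.g. 1 vcore with a 1:4 pcore:vcore ratio => quota of 25000 us per 100000 us.
   */
  public static void setCpuCeiling(String containerCgroupDir,
                                   int containerVCores,
                                   int vcoresPerPcore) throws IOException {
    int quotaUs = CFS_PERIOD_US * containerVCores / vcoresPerPcore;
    writeValue(containerCgroupDir + "/cpu.cfs_period_us", Integer.toString(CFS_PERIOD_US));
    writeValue(containerCgroupDir + "/cpu.cfs_quota_us", Integer.toString(quotaUs));
  }

  private static void writeValue(String file, String value) throws IOException {
    // The NM already owns the cgroup hierarchy, so a plain write suffices here.
    try (FileWriter w = new FileWriter(file)) {
      w.write(value);
    }
  }
}
{code}
Writing -1 back into cpu.cfs_quota_us, as in the manual test above, would remove the ceiling again and restore the default soft-cap behavior.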
[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181053#comment-14181053 ] Hadoop QA commented on YARN-810: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656675/YARN-810.patch against trunk revision d71d40a. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5515//console This message is automatically generated. Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Assignee: Sandy Ryza Attachments: YARN-810.patch, YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens even though the only guarantee that YARN/CGroups makes is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html Testing: I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN. First, you can see that CFS is in use in the CGroup, based on the file names: {noformat} [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ total 0 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us 100000 [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us -1 {noformat} Oddly, it appears that cfs_period_us is set to .1s, not 1s. We can place hard CPU limits on processes.
I have process 4370 running YARN container container_1371141151815_0003_01_03 on a host. By default, it's running at ~300% cpu usage. {noformat} CPU 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ... {noformat} When I set the CFS quota: {noformat} echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us CPU 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ... {noformat} It drops to 1% usage, and you can see the box has room to spare: {noformat} Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, 0.0%st {noformat} Turning the quota back to -1: {noformat} echo -1 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us {noformat} Burns the cores again: {noformat} Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, 0.0%st CPU 4370 criccomi 20 0 1157m 563m 14m S 253.9 0.8
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181055#comment-14181055 ] Hadoop QA commented on YARN-2724: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676532/YARN-2724.2.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5514//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5514//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5514//console This message is automatically generated. If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. 
This is likely due to the fact that command-13.json is expected to be of length 13934, but it is not, as the file was never read. I think it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-
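The fix the reporter describes amounts to making LogLength describe whatever bytes are actually written for the entry. As a rough illustration only (this is not the attached YARN-2724 patch, and the class and method names are invented), an aggregator could fall back to the error text and record its length when the file cannot be read:
{code}
// Illustration only, not the actual AggregatedLogFormat code: when a log file cannot
// be read, the error text itself becomes the entry payload, and LogLength is the
// length of that text rather than the unread file's size on disk.
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class LogEntrySketch {
  public static void writeLogEntry(DataOutputStream out, File logFile) throws IOException {
    byte[] payload;
    try {
      payload = Files.readAllBytes(logFile.toPath());   // normal case: the file contents
    } catch (IOException e) {
      String msg = "Error aggregating log file. Log file : "
          + logFile.getAbsolutePath() + " (" + e.getMessage() + ")";
      payload = msg.getBytes(StandardCharsets.UTF_8);   // unreadable case: the message
    }
    out.writeUTF("LogType: " + logFile.getName());
    // LogLength must describe what is written next; otherwise the reader mis-parses
    // every entry that follows, as seen in the snippet above.
    out.writeUTF("LogLength: " + payload.length);
    out.writeUTF("Log Contents:");
    out.write(payload);
  }
}
{code}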
[jira] [Updated] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2724: Attachment: YARN-2724.3.patch fix -1 on findBug If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181074#comment-14181074 ] Hadoop QA commented on YARN-2724: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676535/YARN-2724.3.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5516//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5516//console This message is automatically generated. If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. 
I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181090#comment-14181090 ] Hadoop QA commented on YARN-2701: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676533/YARN-2701.addendum.3.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5517//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5517//console This message is automatically generated. Potential race condition in startLocalizer when using LinuxContainerExecutor -- Key: YARN-2701 URL: https://issues.apache.org/jira/browse/YARN-2701 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, YARN-2701.addendum.3.patch When LinuxContainerExecutor performs startLocalizer, it uses native code in container-executor.c: {code} if (stat(npath, &sb) != 0) { if (mkdir(npath, perm) != 0) { {code} It uses a check-then-create approach to create the appDir under /usercache, but if two containers try to do this at the same time, a race condition may occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
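The underlying problem is a classic check-then-act (TOCTOU) race. The actual fix belongs in the native container-executor.c, but the pattern is the same in any language: attempt the creation unconditionally and treat "already exists" as success, instead of checking first. A hypothetical Java sketch of that pattern (names invented, not the attached patch):
{code}
// Illustration only: the real fix is in native code, but the race-free pattern is the
// same everywhere: create first, and treat "already exists" as success, rather than
// the racy stat()-then-mkdir() check quoted above.
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class AppDirSketch {
  public static void ensureAppDir(String npath) throws IOException {
    Set<PosixFilePermission> perm = PosixFilePermissions.fromString("rwxr-x---");
    Path dir = Paths.get(npath);
    try {
      Files.createDirectory(dir, PosixFilePermissions.asFileAttribute(perm));
    } catch (FileAlreadyExistsException e) {
      // Another localizer won the race; that is fine as long as the directory exists.
      if (!Files.isDirectory(dir)) {
        throw e;
      }
    }
  }
}
{code}
In the native code the equivalent is calling mkdir() directly and treating an EEXIST errno as success.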
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181252#comment-14181252 ] Hudson commented on YARN-2198: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #721 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/721/]) YARN-2198. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor. Contributed by Remus Rusanu (jianhe: rev 3b12fd6cfbf4cc91ef8e8616c7aafa9de006cde5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-common-project/hadoop-common/src/main/winutils/winutils.sln * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-common-project/hadoop-common/src/main/native/native.vcxproj * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/winutils/winutils.mc * hadoop-common-project/hadoop-common/src/main/winutils/service.c * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ProcessTree.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/yarn/server/nodemanager/windows_secure_container_executor.c * hadoop-common-project/hadoop-common/src/main/winutils/config.cpp * hadoop-common-project/hadoop-common/src/main/winutils/hadoopwinutilsvc.idl * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c * hadoop-common-project/hadoop-common/src/main/winutils/main.c * hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.vcxproj * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/yarn/server/nodemanager/windows_secure_container_executor.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-common-project/hadoop-common/src/main/winutils/client.c * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * .gitignore * hadoop-common-project/hadoop-common/src/main/winutils/chown.c * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus
[jira] [Commented] (YARN-2700) TestSecureRMRegistryOperations failing on windows: auth problems
[ https://issues.apache.org/jira/browse/YARN-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181260#comment-14181260 ] Hudson commented on YARN-2700: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #721 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/721/]) YARN-2700 TestSecureRMRegistryOperations failing on windows: auth problems (stevel: rev 90e5ca24fbd3bb2da2a3879cc9b73f0b1d7f3e03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/AbstractSecureRegistryTest.java * hadoop-yarn-project/CHANGES.txt TestSecureRMRegistryOperations failing on windows: auth problems Key: YARN-2700 URL: https://issues.apache.org/jira/browse/YARN-2700 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Environment: Windows Server, Win7 Reporter: Steve Loughran Assignee: Steve Loughran Fix For: 2.6.0 Attachments: YARN-2700-001.patch TestSecureRMRegistryOperations failing on windows: unable to create the root /registry path with permissions problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2732) Fix syntax error in SecureContainer.apt.vm
[ https://issues.apache.org/jira/browse/YARN-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181257#comment-14181257 ] Hudson commented on YARN-2732: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #721 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/721/]) YARN-2732. Fixed syntax error in SecureContainer.apt.vm. Contributed by Jian He. (zjshen: rev b94b8b30f282563ee2ecdd25761b2345aaf06c9b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/CHANGES.txt Fix syntax error in SecureContainer.apt.vm -- Key: YARN-2732 URL: https://issues.apache.org/jira/browse/YARN-2732 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2732.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181304#comment-14181304 ] Naganarasimha G R commented on YARN-2495: - Thanks for reviewing, Wangda: bq. 2) It seems NM_LABELS_FETCH_INTERVAL_MS not been used in the patch, did you forget to do that? -- Earlier I was planning to make only the script-based node labels dynamic and keep the configuration-based ones static. Now, based on your comment 4, I will make it dynamic and change the configuration name too. bq. 3) Regarding ResourceTrackerProtocol, I think NodeHeartbeatRequest should only report labels when labels changed. So there're 3 possible values of node labels in NodeHeartbeatRequest ... And RegisterNodeManagerRequest should report label every time registering. -- Yes, this was my plan and I will be doing it the same way. But I was thinking about one scenario: labels get changed, and a call to NodeLabelsProvider.getLabels() returns the new labels, but the heartbeat fails for some reason. In that case NodeLabelsProvider will not be able to detect this, and on the next request getLabels() will return null. So we should have some mechanism by which the NodeLabelsProvider is informed whether the RM accepted the change in labels, so that the appropriate set of labels is provided on the next call to getLabels() (if needed we can also keep the RM-rejected labels for logging purposes). Planning to have 3 methods in NodeLabelsProvider: * getNodeLabels() : to get the labels used for registration * getNodeLabelsOnModify() : to get the changed labels used for heartbeat * rmUpdateNodeLabelsStatus(boolean success) : to indicate that the next call to getNodeLabelsOnModify() can be reset to null bq. 4.1 Why this class extends from CompositeService? Did you want to add more component to it? If not, AbstractService should be enough. If the purpose of the NodeLabelsFetcherService is only create a NodeLabelsProvider, and the NodeLabelsProvider will take care of periodically read configuration from yarn-site.xml. I suggest to rename NodeLabelsFetcherService to NodeLabelsProviderFactory, and not extends from any Service, because the NodeLabelsProvider should be a Service. Rename NodeLabelsProvider to NodeLabelsProviderService if your purpose is as what I mentioned. -- Your idea seems better; I will try to do it the way you have specified, so NodeLabelsFetcherService will either become a factory or I will make it obsolete. ConfigurationNodeLabelsProvider: I will make it dynamic, i.e. it will periodically read yarn-site.xml and get the labels. {quote} 6) More implementation suggestions: Since we need central node labels configuration, I suggest to leverage what we already have in RM admin CLI directly – user can use RM admin CLI add/remove node labels. We can disable this when we're ready to do non-central node label configuration. And there should be an option to tell if distributed node label configuration is used. If it's distributed, AdminService should disable admin change labels on nodes via RM admin CLI. I suggest to do this in a separated JIRA. {quote} -- I presume central node labels configuration means the cluster's valid node labels stored on the RM side for validating labels; if so, OK, I will do it the same way as the RM Admin CLI. For ??If it's distributed, AdminService should disable admin change labels on nodes via RM admin CLI?? I will add a JIRA, but I was wondering how to do this: via a new configuration parameter?
I was earlier under the impression that MemoryRMNodeLabelsManager is for distributed configuration and RMNodeLabelsManager is for centralized configuration, and that some factory will take care of this. I will handle the other comments. Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - Users can set labels on each NM (by setting yarn-site.xml or using the script suggested by [~aw]) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
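The three NodeLabelsProvider methods proposed in the comment above could look roughly like the following. This is only a sketch of the proposal, not code from any attached patch; the return type of Set of label strings is an assumption:
{code}
// Rough sketch of the three methods proposed in the comment above; method names and
// semantics follow the discussion, everything else (types, javadoc) is assumed.
import java.util.Set;

public interface NodeLabelsProvider {
  /** Labels to report in RegisterNodeManagerRequest when the NM registers. */
  Set<String> getNodeLabels();

  /**
   * Labels to report in NodeHeartbeatRequest; returns null when nothing has
   * changed since the last update the RM accepted.
   */
  Set<String> getNodeLabelsOnModify();

  /**
   * Callback telling the provider whether the RM accepted the last reported
   * change, so getNodeLabelsOnModify() is reset to null only on success.
   */
  void rmUpdateNodeLabelsStatus(boolean success);
}
{code}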
[jira] [Commented] (YARN-2700) TestSecureRMRegistryOperations failing on windows: auth problems
[ https://issues.apache.org/jira/browse/YARN-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181386#comment-14181386 ] Hudson commented on YARN-2700: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1910 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1910/]) YARN-2700 TestSecureRMRegistryOperations failing on windows: auth problems (stevel: rev 90e5ca24fbd3bb2da2a3879cc9b73f0b1d7f3e03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/AbstractSecureRegistryTest.java TestSecureRMRegistryOperations failing on windows: auth problems Key: YARN-2700 URL: https://issues.apache.org/jira/browse/YARN-2700 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Environment: Windows Server, Win7 Reporter: Steve Loughran Assignee: Steve Loughran Fix For: 2.6.0 Attachments: YARN-2700-001.patch TestSecureRMRegistryOperations failing on windows: unable to create the root /registry path with permissions problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2692) ktutil test hanging on some machines/ktutil versions
[ https://issues.apache.org/jira/browse/YARN-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181381#comment-14181381 ] Hudson commented on YARN-2692: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1910 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1910/]) YARN-2692 ktutil test hanging on some machines/ktutil versions (stevel) (stevel: rev 85a88649c3f3fb7280aa511b2035104bcef28a6f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/RegistryTestHelper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java * hadoop-yarn-project/CHANGES.txt ktutil test hanging on some machines/ktutil versions Key: YARN-2692 URL: https://issues.apache.org/jira/browse/YARN-2692 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Fix For: 2.6.0 Attachments: YARN-2692-001.patch a couple of the registry security tests run native {{ktutil}}; this is primarily to debug the keytab generation. [~cnauroth] reports that some versions of {{kinit}} hang. Fix: rm the tests. [YARN-2689] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181378#comment-14181378 ] Hudson commented on YARN-2198: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1910 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1910/]) YARN-2198. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor. Contributed by Remus Rusanu (jianhe: rev 3b12fd6cfbf4cc91ef8e8616c7aafa9de006cde5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/yarn/server/nodemanager/windows_secure_container_executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-common-project/hadoop-common/src/main/winutils/winutils.sln * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * hadoop-common-project/hadoop-common/src/main/winutils/main.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/winutils/winutils.mc * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-common-project/hadoop-common/src/main/native/native.vcxproj * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * hadoop-common-project/hadoop-common/src/main/winutils/service.c * hadoop-common-project/hadoop-common/src/main/winutils/chown.c * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/yarn/server/nodemanager/windows_secure_container_executor.h * hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-common-project/hadoop-common/src/main/winutils/hadoopwinutilsvc.idl * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c * hadoop-common-project/hadoop-common/src/main/winutils/client.c * hadoop-common-project/hadoop-common/src/main/winutils/config.cpp * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.vcxproj * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ProcessTree.java * .gitignore * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee:
[jira] [Commented] (YARN-2732) Fix syntax error in SecureContainer.apt.vm
[ https://issues.apache.org/jira/browse/YARN-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181383#comment-14181383 ] Hudson commented on YARN-2732: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1910 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1910/]) YARN-2732. Fixed syntax error in SecureContainer.apt.vm. Contributed by Jian He. (zjshen: rev b94b8b30f282563ee2ecdd25761b2345aaf06c9b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/CHANGES.txt Fix syntax error in SecureContainer.apt.vm -- Key: YARN-2732 URL: https://issues.apache.org/jira/browse/YARN-2732 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2732.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181415#comment-14181415 ] Steve Loughran commented on YARN-2678: -- this is what a record now looks like {code} { type : JSONServiceRecord, description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/IPC, addresses : [ { port : 48551, host : nn.example.com } ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ { uri : http://nn.example.com:40743; } ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ { uri : http://nn.example.com:40743/ws/v1/slider/mgmt; } ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ { uri : http://nn.example.com:40743/ws/v1/slider/publisher; } ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ { uri : http://nn.example.com:40743/ws/v1/slider/registry; } ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ { uri : http://nn.example.com:40743/ws/v1/slider/publisher/slider; } ] }, { api : org.apache.slider.publisher.exports, addressType : uri, protocolType : REST, addresses : [ { uri : http://nn.example.com:40743/ws/v1/slider/publisher/exports; } ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://nn.example.com:52705/ws/v1/slider/agents; } ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ { uri : https://nn.example.com:33425/ws/v1/slider/agents; } ] } ], yarn:persistence : application, yarn:id : application_1414052463672_0028 } {code} Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. 
This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on
[jira] [Commented] (YARN-2700) TestSecureRMRegistryOperations failing on windows: auth problems
[ https://issues.apache.org/jira/browse/YARN-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181448#comment-14181448 ] Hudson commented on YARN-2700: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1935 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1935/]) YARN-2700 TestSecureRMRegistryOperations failing on windows: auth problems (stevel: rev 90e5ca24fbd3bb2da2a3879cc9b73f0b1d7f3e03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/AbstractSecureRegistryTest.java * hadoop-yarn-project/CHANGES.txt TestSecureRMRegistryOperations failing on windows: auth problems Key: YARN-2700 URL: https://issues.apache.org/jira/browse/YARN-2700 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Environment: Windows Server, Win7 Reporter: Steve Loughran Assignee: Steve Loughran Fix For: 2.6.0 Attachments: YARN-2700-001.patch TestSecureRMRegistryOperations failing on windows: unable to create the root /registry path with permissions problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2732) Fix syntax error in SecureContainer.apt.vm
[ https://issues.apache.org/jira/browse/YARN-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181445#comment-14181445 ] Hudson commented on YARN-2732: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1935 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1935/]) YARN-2732. Fixed syntax error in SecureContainer.apt.vm. Contributed by Jian He. (zjshen: rev b94b8b30f282563ee2ecdd25761b2345aaf06c9b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/CHANGES.txt Fix syntax error in SecureContainer.apt.vm -- Key: YARN-2732 URL: https://issues.apache.org/jira/browse/YARN-2732 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2732.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2692) ktutil test hanging on some machines/ktutil versions
[ https://issues.apache.org/jira/browse/YARN-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181443#comment-14181443 ] Hudson commented on YARN-2692: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1935 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1935/]) YARN-2692 ktutil test hanging on some machines/ktutil versions (stevel) (stevel: rev 85a88649c3f3fb7280aa511b2035104bcef28a6f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/RegistryTestHelper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureLogins.java * hadoop-yarn-project/CHANGES.txt ktutil test hanging on some machines/ktutil versions Key: YARN-2692 URL: https://issues.apache.org/jira/browse/YARN-2692 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Fix For: 2.6.0 Attachments: YARN-2692-001.patch a couple of the registry security tests run native {{ktutil}}; this is primarily to debug the keytab generation. [~cnauroth] reports that some versions of {{kinit}} hang. Fix: rm the tests. [YARN-2689] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181440#comment-14181440 ] Hudson commented on YARN-2198: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1935 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1935/]) YARN-2198. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor. Contributed by Remus Rusanu (jianhe: rev 3b12fd6cfbf4cc91ef8e8616c7aafa9de006cde5) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java * hadoop-common-project/hadoop-common/src/main/winutils/main.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/winutils/hadoopwinutilsvc.idl * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/yarn/server/nodemanager/windows_secure_container_executor.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/yarn/server/nodemanager/windows_secure_container_executor.c * hadoop-common-project/hadoop-common/src/main/native/native.vcxproj * hadoop-common-project/hadoop-common/src/main/winutils/winutils.sln * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ProcessTree.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/winutils/client.c * hadoop-common-project/hadoop-common/src/main/winutils/chown.c * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.vcxproj * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-common-project/hadoop-common/src/main/winutils/winutils.mc * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * hadoop-common-project/hadoop-common/src/main/winutils/config.cpp * .gitignore * hadoop-common-project/hadoop-common/src/main/winutils/service.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181474#comment-14181474 ] Sunil G commented on YARN-2647: --- Thank you Wangda. Sure. I will use the QueueInfo itself. bq. yarn queue -list short-queue-name or full-queue-name Here, as you have mentioned, the sub-option will be passed only with a queue name. I do not expect the complete list command to print only queue ACLs/node labels from all queues. Hope this is what you also expected. The patch is coming into shape, and I will upload it in a short while. Add yarn queue CLI to get queue info including labels of such queue --- Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181502#comment-14181502 ] Sangjin Lee commented on YARN-2183: --- Thanks for the review [~kasha]! {quote} I understand we need a check to prevent the race. I wonder if we can just re-use the existing check in CleanerTask#run instead of an explicit check in CleanerService#runCleanerTask? From what I remember, that would make the code in CleanerTask#run cleaner as well. (no pun) {quote} The main motivation for this somewhat elaborate double check was the situation where an on-demand cleaner run comes in just as a scheduled cleaner run starts. Without this check, we would have two cleaner runs back to back, which is somewhat wasteful. Having said that, I think it is debatable how important it is to avoid that situation and whether it is an optimization worth doing. One could argue that this is a bit too fine-grained an optimization. Thoughts? {quote} I poked around a little more, and here is what I think. SharedCacheManager creates an instance of AppChecker; the rest of the SCM pieces (Store, CleanerService) should just use the same instance. This instance can be passed either in the constructor or through an SCMContext similar to RMContext. Or, we could add SCM#getAppChecker. In its current form, CleanerTask#cleanResourceReferences fetches the references from the store, checks if the apps are running, and asks the store to remove the references. Moving the whole method to the store would simplify the code more. {quote} Yes, I agree that moving cleanResourceReferences() to the store would simplify the code here. There is one caveat, however. Currently CleanerTask.cleanResourceReferences() is generic: i.e. it does not depend on the type of the store. But if we move this to the store, then I think it would need to be abstract at the level of SCMStore, and each store implementation would need to implement its own. The main reason is that the concurrency/safety semantics would differ from store impl to store impl. In the case of the in-memory store, it would use synchronization on the interned key. But in the case of other stores, that does not apply, and they would need their own implementations, mostly because how they handle concurrency will be different. So it would mean largely copying and pasting the same logic with small differences in how concurrency is handled. That does seem to be a downside of this approach. What do you think? Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
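To make the trade-off above concrete, here is a minimal sketch of what pushing cleanResourceReferences() down into the store could look like. All names and signatures here are assumptions for illustration (AppChecker, SCMStore, the in-memory map), not the actual SCM classes in the attached patches; the point is only that the store-specific concurrency handling (synchronizing on the interned key) lives inside the store implementation.
{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; the real SCM classes and signatures may differ.
interface AppChecker {
  boolean isApplicationActive(String appId);
}

abstract class SCMStore {
  // Moved from CleanerTask in this sketch: each concrete store overrides it so
  // it can apply its own concurrency/safety semantics.
  public abstract void cleanResourceReferences(AppChecker checker);
}

class InMemorySCMStore extends SCMStore {
  private final Map<String, Set<String>> resourceReferences = new ConcurrentHashMap<>();

  @Override
  public void cleanResourceReferences(AppChecker checker) {
    for (Map.Entry<String, Set<String>> entry : resourceReferences.entrySet()) {
      // The in-memory store synchronizes on the interned key, as discussed above.
      synchronized (entry.getKey().intern()) {
        entry.getValue().removeIf(appId -> !checker.isApplicationActive(appId));
      }
    }
  }
}
{code}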
[jira] [Commented] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181560#comment-14181560 ] Stephen Chu commented on YARN-2722: --- Hi [~ywskycn], thanks for making this change. Java 6 doesn't support TLSv1.2. Robert noted this in HADOOP-11217 as well. Should we be adding TLSv1.2 in this patch? Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle - Key: YARN-2722 URL: https://issues.apache.org/jira/browse/YARN-2722 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2722-1.patch We should disable SSLv3 in HttpFS to protect against the POODLEbleed vulnerability. See [CVE-2014-3566 |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566] We have {{context = SSLContext.getInstance(TLS);}} in SSLFactory, but when I checked, I could still connect with SSLv3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181582#comment-14181582 ] Sumit Kumar commented on YARN-2505: --- bq. I would think that should be a post with a (list of) label(s) and a list of node ids. I think it would be enough to provide support for applying a label on a list of node ids. From a use-case perspective, such labeling should mean categorizing certain nodes into a group. Maybe I do not see much of a use case for putting multiple nodes into multiple groups at the same time. If at all such a complicated case arises, users could make multiple calls, each with a single label and a list of nodes. bq. I don't think there's a compelling purpose at the moment for a node label type, it's a string/textual label and I think it is sensible to just model it as such. I agree with you. Given that we already have support for _applicationTags_, there is no immediate need for a _type_ for a label. Though at some point we should merge the _applicationTags_ and _label_ features into one. What do you think? Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
Sumit Mohanty created YARN-2734: --- Summary: If a sub-folder is encountered by log aggregator it results in invalid aggregated file Key: YARN-2734 URL: https://issues.apache.org/jira/browse/YARN-2734 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Fix For: 2.6.0 See YARN-2724 for some more context on how the error surfaces during yarn logs call. If aggregator sees a sub-folder today it results in the following error when reading the logs: {noformat} Container: container_1413512973198_0019_01_02 on c6401.ambari.apache.org_45454 LogType: cmd_data LogLength: 4096 Log Contents: Error aggregating log file. Log file : /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data (Is a directory) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181588#comment-14181588 ] Wei Yan commented on YARN-2722: --- Thanks, [~schu]. You're right, we shouldn't add TLSv1.2. And according to this jdk document: https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https. JDK6 actually only supports TLSv1. I verified in a cluster that TLSv1.1 should also be removed when using jdk 6. Will confirm with Robert later. Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle - Key: YARN-2722 URL: https://issues.apache.org/jira/browse/YARN-2722 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2722-1.patch We should disable SSLv3 in HttpFS to protect against the POODLEbleed vulnerability. See [CVE-2014-3566 |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566] We have {{context = SSLContext.getInstance(TLS);}} in SSLFactory, but when I checked, I could still connect with SSLv3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
[ https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-2734: --- Assignee: Xuan Gong If a sub-folder is encountered by log aggregator it results in invalid aggregated file -- Key: YARN-2734 URL: https://issues.apache.org/jira/browse/YARN-2734 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Fix For: 2.6.0 See YARN-2724 for some more context on how the error surfaces during yarn logs call. If aggregator sees a sub-folder today it results in the following error when reading the logs: {noformat} Container: container_1413512973198_0019_01_02 on c6401.ambari.apache.org_45454 LogType: cmd_data LogLength: 4096 Log Contents: Error aggregating log file. Log file : /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data (Is a directory) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2724: Attachment: YARN-2724.4.patch If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
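As a side note on the framing problem described above, a rough sketch of the idea behind the fix: whatever the aggregator ends up writing for an unreadable file, the recorded LogLength has to match those bytes, so that the entries that follow stay parseable. The method and class names below are hypothetical and deliberately simplified; the real AggregatedLogFormat code is more involved.
{code}
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Sketch only: keep LogLength consistent with the content actually written,
// even when reading the log file fails (e.g. "Permission denied").
class LogEntryWriterSketch {
  static void writeEntry(DataOutputStream out, File logFile) throws IOException {
    byte[] content;
    try {
      content = Files.readAllBytes(logFile.toPath());
    } catch (IOException e) {
      String msg = "Error aggregating log file. Log file : " + logFile + " (" + e.getMessage() + ")";
      content = msg.getBytes(StandardCharsets.UTF_8); // the error text becomes the content
    }
    out.writeUTF(logFile.getName());              // LogType
    out.writeUTF(String.valueOf(content.length)); // LogLength matches the content
    out.write(content);                           // Log Contents
  }
}
{code}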
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181616#comment-14181616 ] Jian He commented on YARN-2701: --- lgtm too, thanks Binglin and Zhihai for reviewing the patch Potential race condition in startLocalizer when using LinuxContainerExecutor -- Key: YARN-2701 URL: https://issues.apache.org/jira/browse/YARN-2701 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, YARN-2701.addendum.3.patch When using LinuxContainerExecutor do startLocalizer, we are using native code container-executor.c. {code} if (stat(npath, sb) != 0) { if (mkdir(npath, perm) != 0) { {code} We are using check and create method to create the appDir under /usercache. But if there are two containers trying to do this at the same time, race condition may happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
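The fix for this race lives in the native container-executor code, but the underlying pattern is easy to illustrate: instead of check-then-create (stat followed by mkdir, which two localizers can interleave), attempt the creation unconditionally and treat an already-existing directory as success. A hedged Java analogue of that pattern, with a made-up class name, purely for illustration:
{code}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

// Illustrative only: the actual fix is in container-executor.c, not Java.
final class RaceFreeMkdir {
  // stat + mkdir is racy; creating first and tolerating "already exists" is not,
  // because the creation attempt is a single atomic operation.
  static void ensureDir(Path dir, String perm) throws IOException {
    try {
      Files.createDirectory(dir,
          PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString(perm)));
    } catch (FileAlreadyExistsException e) {
      // Another container created it first; that is the expected benign outcome.
    }
  }

  public static void main(String[] args) throws IOException {
    ensureDir(Paths.get("/tmp/usercache-demo"), "rwxr-x---"); // hypothetical path
  }
}
{code}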
[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181646#comment-14181646 ] Wangda Tan commented on YARN-2495: -- 1) bq. But was thinking about one sceanario labels got changed and on call to NodeLabelsProvider.getLabels() it returns the new labels but the heartbeat failed due to some reason. If the heartbeat fails, the resource tracker on the NM side will not get a NodeHeartbeatResponse. But another case I'm thinking of is that labels reported by NMs can be invalid and rejected by the RM. The NM should be notified about such cases. So I would suggest doing it this way: - Keep getNodeLabels in NodeHeartbeatRequest and RegisterNodeManagerRequest. - Add a reject node labels list in NodeHeartbeatRequest -- we may not have to handle this list for now, but we can keep it on the interface. - Add a lastNodeLabels in NodeStatusUpdater; it will save the last node labels list fetched from the NodeLabelFetcher. And in the while loop of {{startStatusUpdater}}, we will check whether the new list fetched from the NodeLabelFetcher is different from our last node labels list. If different, we will set it; if the same, we will skip it and set the labels to null in the next heartbeat. And the interface of NodeLabelsProvider should be simple, just a getNodeLabels(); NodeStatusUpdater will take care of the other details. 2) bq. and for If it's distributed, AdminService should disable admin change labels on nodes via RM admin CLI will add a jira, but was wondering how to do this ? by configuration with new parameter? Yes, we should add a new parameter for it; we may not need it immediately, but we should have one in the future. bq. I was earlier under the impression as MemoryRMNodeLabelsManager = is for distributed Configuration and RMNodeLabelsManager is for Centrallized configuration. and some factory will take care of this Not really; the difference between them is that one persists labels to the filesystem and the other does not. We still have to do something for the distributed configuration. Any thoughts? [~vinodkv] Thanks, Wangda Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml or using script suggested by [~aw]) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
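A minimal sketch of the diff-and-send idea described above, under assumed names (lastNodeLabels, a labelsForNextHeartbeat helper); the actual NodeStatusUpdater and NodeLabelsProvider interfaces in the patch may look different.
{code}
import java.util.Set;

// Hypothetical sketch of "only report labels to the RM when they changed".
class NodeLabelsHeartbeatHelper {
  private Set<String> lastNodeLabels; // last labels successfully handed to a heartbeat

  // Returns the labels to put in the next heartbeat, or null to skip sending them.
  synchronized Set<String> labelsForNextHeartbeat(Set<String> fetchedFromProvider) {
    if (fetchedFromProvider == null || fetchedFromProvider.equals(lastNodeLabels)) {
      return null; // unchanged: set labels to null in this heartbeat
    }
    lastNodeLabels = fetchedFromProvider;
    return fetchedFromProvider; // changed: report the new labels
  }
}
{code}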
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181661#comment-14181661 ] Wangda Tan commented on YARN-2647: -- Hi [~sunilg], Maybe my previous comment make you confused, what in my mind is, {code} yarn queue -list (or -liststatus): OUTPUT: root: ACL: Labels: LINUX, LARGE_MEM Status: RUNNING Capacity: 80% ... root.queue-a: ACL: Labels: LINUX, ... ... {code} {code} yarn queue -list root.queueA OUTPUT: root.queue-a: ACL: Labels: LINUX, ... Capacity: 80% ... {code} {code} yarn queue -list root.queueA -show-node-label OUTPUT: root.queue-a: ACL: Labels: LINUX, ... END {code} Does this make sense to you? Or do you have any other suggestions? Thanks, Wangda Add yarn queue CLI to get queue info including labels of such queue --- Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
[ https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181664#comment-14181664 ] Xuan Gong commented on YARN-2734: - Currently, if the current path is a sub-folder, we will throw an IOException. Instead of exception, we should check explicitly to skip sub-dirs. If a sub-folder is encountered by log aggregator it results in invalid aggregated file -- Key: YARN-2734 URL: https://issues.apache.org/jira/browse/YARN-2734 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Fix For: 2.6.0 See YARN-2724 for some more context on how the error surfaces during yarn logs call. If aggregator sees a sub-folder today it results in the following error when reading the logs: {noformat} Container: container_1413512973198_0019_01_02 on c6401.ambari.apache.org_45454 LogType: cmd_data LogLength: 4096 Log Contents: Error aggregating log file. Log file : /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data (Is a directory) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
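A minimal sketch of the "check explicitly and skip sub-dirs" idea when collecting candidate log files; the class and method names below are invented for illustration and are not the actual NM aggregator code.
{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch only: filter out sub-directories such as cmd_data/ up front, instead of
// letting them reach the aggregator and fail later with "(Is a directory)".
final class ContainerLogFiles {
  static List<File> candidateLogFiles(File containerLogDir) {
    List<File> candidates = new ArrayList<>();
    File[] entries = containerLogDir.listFiles();
    if (entries == null) {
      return candidates; // not a directory, or an I/O error listing it
    }
    for (File entry : entries) {
      if (entry.isFile()) { // explicitly skip sub-directories
        candidates.add(entry);
      }
    }
    return candidates;
  }
}
{code}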
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181665#comment-14181665 ] Zhijie Shen commented on YARN-2724: --- +1 for the latest patch. Will commit it later today to give [~mitdesai] and [~vinodkv] a chance to look at it. If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-2726: Assignee: Wangda Tan (was: Naganarasimha G R) CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException(Configuration issue: + label= + label + is accessible from queue= + queue + but has no capacity set.); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181673#comment-14181673 ] Wangda Tan commented on YARN-2726: -- Taking this over.. CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException(Configuration issue: + label= + label + is accessible from queue= + queue + but has no capacity set.); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
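For readability, here is the proposed check from the description again as a compilable fragment, with the string literals restored (they were stripped by the mail formatting above); the UNDEFINED constant and the surrounding class are stand-ins for CapacitySchedulerConfiguration.
{code}
// Reconstructed from the description above; quotes restored, logic unchanged.
final class LabelCapacityCheck {
  private static final float UNDEFINED = -1.0f; // stand-in for the real sentinel

  static void checkLabelCapacity(float capacity, String label, String queue) {
    if (capacity == UNDEFINED) {
      throw new IllegalArgumentException("Configuration issue: " + "label=" + label
          + " is accessible from queue=" + queue + " but has no capacity set.");
    }
  }
}
{code}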
[jira] [Commented] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181677#comment-14181677 ] Wei Yan commented on YARN-2722: --- Hi, [~schu]. Discussed with Robert offline, and we also need to remove TLSv1.1 and only support TLSv1. Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle - Key: YARN-2722 URL: https://issues.apache.org/jira/browse/YARN-2722 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2722-1.patch We should disable SSLv3 in HttpFS to protect against the POODLEbleed vulnerability. See [CVE-2014-3566 |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566] We have {{context = SSLContext.getInstance("TLS");}} in SSLFactory, but when I checked, I could still connect with SSLv3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
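As a rough illustration of what "only support TLSv1" means at the JSSE level (this is not the actual SSLFactory change in the patch), the enabled protocols can be pinned explicitly; merely asking for a TLS context is not enough, which matches the observation in the description that SSLv3 connections still succeeded.
{code}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Illustrative only: the real change belongs in Hadoop's SSLFactory/shuffle setup.
final class Tlsv1Only {
  static SSLEngine newEngine() throws Exception {
    SSLContext ctx = SSLContext.getInstance("TLSv1");
    ctx.init(null, null, null); // default key/trust managers, default SecureRandom
    SSLEngine engine = ctx.createSSLEngine();
    // Pin the enabled protocols to TLSv1 only: JDK 6 supports neither TLSv1.1
    // nor TLSv1.2, and SSLv3 must be excluded because of POODLE.
    engine.setEnabledProtocols(new String[] { "TLSv1" });
    return engine;
  }
}
{code}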
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181684#comment-14181684 ] Karthik Kambatla commented on YARN-2183: bq. Without this check, we would have two cleaner runs back to back which is somewhat wasteful. I don't entirely remember my train of thought, but I can take a look again and see if we can implement it in a simpler way and get the same guarantee. May be, after the next patch. bq. Currently CleanerTask.cleanResourceReferences() is generic: i.e. it does not depend on the type of the store. Can we keep this method as is, but mark it protected. The store implementations can choose to use it. Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181688#comment-14181688 ] Mit Desai commented on YARN-2724: - I'll take a look shortly. If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181690#comment-14181690 ] Hadoop QA commented on YARN-2724: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676633/YARN-2724.4.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5518//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5518//console This message is automatically generated. If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. 
I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181701#comment-14181701 ] Sunil G commented on YARN-2647: --- Thank you [~gp.leftnoteasy] Yes. This is more or less what I also have in mind. I have a point here. {code} yarn queue -list -show-node-label {code} I do not feel the above option is needed to show node labels only for all queues. Here I will anyway show the complete queue details of all queues. Also, as you have shown, a row-based display is better since we have a variable number of configuration items for node labels. My initial approach was column-based, which would cause frequent line breaks. The display you have shown makes more sense; I will use it and make the changes in my patch now. Add yarn queue CLI to get queue info including labels of such queue --- Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2473) YARN never cleans up container directories from a full disk
[ https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-2473. -- Resolution: Duplicate Closing as a duplicate of YARN-90. YARN never cleans up container directories from a full disk --- Key: YARN-2473 URL: https://issues.apache.org/jira/browse/YARN-2473 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Varun Vasudev Priority: Blocker After YARN-1781 when a container ends up filling a local disk the nodemanager will mark it as a bad disk and remove it from the list of good local dirs. When the container eventually completes the files that filled the disk will not be removed because the NM thinks the directory is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181709#comment-14181709 ] Gour Saha commented on YARN-2678: - Steve it looks good. On the addresses front, do you plan to expose host and port attributes in addition to uri (show below)? Clients can avoid parsing. {noformat} ... internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://nn.example.com:52705/ws/v1/slider/agents;, host : nn.example.com, port : 52705 } ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ { uri : https://nn.example.com:33425/ws/v1/slider/agents;, host : nn.example.com, port : 33425 } ] } ], ... {noformat} Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... 
internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181717#comment-14181717 ] Wangda Tan commented on YARN-2647: -- bq. I do not feel the above option is needed to show node labels only for all queues. Makes sense; the basic functionality should let the user get queue statuses, and the user doesn't need to get only the node-label/ACL info. So the command line should be yarn queue -list queue-name or queue-path; if the user doesn't specify a queue name, all queues' statuses will be printed. Add yarn queue CLI to get queue info including labels of such queue --- Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181723#comment-14181723 ] Sangjin Lee commented on YARN-2183: --- You mean, moving the method to SCMStore but mark it protected? If so, for CleanerTask to be able to call it, it cannot be protected, right? One thing we can do is to move it to SCMStore as a public method, but let implementations override/augment it. Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2726: - Attachment: YARN-2726-20141023-1.patch CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Attachments: YARN-2726-20141023-1.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException(Configuration issue: + label= + label + is accessible from queue= + queue + but has no capacity set.); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2694: - Summary: Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY (was: Ensure only single node labels specified in resource request, and node label expression only specified when resourceName=ANY) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. And a ResourceRequest with multiple node labels will make user limit computation becomes tricky. Now we need temporarily disable them, changes include, - AMRMClient - ApplicationMasterService -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
zhihai xu created YARN-2735: --- Summary: diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection Key: YARN-2735 URL: https://issues.apache.org/jira/browse/YARN-2735 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Minor diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2694: - Description: Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. And a ResourceRequest/host with multiple node labels will make user limit, etc. computation becomes more tricky. Now we need temporarily disable them, changes include, - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager was: Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. And a ResourceRequest with multiple node labels will make user limit computation becomes tricky. Now we need temporarily disable them, changes include, - AMRMClient - ApplicationMasterService Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. And a ResourceRequest/host with multiple node labels will make user limit, etc. computation becomes more tricky. Now we need temporarily disable them, changes include, - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2735: Attachment: YARN-2735.000.patch diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection --- Key: YARN-2735 URL: https://issues.apache.org/jira/browse/YARN-2735 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2735.000.patch diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181764#comment-14181764 ] zhihai xu commented on YARN-2735: - I attached a patch to remove the unnecessary initialization for diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection --- Key: YARN-2735 URL: https://issues.apache.org/jira/browse/YARN-2735 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2735.000.patch diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
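For context, the redundant pattern being removed looks roughly like the following (field names follow the JIRA summary; the actual DirectoryCollection code may differ): the fields get a value at declaration and are then assigned again in the constructor, so one of the two initializations can go.
{code}
// Illustrative only; mirrors the double-initialization pattern described above.
class DirectoryCollectionSketch {
  // Initialized once here...
  private float diskUtilizationPercentageCutoff = 100.0F;
  private long diskUtilizationSpaceCutoff = 0;

  DirectoryCollectionSketch(float percentageCutoff, long spaceCutoff) {
    // ...and assigned again here; keeping only one of the two removes the
    // unnecessary duplicate initialization.
    this.diskUtilizationPercentageCutoff = percentageCutoff;
    this.diskUtilizationSpaceCutoff = spaceCutoff;
  }
}
{code}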
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181772#comment-14181772 ] Karthik Kambatla commented on YARN-2183: Yes. Sorry for the confusion. Just looked at the code again. My suggestion is to move cleanResourceReferences to SCMStore and mark it @Private public final. Does that make sense? Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181771#comment-14181771 ] Naganarasimha G R commented on YARN-2495: - Hi [~wangda], Actually what I meant was to update the HeartBeatResponse about the labels' acceptance by the RM, and once NodeStatusUpdater gets the response (+ve or -ve) from the RM, it can set the LabelsProvider with the appropriate flag. But your logic seems much better, because I was handling thread synchronization unnecessarily in ConfNodeLabelsProvider. Having this logic in NodeStatusUpdater removes the burden on each type of NodeLabelsProvider to have this sync logic, and the NodeLabelsProvider interface will be simple (earlier my thinking was that labels should not be handled by NodeStatusUpdater, hence I kept it in NodeLabelsProvider). I was actually about to upload the patch with my logic; as it is not as per your latest comments, I will upload another one by tomorrow afternoon (IST) after correcting it as per your comments. bq. Add a reject node labels list in NodeHeartbeatRequest – we may not have to handle this list for now. But we can keep it on the interface You meant NodeHeartbeatResponse, right? Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml or using script suggested by [~aw]) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2736) Job.getHistoryUrl returns empty string
Kannan Rajah created YARN-2736: -- Summary: Job.getHistoryUrl returns empty string Key: YARN-2736 URL: https://issues.apache.org/jira/browse/YARN-2736 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.5.1 Reporter: Kannan Rajah Priority: Critical The getHistoryUrl() method in the Job class is returning an empty string. Example code: job = Job.getInstance(conf); job.setJobName("MapReduceApp"); job.setJarByClass(MapReduceApp.class); job.setMapperClass(Mapper1.class); job.setCombinerClass(Reducer1.class); job.setReducerClass(Reducer1.class); job.setMapOutputKeyClass(Text.class); job.setMapOutputValueClass(IntWritable.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setNumReduceTasks(1); job.setOutputFormatClass(TextOutputFormat.class); job.setInputFormatClass(TextInputFormat.class); FileInputFormat.addInputPath(job, inputPath); FileOutputFormat.setOutputPath(job, outputPath); job.waitForCompletion(true); job.getHistoryUrl(); It always returns an empty string. Looks like getHistoryUrl() support was removed in YARN-321. getTrackingURL() returns the correct URL, though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181791#comment-14181791 ] Zhijie Shen commented on YARN-2703: --- [~xgong], thanks for the patch. Some comments about it. 1. It's better to write the timestamp directly. When reading it, it's flexible to convert it into whatever format we want. {code} // Write the uploaded TimeStamp out.writeUTF(Times.format(uploadedTime)); {code} 2. Is it necessary to sort the files? The goal here is to add the timestamp to the same log file name uploaded in different iterations. Doesn't the following change the order of the uploaded files within the same iteration? Previously it's alphabetical, while now it's chronological. For example, stderr1 - stdout1 - stderr2 - stdout2 will be changed to stderr1 - stdout1 - stdout2 - stderr2, which may not be a better order. {code} // sort the files by lastModifiedTime. List<File> candidatesList = new ArrayList<File>(candidates); Collections.sort(candidatesList, new Comparator<File>() { public int compare(File s1, File s2) { return s1.lastModified() < s2.lastModified() ? -1 : s1.lastModified() > s2.lastModified() ? 1 : 0; } }); return candidatesList; {code} 3. No need to ask the caller to pass in the uploaded time. We can directly execute {{out.writeLong(System.currentTimeMillis());}} {code} public void write(DataOutputStream out, Set<File> pendingUploadFiles, long uploadedTime) throws IOException { {code} 4. Can you correct the log message below in TestLogAggregationService, and add logTime as well? {code} LOG.info("LogType: " + fileType); LOG.info("LogType: " + fileLength); {code} Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch Right now, the container can upload its logs multiple times. Sometimes, containers write different logs into the same log file. After the log aggregation, when we query those logs, it will show: LogType: stderr LogContext: LogType: stdout LogContext: LogType: stderr LogContext: LogType: stdout LogContext: The same files could be displayed multiple times. But we can not figure out which logs come first. We could add extra loguploadedTime to let users have better understanding on the logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
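A small sketch combining points 1 and 3 above, i.e. persisting the raw timestamp and formatting it only on the read path (stream handling is simplified, the display pattern is arbitrary, and this is not the actual LogValue code):
{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class UploadTimeSketch {
  static void writeUploadTime(DataOutputStream out) throws IOException {
    // Write the raw millis directly; no need for the caller to pass the time in.
    out.writeLong(System.currentTimeMillis());
  }

  static String readUploadTime(DataInputStream in) throws IOException {
    long uploadedTime = in.readLong();
    // Convert to whatever display format is desired only when reading.
    return new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy").format(new Date(uploadedTime));
  }
}
{code}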
[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2505: -- Attachment: YARN-2505.7.patch Add forgotten generic type definitions, should fix javac warnings... Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2724: Attachment: YARN-2724.5.patch Same patch If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch, YARN-2724.5.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
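The fix suggested in the description can be sketched as follows: when a local log file cannot be read, the error text itself becomes the log contents and its length (not the unread file's length) is recorded, so the aggregated stream stays parseable. This is only an illustration with assumed method names, not the actual AggregatedLogFormat writer:
{code}
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class UnreadableLogEntrySketch {
  static void writeErrorEntry(DataOutputStream out, String fileName, String errorMessage)
      throws IOException {
    byte[] contents = errorMessage.getBytes(StandardCharsets.UTF_8);
    out.writeUTF("LogType: " + fileName);
    // The declared length must describe what is actually written -- the error
    // text -- not the size of the file that was never read.
    out.writeUTF("LogLength: " + contents.length);
    out.writeUTF("Log Contents: ");
    out.write(contents);
  }
}
{code}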
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181798#comment-14181798 ] Craig Welch commented on YARN-2505: --- -re I think it would be enough to provide support for applying a label on a list of node ids Fair enough - I was thinking of the suggested api as a superset of this, but maybe this is all we really need. I like the idea, not sure I can get to it just now - I'll see, if not, perhaps we can do a followon jira for it - let's see -re Though at some point of time we should merge applicationTags and label features into one. What do you think? I'm not sure actually, there are clearly some similarities, but I think they are distinct things... Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181809#comment-14181809 ] Jason Lowe commented on YARN-2314: -- bq. IIUC, mayBeCloseProxy can be invoked by MR/NMClient, but proxy.scheduledForClose is always false. So it won’t call the following stopProxy. proxy.scheduledForClose is not always false, as it can be set to true by removeProxy. removeProxy is called by the cache when an entry needs to be evicted from the cache. If the cache never fills then we never will call removeProxy by the very design of the cache. This patch doesn't change the behavior in that sense. I suppose we could change the patch so that it only caches the proxy objects but not their underlying connections. However I have my doubts that's where the real expense is in creating the proxy -- it's much more likely to be establishing the RPC connection to the NM. bq. once ContainerManagementProtocolProxy#tryCloseProxy is called, internally it’ll call rpc.stopProxy, will it eventually call ClientCache#stopClient ClientCache#stopClient will not necessarily shut down the connection. It will only shutdown the connection if there are no references to the protocol by any other objects, but the very nature of the ContainerManagementProtocolProxy cache is to keep around references. Therefore stopClient will never actually do anything in practice as long as we are caching proxy objects. That's why I mentioned earlier that the RPC layer itself needs to change to add the ability to shutdown connections or change the way the ClientCache behaves to really fix this if we want to continue to cache proxy objects at a higher layer. ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: YARN-2314.patch, YARN-2314v2.patch, disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch, tez-yarn-2314.xlsx ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
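For readers following along, a rough sketch of the cache behaviour described here (names like scheduledForClose and removeProxy follow the discussion, but this is not the real ContainerManagementProtocolProxy code): entries are closed only on eviction, and an entry that is still in use is merely flagged so that the last caller closes it.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

class ProxyCacheSketch<P> {
  static class Entry<P> {
    P proxy;
    int activeCallers;
    boolean scheduledForClose;
  }

  private final int maxSize;
  private final Map<String, Entry<P>> cache = new LinkedHashMap<String, Entry<P>>();

  ProxyCacheSketch(int maxSize) { this.maxSize = maxSize; }

  synchronized void removeProxy(String nmAddress) {
    Entry<P> e = cache.remove(nmAddress);
    if (e == null) {
      return;
    }
    if (e.activeCallers > 0) {
      // Still in use: just mark it; mayBeCloseProxy closes it when the last caller returns.
      e.scheduledForClose = true;
    } else {
      stopProxy(e.proxy);
    }
  }

  synchronized void mayBeCloseProxy(Entry<P> e) {
    e.activeCallers--;
    if (e.scheduledForClose && e.activeCallers == 0) {
      stopProxy(e.proxy);
    }
  }

  synchronized void evictIfNeeded() {
    // Eviction (and therefore removeProxy) only happens once the cache is full --
    // which is why a cache that never fills never closes anything.
    while (cache.size() > maxSize) {
      String eldest = cache.keySet().iterator().next();
      removeProxy(eldest);
    }
  }

  private void stopProxy(P proxy) {
    // Placeholder for rpc.stopProxy(proxy).
  }
}
{code}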
[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181814#comment-14181814 ] Hadoop QA commented on YARN-2735: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676677/YARN-2735.000.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5520//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5520//console This message is automatically generated. diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection --- Key: YARN-2735 URL: https://issues.apache.org/jira/browse/YARN-2735 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2735.000.patch diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181838#comment-14181838 ] Hadoop QA commented on YARN-2724: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676687/YARN-2724.5.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5522//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5522//console This message is automatically generated. If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch, YARN-2724.5.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. 
I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181859#comment-14181859 ] Wangda Tan commented on YARN-2495: -- Hi Naga, bq. you meant NodeHeartBeatResponse right ? Yes Looking forward your patch. Wangda Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml or using script suggested by [~aw]) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181869#comment-14181869 ] Jian He commented on YARN-2314: --- Jason, thanks for your explanation. bq. If the cache never fills then we never will call removeProxy by the very design of the cache. I was thinking the client could have a way to explicitly stopProxy and remove the entry from the cache, rather than remove the entry only if it hits the cache limit. But looks like this is by design. And yes, this is the existing behavior. ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: YARN-2314.patch, YARN-2314v2.patch, disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch, tez-yarn-2314.xlsx ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181908#comment-14181908 ] Hadoop QA commented on YARN-2505: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676686/YARN-2505.7.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5521//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5521//console This message is automatically generated. Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181929#comment-14181929 ] Jian He commented on YARN-2209: --- bq. No need to change to split a single statement into the following two. This is required, because the following finally block needs this temporary variable. bq. Why does it not need to take the remaining operations after code change? Because the allocate call throws an exception, the response object is empty. bq. Is the change in ResourceCalculator.java related? It's causing excessive logging in production clusters; I intentionally removed it. Replace AM resync/shutdown command with corresponding exceptions Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch, YARN-2209.6.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2694: - Attachment: YARN-2694-20141023-1.patch Updated patch Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. And a ResourceRequest/host with multiple node labels will make user limit, etc. computation becomes more tricky. Now we need temporarily disable them, changes include, - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2209: -- Attachment: YARN-2209.7.patch Thanks zhijie for the review! addressed other comments Replace AM resync/shutdown command with corresponding exceptions Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch, YARN-2209.6.patch, YARN-2209.7.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2737) Misleading msg in LogCLI when app is not successfully submitted
Jian He created YARN-2737: - Summary: Misleading msg in LogCLI when app is not successfully submitted Key: YARN-2737 URL: https://issues.apache.org/jira/browse/YARN-2737 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He {{LogCLiHelpers#logDirNotExist}} prints the msg {{Log aggregation has not completed or is not enabled.}} if the app log file doesn't exist. This is misleading when the application was not submitted successfully; clearly, we won't have logs for such an application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
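A possible direction, sketched with assumed names (this is not the existing LogCLiHelpers code, and the exact wording is only an example):
{code}
// Hypothetical helper: mention the missing directory and the possibility that the
// application was never submitted successfully, instead of only blaming aggregation.
static void logDirNotExist(String remoteAppLogDir) {
  System.out.println(remoteAppLogDir + " does not exist.");
  System.out.println("Log aggregation has not completed or is not enabled, "
      + "or the application was never successfully submitted.");
}
{code}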
[jira] [Updated] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2703: Attachment: YARN-2703.3.patch Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch Right now, the container can upload its logs multiple times. Sometimes, containers write different logs into the same log file. After the log aggregation, when we query those logs, it will show: LogType: stderr LogContext: LogType: stdout LogContext: LogType: stderr LogContext: LogType: stdout LogContext: The same files could be displayed multiple times. But we can not figure out which logs come first. We could add extra loguploadedTime to let users have better understanding on the logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181998#comment-14181998 ] Xuan Gong commented on YARN-2703: - Addressed all comments Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch Right now, the container can upload its logs multiple times. Sometimes, containers write different logs into the same log file. After the log aggregation, when we query those logs, it will show: LogType: stderr LogContext: LogType: stdout LogContext: LogType: stderr LogContext: LogType: stdout LogContext: The same files could be displayed multiple times. But we can not figure out which logs come first. We could add extra loguploadedTime to let users have better understanding on the logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182004#comment-14182004 ] Hadoop QA commented on YARN-2694: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676721/YARN-2694-20141023-1.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5523//console This message is automatically generated. Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. And a ResourceRequest/host with multiple node labels will make user limit, etc. computation becomes more tricky. Now we need temporarily disable them, changes include, - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2678: - Attachment: yarnregistry.pdf Updated TLA specification; # covers new structure # declares that {{serialize()}} and {{deserialize()}} functions exist to go from {{ServiceRecord}} instances to record data (strings), as well as a {{containsValidServiceRecord()}} predicate to check whether or not a string contains a service record. This lets us define the record{{-}}data marshalling behaviour without covering the implementation details Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran Attachments: yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... 
internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
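For readers of the spec update above, the marshalling contract it describes could be expressed roughly like this in Java (the method names mirror the comment; the signatures and the placeholder ServiceRecord type are assumptions, not the registry's actual API):
{code}
public interface ServiceRecordMarshalSketch {

  /** Stand-in for the registry's ServiceRecord type; fields elided. */
  class ServiceRecord { }

  /** serialize(): go from a ServiceRecord instance to record data (a string). */
  String serialize(ServiceRecord record);

  /** deserialize(): parse record data back into a ServiceRecord. */
  ServiceRecord deserialize(String data);

  /** containsValidServiceRecord(): check whether a string contains a service record. */
  boolean containsValidServiceRecord(String data);
}
{code}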
[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2694: - Attachment: YARN-2694-20141023-2.patch I can compile this locally, I haven't found any error message in the console log of Jenkins result. So I suspect it just Jenkins process crashed. Resubmit same patch. Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. And a ResourceRequest/host with multiple node labels will make user limit, etc. computation becomes more tricky. Now we need temporarily disable them, changes include, - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182018#comment-14182018 ] Hadoop QA commented on YARN-2726: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676662/YARN-2726-20141023-1.patch against trunk revision d71d40a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.permission.TestStickyBitTTests org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrationTeTests org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgTestsTests org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolTestsTests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporTestsTests org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCaTestTests org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRTestsTests org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeMaTests org.apache.hadoop.hdfs.TestEncryptionZonesWiTests org.apache.hadoop.hdfs.TestDFSClientRetrTestTests org.apache.hadoop.hdfs.TestFileCreaTestsTests org.apache.hadoop.hdfs.TestDatanodeTests org.apache.hadoop.hdfs.TestLeaseReTests org.apache.hadoop.hdfs.TestDatanodeBlockScTests org.apache.hadoop.hdfs.qjournal.client.TestQJMWithTests org.apache.hadoop.hdfs.TestGetTests org.apache.hadoop.tracing.TestTraceAdmin {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5519//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5519//console This message is automatically generated. CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Attachments: YARN-2726-20141023-1.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. 
To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException("Configuration issue: label=" + label + " is accessible from queue=" + queue + " but has no capacity set."); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2678: - Attachment: YARN-2678-001.patch Updated patch # marshalling to ZK node without header, but check for type string performed before any attempt to parse the content # maps used to define addresses # updated doc to match (with full example generated off live AM) Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran Attachments: YARN-2678-001.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182022#comment-14182022 ] Steve Loughran commented on YARN-2678: -- Gour: I don't want to split out hostname and port from a URI. Parsing URLs is ubiquitous, every language has a toolkit to do it. Mandating that they must be separate only creates the possibility of conflicting values between the {{uri}} field and the explicit ones. Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran Attachments: YARN-2678-001.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
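As a concrete illustration of that point, any client can recover host and port from the published uri field with its language's standard URI parser, so separate fields only add room for inconsistency (example values taken from the description):
{code}
import java.net.URI;

public class AddressParseExample {
  public static void main(String[] args) {
    URI uri = URI.create("https://c6408.ambari.apache.org:46958/ws/v1/slider/agents");
    System.out.println(uri.getHost()); // c6408.ambari.apache.org
    System.out.println(uri.getPort()); // 46958
  }
}
{code}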
[jira] [Updated] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2726: - Attachment: YARN-2726-20141023-2.patch Jenkins issue resubmit patch CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException(Configuration issue: + label= + label + is accessible from queue= + queue + but has no capacity set.); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2722: -- Attachment: YARN-2722-2.patch Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle - Key: YARN-2722 URL: https://issues.apache.org/jira/browse/YARN-2722 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2722-1.patch, YARN-2722-2.patch We should disable SSLv3 in HttpFS to protect against the POODLEbleed vulnerability. See [CVE-2014-3566 |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566] We have {{context = SSLContext.getInstance("TLS");}} in SSLFactory, but when I checked, I could still connect with SSLv3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
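For context, a hedged sketch of why requesting "TLS" alone is not enough and how SSLv3 can be filtered out of the enabled protocol list (illustrative only; this is not the YARN-2722 patch itself):
{code}
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class DisableSslv3Sketch {
  public static SSLEngine newEngine() throws Exception {
    // Asking for "TLS" can still leave SSLv3 among the enabled protocols on some JVMs.
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(null, null, null);
    SSLEngine engine = context.createSSLEngine();

    // Re-enable only the non-SSLv3 protocols.
    List<String> enabled = new ArrayList<String>();
    for (String protocol : engine.getEnabledProtocols()) {
      if (!"SSLv3".equals(protocol)) {
        enabled.add(protocol);
      }
    }
    engine.setEnabledProtocols(enabled.toArray(new String[enabled.size()]));
    return engine;
  }
}
{code}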
[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2678: - Attachment: HADOOP-2678-002.patch Patch merging in YARN-2677 patch which is forgiving of non-DNS entries in the path (like usernames) Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182074#comment-14182074 ] Gour Saha commented on YARN-2678: - bq. Gour: I don't want to split out hostname and port from a URI. Parsing URLs is ubiquitous, every language has a toolkit to do it. Mandating that they must be separate only creates the possibility of conflicting values between the uri field and the explicit ones. Ok makes sense. Slider agents are doing it today anyway. Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... 
internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
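To illustrate the point in the comment above, that clients can recover host and port from the registered uri with a standard URI parser rather than relying on separate keys, here is a minimal sketch using only java.net.URI; it is not Slider or registry code, and the endpoint value is simply the example from the record above.
{code:java}
import java.net.URI;

public class RegistryAddressParser {
    public static void main(String[] args) {
        // One of the "addresses" entries of addressType "uri" from the record above.
        URI endpoint =
            URI.create("https://c6408.ambari.apache.org:46958/ws/v1/slider/agents");

        // Host and port fall out of the standard parser, so the registry record
        // does not need to carry them as separate keys.
        String host = endpoint.getHost(); // c6408.ambari.apache.org
        int port = endpoint.getPort();    // 46958
        String path = endpoint.getPath(); // /ws/v1/slider/agents

        System.out.println(host + ":" + port + path);
    }
}
{code}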
[jira] [Commented] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182094#comment-14182094 ] Hadoop QA commented on YARN-2722: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676744/YARN-2722-2.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5527//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5527//console This message is automatically generated. Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle - Key: YARN-2722 URL: https://issues.apache.org/jira/browse/YARN-2722 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2722-1.patch, YARN-2722-2.patch We should disable SSLv3 in HttpFS to protect against the POODLEbleed vulnerability. See [CVE-2014-3566 |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566] We have {{context = SSLContext.getInstance(TLS);}} in SSLFactory, but when I checked, I could still connect with SSLv3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
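For reference, the usual way to rule out SSLv3 on the server side is to restrict the enabled protocols explicitly rather than relying on SSLContext.getInstance("TLS") alone. The following is a minimal sketch, not the attached patch; the protocol list and the null initialization are assumptions for illustration.
{code:java}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class TlsOnlyExample {
    static SSLEngine newTlsOnlyEngine() throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null); // default key/trust managers, for illustration only

        SSLEngine engine = context.createSSLEngine();
        // getInstance("TLS") alone can still leave SSLv3 enabled on older JREs;
        // whitelisting the TLS versions explicitly is what actually disables it.
        engine.setEnabledProtocols(new String[] {"TLSv1", "TLSv1.1", "TLSv1.2"});
        return engine;
    }
}
{code}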
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182095#comment-14182095 ] Hadoop QA commented on YARN-2678: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676750/HADOOP-2678-002.patch against trunk revision 828429d. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5529//console This message is automatically generated. Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... 
internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-2056: - Attachment: YARN-2056.201410232244.txt Thanks very much [~leftnoteasy]. I have attached a patch which uses PriorityQueue instead of an internal queue class. Please note that since the algorithm for building up needy queues is different, the rounding is also different, so some of the tests' expected values needed to change. I stepped through several of the tests and they seem to be working as I expect. Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, YARN-2056.201410232244.txt We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
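For readers following the discussion, a java.util.PriorityQueue ordered by unmet share can replace a hand-rolled queue class roughly as in the sketch below; QueueNeed and the ordering are illustrative assumptions, not code from the attached patch.
{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

public class NeedyQueueOrderingSketch {
    // Hypothetical holder for a queue's unmet (needed) share.
    static class QueueNeed {
        final String queueName;
        final double neededShare;
        QueueNeed(String queueName, double neededShare) {
            this.queueName = queueName;
            this.neededShare = neededShare;
        }
    }

    public static void main(String[] args) {
        // Most needy queue first; java.util.PriorityQueue replaces an internal queue class.
        PriorityQueue<QueueNeed> needy = new PriorityQueue<QueueNeed>(8,
            new Comparator<QueueNeed>() {
                @Override
                public int compare(QueueNeed x, QueueNeed y) {
                    return Double.compare(y.neededShare, x.neededShare); // descending need
                }
            });

        needy.add(new QueueNeed("a", 0.30));
        needy.add(new QueueNeed("b", 0.05));
        needy.add(new QueueNeed("c", 0.15));

        while (!needy.isEmpty()) {
            QueueNeed next = needy.poll(); // queues are considered in need order
            System.out.println(next.queueName + " needs " + next.neededShare);
        }
    }
}
{code}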
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182112#comment-14182112 ] Matteo Mazzucchelli commented on YARN-2664: --- I notice that the data sent to the HTML page are in a CSV format and that most of the values are zero. I think the best way to handle these data would be to send only the important (non-zero) values as JSON. {code:java} [ { "key": "reservation_1413792787395_0018", "values": [{"date": "Mon Oct 24 10:13:37 CEST 2014", "value": 0}, {"date": "Mon Oct 24 10:14:18 CEST 2014", "value": 5}] }, ... ] {code} Therefore, only each timestamp and the value associated with it will be sent. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Attachments: PlannerPage_screenshot.pdf, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
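As a rough sketch of the filtering step implied above (keep only the non-zero samples before serializing them into the JSON shown), independent of RLESparseResourceAllocation or any attached patch:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class ReservationSeriesSketch {
    /** Keep only the non-zero points of a reservation's time series (illustration only). */
    static Map<Long, Integer> nonZeroPoints(Map<Long, Integer> allocationByTime) {
        Map<Long, Integer> filtered = new LinkedHashMap<Long, Integer>();
        for (Map.Entry<Long, Integer> e : allocationByTime.entrySet()) {
            if (e.getValue() != 0) {
                // timestamp -> value pairs; these become the "date"/"value" objects
                // in the JSON proposed above.
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }
}
{code}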
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182124#comment-14182124 ] Hadoop QA commented on YARN-2209: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676728/YARN-2209.7.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5524//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5524//console This message is automatically generated. Replace AM resync/shutdown command with corresponding exceptions Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch, YARN-2209.6.patch, YARN-2209.7.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
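For context, the AM-side consequence of replacing the resync command with an exception looks roughly like the sketch below. The real handling lives in the YARN AM/client libraries; the method shapes here are simplified assumptions, and the exception's package is assumed to be the standard org.apache.hadoop.yarn.exceptions location where YARN-1365 introduced it.
{code:java}
// Sketch of how an AM could react to the exception that replaces the resync
// command (simplified; not the attached patch).
abstract class AmResyncSketch {
    abstract void registerWithRM() throws Exception;
    abstract void sendAllocate() throws Exception;

    void heartbeatOnce() throws Exception {
        try {
            sendAllocate();
        } catch (org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException e) {
            // The RM restarted and lost this AM's registration: re-register and retry,
            // instead of interpreting a special "resync" command in the allocate response.
            registerWithRM();
            sendAllocate();
        }
    }
}
{code}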
[jira] [Updated] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2183: -- Attachment: YARN-2183-trunk-v6.patch v.6 patch posted. To see the patch in context, go to https://github.com/ctrezzo/hadoop/compare/apache:trunk...sharedcache-3-YARN-2183-cleaner To see the changes between v.5 and v.6, go to https://github.com/ctrezzo/hadoop/commit/ffcc098749d16950732d833141db356efe116ed3 Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch, YARN-2183-trunk-v6.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182133#comment-14182133 ] Zhijie Shen commented on YARN-2724: --- [~mitdesai], any comments so far? If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch, YARN-2724.5.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
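A sketch of the expected behavior described above: when the local file cannot be read, the aggregated entry's LogLength should describe the error text that is actually written, not the size of the unread file. This is plain stream I/O for illustration, not the actual AggregatedLogFormat writer.
{code:java}
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;

public class UnreadableLogEntrySketch {
    static void writeErrorEntry(DataOutputStream out, File logFile, IOException cause)
            throws IOException {
        String message = "Error aggregating log file. Log file : "
            + logFile.getAbsolutePath() + " (" + cause.getMessage() + ")";
        byte[] payload = message.getBytes(Charset.forName("UTF-8"));

        // LogLength must describe what is actually written (the error text),
        // not the size of a file that was never read.
        out.writeBytes("LogType: " + logFile.getName() + "\n");
        out.writeBytes("LogLength: " + payload.length + "\n");
        out.writeBytes("Log Contents:\n");
        out.write(payload);
        out.writeBytes("\n");
    }
}
{code}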
[jira] [Commented] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182135#comment-14182135 ] Karthik Kambatla commented on YARN-2010: [~jianhe] - can you please verify the changes to TestWorkPreservingRMRestart are reasonable. If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Critical Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
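Schematically, the behavior under discussion is that a per-application recovery failure should be contained rather than aborting the transition to active. The sketch below only illustrates that idea under assumed method names; it is not the attached patch.
{code:java}
import java.util.List;

// Schematic of containing per-application recovery failures:
// one bad application should not leave the RM stuck in STOPPED state.
abstract class RecoverySketch {
    abstract List<String> storedApplicationIds();
    abstract void recoverApplication(String appId) throws Exception;
    abstract void markAppFailedOnRecovery(String appId, Exception cause);

    void recoverAll() {
        for (String appId : storedApplicationIds()) {
            try {
                recoverApplication(appId);
            } catch (Exception e) {
                // App-specific failures (expired tokens, missing HDFS files, ...) are
                // recorded against the app while the transition to active proceeds.
                markAppFailedOnRecovery(appId, e);
            }
        }
    }
}
{code}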
[jira] [Updated] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2010: --- Attachment: yarn-2010-5.patch Updated patch to fix test failure, findbugs warning, and suppress javac warnings (we call getEventHandler().handle() at several other places, I don't quite get why it leads to a javac warning only here). If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Critical Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182142#comment-14182142 ] Hadoop QA commented on YARN-2694: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676740/YARN-2694-20141023-2.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5526//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5526//console This message is automatically generated. Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch Currently, node label expression support in the capacity scheduler is only partially completed. Right now a node label expression specified in a Resource Request is only respected when it is specified at the ANY level, and a ResourceRequest/host with multiple node labels makes user limit, etc. computation more tricky. We need to temporarily disable them; changes include: - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182178#comment-14182178 ] Hadoop QA commented on YARN-2726: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676745/YARN-2726-20141023-2.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5528//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5528//console This message is automatically generated. CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException("Configuration issue: label=" + label + " is accessible from queue=" + queue + " but has no capacity set."); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2713) Broken RM Home link in NM Web UI when RM HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2713: --- Attachment: yarn-2713-1.patch Here is a straight-forward patch that points RM Home to the first RM in a HA deployment. If the first RM is not Active, it will redirect to the Active automatically. I am not sure if we want a more sophisticated fix that would take us to the Active directly. Broken RM Home link in NM Web UI when RM HA is enabled Key: YARN-2713 URL: https://issues.apache.org/jira/browse/YARN-2713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2713-1.patch When RM HA is enabled, the 'RM Home' link in the NM WebUI is broken. It points to the NM-host:RM-port instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
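A minimal sketch of the approach described above, resolving the link target from the first configured RM id using standard YARN property names; the helper method is hypothetical and the real patch may go through YARN's own utilities instead of raw Configuration lookups.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class RmHomeLinkSketch {
    // Hypothetical helper: pick the webapp address of the first RM id when HA is enabled.
    static String firstRmWebAppAddress(Configuration conf) {
        String rmIds = conf.get("yarn.resourcemanager.ha.rm-ids");
        if (rmIds == null || rmIds.isEmpty()) {
            // Non-HA deployment: the plain webapp address is already correct.
            return conf.get("yarn.resourcemanager.webapp.address");
        }
        String firstId = rmIds.split(",")[0].trim();
        // If this RM happens to be standby, its web UI redirects to the active RM anyway.
        return conf.get("yarn.resourcemanager.webapp.address." + firstId);
    }
}
{code}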
[jira] [Updated] (YARN-2713) Broken RM Home link in NM Web UI when RM HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2713: --- Fix Version/s: (was: 2.7.0) Broken RM Home link in NM Web UI when RM HA is enabled Key: YARN-2713 URL: https://issues.apache.org/jira/browse/YARN-2713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2713-1.patch When RM HA is enabled, the 'RM Home' link in the NM WebUI is broken. It points to the NM-host:RM-port instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2713) Broken RM Home link in NM Web UI when RM HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182195#comment-14182195 ] Karthik Kambatla commented on YARN-2713: [~xgong] - I believe you are the most familiar with HA-redirections. Will you be able to take a look at this patch? Thanks. Broken RM Home link in NM Web UI when RM HA is enabled Key: YARN-2713 URL: https://issues.apache.org/jira/browse/YARN-2713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2713-1.patch When RM HA is enabled, the 'RM Home' link in the NM WebUI is broken. It points to the NM-host:RM-port instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182205#comment-14182205 ] Carlo Curino commented on YARN-2664: Matteo, first of all, thanks for looking into this. The delta-encoding you propose makes lots of sense and it is well aligned with the internal representation of resource allocations (which are mostly based on: *RLESparseResourceAllocation*), so you should be able to extract it from there easily. One thing we need to figure out is whether to use the javascript library I had in the seed patch above or other javascript (or non-javascript) visualization lib. Anything that can consume the json format you propose, and has an amenable licensing for hadoop is ok with me. (if anyone else has suggestions on this please chime in!) Another important problem will be what to visualize. I suspect that showing all jobs accepted over an arbitrary past/future time range is likely going to be too much for any large cluster... Being able to focus on a portion of the plan (e.g., time-range, user, queue) I think is going to be important. This would allow the GUI to lazily fetch the data corresponding to the portion of the plan we are visualizing, instead of dumping out the entire plan (which even with your much better delta-encoding might eventually be too big). My 2 cents.. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Attachments: PlannerPage_screenshot.pdf, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2713) Broken RM Home link in NM Web UI when RM HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182242#comment-14182242 ] Hadoop QA commented on YARN-2713: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676784/yarn-2713-1.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5532//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5532//console This message is automatically generated. Broken RM Home link in NM Web UI when RM HA is enabled Key: YARN-2713 URL: https://issues.apache.org/jira/browse/YARN-2713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2713-1.patch When RM HA is enabled, the 'RM Home' link in the NM WebUI is broken. It points to the NM-host:RM-port instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182245#comment-14182245 ] Hadoop QA commented on YARN-2010: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676770/yarn-2010-5.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5531//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5531//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5531//console This message is automatically generated. If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Critical Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182247#comment-14182247 ] Hadoop QA commented on YARN-2703: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676733/YARN-2703.3.patch against trunk revision 828429d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDecommTests org.apache.hadoop.hdfs.TestParallelUnixDomaTests org.apache.hadoop.hdfs.TestEncryptionZonesWTests org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecTests org.apache.hadoop.hdfs.TestPTestsTests org.apache.hadoop.hdfs.TestGetBTests org.apache.hadoop.hdfs.TestFileCreTests org.apache.hadoop.hdfs.TestWriTests org.apache.hadoop.hdfs.TestSetrepIncrTests org.apache.hadoop.hdfs.TestRenameWhiTests org.apache.hadoop.hdfs.TestBlockReaderLocTesTests org.apache.hadoop.hdfs.TestEncryptionZoneTeTests org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContraTesTests org.apache.hadoop.hdfs.web.TestWebHDFTeTests org.apache.hadoop.hdfs.web.TestWebHDFSForHTeTests {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5525//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5525//console This message is automatically generated. Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch Right now, the container can upload its logs multiple times. Sometimes, containers write different logs into the same log file. After the log aggregation, when we query those logs, it will show: LogType: stderr LogContext: LogType: stdout LogContext: LogType: stderr LogContext: LogType: stdout LogContext: The same files could be displayed multiple times. But we can not figure out which logs come first. We could add extra loguploadedTime to let users have better understanding on the logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2010: --- Attachment: yarn-2010-6.patch Updated patch to fix the findbugs issue, it was due to an empty if-block that got left around by mistake. If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Critical Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182257#comment-14182257 ] Wangda Tan commented on YARN-2495: -- [~Naganarasimha], One comment before you upload the patch: I suggest adding an option to indicate whether decentralized node label configuration is currently in use. If it is true, the NM will do the follow-on steps, such as creating the NodeLabelProvider, setting labels in the NodeHeartbeatRequest, etc. If that makes sense to you, I suggest we call it ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION (yarn.node-labels.decentralized-configuration.enabled), or do you have another suggestion? That value will also be used by the RM; the RM needs to do similar things, such as disabling admin changes to node labels via the RM admin CLI, etc. I think you can first focus on the NM side and the ResourceTracker changes in the RM; AdminService-related changes can be split into another JIRA. Thanks, Wangda Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml or using the script suggested by [~aw]) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
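To make the suggestion concrete, the NM-side gating could look like the sketch below; the property name is the one proposed above, while NodeLabelProvider and the factory method are placeholders rather than code from the attached patches.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class NodeLabelProviderGateSketch {
    // Property name proposed in the comment above.
    static final String DECENTRALIZED_NODELABEL_CONF =
        "yarn.node-labels.decentralized-configuration.enabled";

    interface NodeLabelProvider { /* placeholder for the provider discussed above */ }

    static NodeLabelProvider maybeCreateProvider(Configuration conf) {
        boolean decentralized = conf.getBoolean(DECENTRALIZED_NODELABEL_CONF, false);
        if (!decentralized) {
            // Centralized configuration: labels are managed via the RM admin CLI,
            // so the NM creates no provider and reports no labels in heartbeats.
            return null;
        }
        return createProviderFromConf(conf); // hypothetical factory (script- or config-based)
    }

    static NodeLabelProvider createProviderFromConf(Configuration conf) {
        return new NodeLabelProvider() { };
    }
}
{code}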
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182291#comment-14182291 ] Jian He commented on YARN-2704: --- Thanks, Vinod, for the review! bq. removeApplicationFromRenewal() is only called when log-aggregation is enabled, so that will affect the new credentials map? removeApplicationFromRenewal() is actually called directly when log-aggregation is disabled; if it's enabled, the apps are added to a delayed map. bq. Just bubble up the IOException instead of wrapping it in YarnRuntimeException. It's inside the run method, so I wrap it in a runtime exception. bq. We don't need to renew the token immediately after obtaining it? This is to get the expiration date; the token itself doesn't carry the expiration date. bq. Make 3600 a constant. And why is it 10 hours? Shouldn't this be a function of the max-life-time in general? This guarantees we have a minimum 10h buffer to distribute the tokens. Any time more than that is not necessary? bq. we should also look for the service name matching the default-file-system. There's no easy way to get the service name based on the file-system object, and the hdfs token service-name varies from case to case: e.g. HA/non-HA, use-ip/use-hostname. bq. The log message "found existing hdfs token" needs to be a debug log Regarding the info/debug level logs: these are all low-frequency logs, by default only once a day (the renew interval), and it's much easier to debug at info level than at debug level. Maybe keep them at info level while stabilizing this feature? Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2704.1.patch In secure mode, YARN requires the hdfs delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
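As a rough illustration of the buffer logic being discussed, the constant name and the use of 10 hours below are placeholders mirroring the comment, not the attached patch.
{code:java}
import java.util.concurrent.TimeUnit;

public class TokenRenewBufferSketch {
    // Placeholder for the constant Vinod asked about; the concrete value and its
    // relation to max-life-time are exactly what is being discussed above.
    static final long DISTRIBUTION_BUFFER_MS = TimeUnit.HOURS.toMillis(10);

    /** When to obtain a fresh HDFS delegation token, given the current token's expiry. */
    static long nextTokenRefreshTime(long tokenExpirationMs) {
        // Refresh early enough that new tokens can reach every NM
        // before the old token hits max-life-time.
        return tokenExpirationMs - DISTRIBUTION_BUFFER_MS;
    }
}
{code}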
[jira] [Commented] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182302#comment-14182302 ] Hadoop QA commented on YARN-2010: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676796/yarn-2010-6.patch against trunk revision db45f04. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5533//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5533//console This message is automatically generated. If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Critical Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)