[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2015-01-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287446#comment-14287446
 ] 

Karthik Kambatla commented on YARN-2194:


container-executor.c - the new method significantly duplicates the existing 
one. Can we have separate methods that capture the differences and leave the 
original method as is?

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2194-1.patch


In previous versions of RedHat, we could build custom cgroup hierarchies 
 with the cgconfig command from the libcgroup package. As of RedHat 7, the 
 libcgroup package is deprecated and its use is not recommended, since it can 
 easily conflict with the default cgroup hierarchy. systemd is provided and 
 recommended for cgroup management instead. We need to add support for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287278#comment-14287278
 ] 

Hadoop QA commented on YARN-1743:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12693846/YARN-1743-2.patch
  against trunk revision 786dbdf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

  {color:red}-1 javac{color}.  The applied patch generated 1205 javac 
compiler warnings (more than the trunk's current 1204 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 3 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6389//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6389//artifact/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6389//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6389//console

This message is automatically generated.

 Decorate event transitions and the event-types with their behaviour
 ---

 Key: YARN-1743
 URL: https://issues.apache.org/jira/browse/YARN-1743
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Jeff Zhang
  Labels: documentation
 Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, 
 YARN-1743.patch


 Helps to annotate the transitions with (start-state, end-state) pair and the 
 events with (source, destination) pair.
 Not just readability, we may also use them to generate the event diagrams 
 across components.
 Not a blocker for 0.23, but let's see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2973) Capacity scheduler configuration ACLs not work.

2015-01-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287301#comment-14287301
 ] 

Rohith commented on YARN-2973:
--

Going through the queue ACLs more deeply, I believe that, as per the design, it 
is expected that ACLs are disabled for the root queue. ACLs can be disabled by 
configuring a single space (" ") as the value of the queue's ACL property.
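
For illustration only (not from this JIRA): the capacity scheduler builds these 
ACLs with Hadoop's AccessControlList, where "*" (the permissive default) allows 
everyone and a single space allows no one, and queue ACLs are evaluated up the 
hierarchy, so a permissive root ACL lets users into every leaf queue. A minimal 
Java sketch of that semantics, assuming hadoop-common on the classpath:

{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public class QueueAclDemo {
  public static void main(String[] args) {
    UserGroupInformation user = UserGroupInformation.createRemoteUser("jcsong2");

    // "*" (the permissive default) allows every user.
    AccessControlList wideOpen = new AccessControlList("*");
    System.out.println(wideOpen.isUserAllowed(user)); // true

    // A single space means "no users, no groups": the ACL is effectively disabled.
    AccessControlList disabled = new AccessControlList(" ");
    System.out.println(disabled.isUserAllowed(user)); // false
  }
}
{code}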

 Capacity scheduler configuration ACLs not work.
 ---

 Key: YARN-2973
 URL: https://issues.apache.org/jira/browse/YARN-2973
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.5.0
 Environment: ubuntu 12.04, cloudera manager, cdh5.2.1
Reporter: Jimmy Song
Assignee: Rohith
  Labels: acl, capacity-scheduler, yarn

 I followed this page to configure YARN: 
 http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
 I configured YARN to use the capacity scheduler in yarn-site.xml by setting 
 yarn.resourcemanager.scheduler.class to 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler. 
 Then I modified capacity-scheduler.xml:
 ___
 <?xml version="1.0"?>
 <configuration>
   <property>
     <name>yarn.scheduler.capacity.root.queues</name>
     <value>default,extract,report,tool</value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.state</name>
     <value>RUNNING</value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
     <value>jcsong2, y2 </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
     <value>jcsong2, y2 </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.capacity</name>
     <value>35</value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.extract.acl_submit_applications</name>
     <value>jcsong2 </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.extract.acl_administer_queue</name>
     <value>jcsong2 </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.extract.capacity</name>
     <value>15</value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.report.acl_submit_applications</name>
     <value>y2 </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.report.acl_administer_queue</name>
     <value>y2 </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.report.capacity</name>
     <value>35</value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.tool.acl_submit_applications</name>
     <value> </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.tool.acl_administer_queue</name>
     <value> </value>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.tool.capacity</name>
     <value>15</value>
   </property>
 </configuration>
 ___
 I have enabled ACLs in yarn-site.xml, but the user jcsong2 can submit 
 applications to every queue. The queue ACLs don't work! And the queues use 
 more capacity than they were configured with!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run

2015-01-22 Thread Michael Br (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287336#comment-14287336
 ] 

Michael Br commented on YARN-3084:
--

Hi,
First, thanks for your quick reply...

1. Where can I see the queue? I looked for it, but I still don't see how to get 
to it.
2. I can't reach the logs...

The only clue I have about why this job fails is the information I can see at 
this URL: 
http://192.168.38.133:8088/ws/v1/cluster/apps/application_1421661392788_0039
(The log links there don't work when I try to use them: 
http://sandbox.hortonworks.com:8088/proxy/application_1421661392788_0039/
[I replace and insert my host IP], and the link for the container logs doesn't 
work either: 
http://sandbox.hortonworks.com:8042/node/containerlogs/container_1421661392788_0039_02_01/dr.who)
--
<app>
  <id>application_1421661392788_0039</id>
  <user>dr.who</user>
  <name>test_33</name>
  <queue>default</queue>
  <state>FAILED</state>
  <finalStatus>FAILED</finalStatus>
  <progress>0.0</progress>
  <trackingUI>History</trackingUI>
  <trackingUrl>http://sandbox.hortonworks.com:8088/cluster/app/application_1421661392788_0039</trackingUrl>
  <diagnostics>Application application_1421661392788_0039 failed 2 times due to AM Container for appattempt_1421661392788_0039_02 exited with exitCode: 0 For more detailed output, check application tracking page: http://sandbox.hortonworks.com:8088/proxy/application_1421661392788_0039/ Then, click on links to logs of each attempt. Diagnostics: Failing this attempt. Failing the application.</diagnostics>
  <clusterId>1421661392788</clusterId>
  <applicationType>MAPREDUCE</applicationType>
  <applicationTags>michael,pi example</applicationTags>
  <startedTime>1421923561425</startedTime>
  <finishedTime>1421923723426</finishedTime>
  <elapsedTime>162001</elapsedTime>
  <amContainerLogs>http://sandbox.hortonworks.com:8042/node/containerlogs/container_1421661392788_0039_02_01/dr.who</amContainerLogs>
  <amHostHttpAddress>sandbox.hortonworks.com:8042</amHostHttpAddress>
  <allocatedMB>-1</allocatedMB>
  <allocatedVCores>-1</allocatedVCores>
  <runningContainers>-1</runningContainers>
  <memorySeconds>200857</memorySeconds>
  <vcoreSeconds>160</vcoreSeconds>
  <preemptedResourceMB>0</preemptedResourceMB>
  <preemptedResourceVCores>0</preemptedResourceVCores>
  <numNonAMContainerPreempted>0</numNonAMContainerPreempted>
  <numAMContainerPreempted>0</numAMContainerPreempted>
</app>
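
(Not from the thread, just for reference when reproducing this: the report above 
can be fetched with plain Java against the same REST endpoint; the host, port and 
application id are the ones quoted above and would need adjusting.)

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FetchAppReport {
  public static void main(String[] args) throws Exception {
    // Same endpoint as quoted above.
    URL url = new URL(
        "http://192.168.38.133:8088/ws/v1/cluster/apps/application_1421661392788_0039");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Accept", "application/xml"); // or application/json

    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // prints the <app> report shown above
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}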

 YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes 
 to run
 

 Key: YARN-3084
 URL: https://issues.apache.org/jira/browse/YARN-3084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Using eclipse on windows 7 (client)to run the map reduce 
 job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 
 6.0.2 build-1744117)
Reporter: Michael Br
Priority: Minor

 Hello,
 1.I want to run the simple Map Reduce job example (with the REST API 2.6 
 for yarn applications) and to calculate PI… for now it doesn’t work.
 When I use the command in the hortonworks terminal it works: “hadoop jar 
 /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
  pi 10 10”.
 But I want to submit the job with the REST API and not in the terminal as a 
 command line. 
 [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
 2.I do succeed with other REST API requests: get state, get new 
 application id and even kill(change state), but when I try to submit my 
 example, the response is:
 --
 --
 The Response Header:
 Key : null ,Value : [HTTP/1.1 202 Accepted]
 Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 
 GMT]
 Key : Content-Length ,Value : [0]
 Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 
 07:47:24 GMT]
 Key : Location ,Value : [http://[my 
 port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
 Key : Content-Type ,Value : [application/json]
 Key : Server ,Value : [Jetty(6.1.26.hwx)]
 Key : Pragma ,Value : [no-cache, no-cache]
 Key : Cache-Control ,Value : [no-cache]
 The Response Body:
 Null (No Response)
 --
 --
 3.I need help with the http request body filling. I am doing a POST http 
 request and I know that I am doing it right (in java).
 4.I think the problem is in the request body.
 5.I used this guy’s answer to help me build my map reduce example xml but 
 it does not work: 
 

[jira] [Commented] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts

2015-01-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287320#comment-14287320
 ] 

Rohith commented on YARN-2684:
--

[~kasha] Kindly review the patch

 FairScheduler should tolerate queue configuration changes across RM restarts
 

 Key: YARN-2684
 URL: https://issues.apache.org/jira/browse/YARN-2684
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2684.patch


 YARN-2308 fixes this issue for CS, this JIRA is to fix it for FS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'

2015-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287491#comment-14287491
 ] 

Hudson commented on YARN-3078:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #78 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/78/])
YARN-3078. LogCLIHelpers lacks of a blank space before string 'does not exist'. 
Contributed by Sam Liu. (ozawa: rev 5712c9f96a2cf4ff63d36906ab3876444c0cddec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* hadoop-yarn-project/CHANGES.txt


 LogCLIHelpers lacks of a blank space before string 'does not exist'
 ---

 Key: YARN-3078
 URL: https://issues.apache.org/jira/browse/YARN-3078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: sam liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3078.001.patch, YARN-3078.002.patch


 LogCLIHelpers lacks a blank space before the string 'does not exist', which 
 produces an incorrect message.
 For example, I ran the command 'yarn logs -applicationId 
 application_1421742816585_0003', and the returned message includes 
 'logs/application_1421742816585_0003does not exist'. 
 Obviously this is incorrect; the correct message should be 
 'logs/application_1421742816585_0003 does not exist'.
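
 A minimal sketch of the kind of change involved (illustrative only; the actual 
 fix is in LogCLIHelpers.java, attached as a patch):

{code}
public class MissingSpaceDemo {
  public static void main(String[] args) {
    String remoteAppLogDir = "logs/application_1421742816585_0003"; // path from the report above

    // Before: no separator between the path and the message.
    System.out.println(remoteAppLogDir + "does not exist.");   // ...0003does not exist.

    // After: a leading space produces the expected message.
    System.out.println(remoteAppLogDir + " does not exist.");  // ...0003 does not exist.
  }
}
{code}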



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reopened YARN-3083:
--

This was fixed by YARN-1975.  Reopening to resolve this as a duplicate of that.

 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fairscheduler
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.
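
 To see why double-escaping produces exactly this symptom, a small standalone 
 sketch (illustrative only; uses Commons Lang 2.x, which Hadoop already bundles):

{code}
import org.apache.commons.lang.StringEscapeUtils;

public class DoubleEscapeDemo {
  public static void main(String[] args) {
    String resource = "<memory:65536, vCores:0>";

    String once = StringEscapeUtils.escapeHtml(resource);
    // &lt;memory:65536, vCores:0&gt; -- a browser renders this as <memory:65536, vCores:0>
    System.out.println(once);

    String twice = StringEscapeUtils.escapeHtml(once);
    // &amp;lt;memory:65536, vCores:0&amp;gt; -- a browser renders this as the literal text
    // &lt;memory:65536, vCores:0&gt;, which is what the Fair Scheduler page shows.
    System.out.println(twice);
  }
}
{code}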



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-3083.
--
Resolution: Duplicate

 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fairscheduler
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'

2015-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287473#comment-14287473
 ] 

Hudson commented on YARN-3078:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2013 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2013/])
YARN-3078. LogCLIHelpers lacks of a blank space before string 'does not exist'. 
Contributed by Sam Liu. (ozawa: rev 5712c9f96a2cf4ff63d36906ab3876444c0cddec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* hadoop-yarn-project/CHANGES.txt


 LogCLIHelpers lacks of a blank space before string 'does not exist'
 ---

 Key: YARN-3078
 URL: https://issues.apache.org/jira/browse/YARN-3078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: sam liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3078.001.patch, YARN-3078.002.patch


 LogCLIHelpers lacks a blank space before the string 'does not exist', which 
 produces an incorrect message.
 For example, I ran the command 'yarn logs -applicationId 
 application_1421742816585_0003', and the returned message includes 
 'logs/application_1421742816585_0003does not exist'. 
 Obviously this is incorrect; the correct message should be 
 'logs/application_1421742816585_0003 does not exist'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing

2015-01-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287095#comment-14287095
 ] 

Tsuyoshi OZAWA commented on YARN-3082:
--

Looks good to me overall. Minor nits:

{code}
+}
+finally {
+  threadPool.shutdownNow();
{code}

This finally block should start on the same line as the closing brace, like this:

{code}
} finally {
  threadPool.shutdownNow();
{code}


 Non thread safe access to systemCredentials in NodeHeartbeatResponse 
 processing
 ---

 Key: YARN-3082
 URL: https://issues.apache.org/jira/browse/YARN-3082
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3082.001.patch


 When you use system credentials via the feature added in YARN-2704, the proto 
 conversion code throws an exception when converting the ByteBuffer.
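
 (As a general illustration, not the actual YARN code: reading a ByteBuffer 
 mutates its position, so two threads converting the same shared buffer can 
 corrupt each other's reads; duplicating the buffer per reader removes the 
 shared mutable state.)

{code}
import java.nio.ByteBuffer;

public class SharedBufferDemo {
  // Shared state, e.g. the system credentials held by the NM context.
  private static final ByteBuffer SHARED =
      ByteBuffer.wrap("some-credential-bytes".getBytes());

  // Safe pattern: duplicate() shares the bytes but gives each reader its own
  // position/limit, so concurrent conversions do not interfere.
  static byte[] readCopy() {
    ByteBuffer view = SHARED.duplicate();
    byte[] out = new byte[view.remaining()];
    view.get(out);
    return out;
  }

  public static void main(String[] args) throws Exception {
    Thread t1 = new Thread(SharedBufferDemo::readCopy);
    Thread t2 = new Thread(SharedBufferDemo::readCopy);
    t1.start(); t2.start();
    t1.join(); t2.join();
    System.out.println(new String(readCopy()));
  }
}
{code}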



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287117#comment-14287117
 ] 

Hadoop QA commented on YARN-3082:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12693790/YARN-3082.001.patch
  against trunk revision 786dbdf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6385//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6385//console

This message is automatically generated.

 Non thread safe access to systemCredentials in NodeHeartbeatResponse 
 processing
 ---

 Key: YARN-3082
 URL: https://issues.apache.org/jira/browse/YARN-3082
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3082.001.patch


 When you use system credentials via the feature added in YARN-2704, the proto 
 conversion code throws an exception when converting the ByteBuffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour

2015-01-22 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-1743:
-
Attachment: YARN-1743-2.patch

 Decorate event transitions and the event-types with their behaviour
 ---

 Key: YARN-1743
 URL: https://issues.apache.org/jira/browse/YARN-1743
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Jeff Zhang
  Labels: documentation
 Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, 
 YARN-1743.patch


 Helps to annotate the transitions with (start-state, end-state) pair and the 
 events with (source, destination) pair.
 Not just readability, we may also use them to generate the event diagrams 
 across components.
 Not a blocker for 0.23, but let's see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour

2015-01-22 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287192#comment-14287192
 ] 

Jeff Zhang commented on YARN-1743:
--

[~leftnoteasy] Uploaded a new patch:
* Changed the annotation type to Class.
* Added more javadoc to explain the usage of the two annotations.
* The patch only uses the annotations on ApplicationEventType; for the other 
events we can create follow-up JIRAs.
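
A rough, hypothetical sketch of what such an annotation could look like 
(illustrative names only, not the ones in the patch):

{code}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class TransitionAnnotationSketch {

  /** Hypothetical annotation: records which component sends an event type and which handles it. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Dispatched {
    Class<?> source();
    Class<?> destination();
  }

  // Stand-ins for real NodeManager components.
  static class ContainerManager {}
  static class ApplicationImpl {}

  /** Hypothetical event type, decorated the way the patch decorates ApplicationEventType. */
  enum AppEventType {
    @Dispatched(source = ContainerManager.class, destination = ApplicationImpl.class)
    INIT_APPLICATION
  }

  public static void main(String[] args) throws Exception {
    // Metadata like this could later be read back to generate event diagrams across components.
    Field f = AppEventType.class.getField("INIT_APPLICATION");
    Dispatched d = f.getAnnotation(Dispatched.class);
    System.out.println(d.source().getSimpleName() + " -> " + d.destination().getSimpleName());
  }
}
{code}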

 Decorate event transitions and the event-types with their behaviour
 ---

 Key: YARN-1743
 URL: https://issues.apache.org/jira/browse/YARN-1743
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Jeff Zhang
  Labels: documentation
 Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, 
 YARN-1743.patch


 Helps to annotate the transitions with (start-state, end-state) pair and the 
 events with (source, destination) pair.
 Not just readability, we may also use them to generate the event diagrams 
 across components.
 Not a blocker for 0.23, but let's see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2015-01-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287199#comment-14287199
 ] 

Sunil G commented on YARN-2896:
---

Thank you [~eepayne] and [~leftnoteasy] for sharing your thoughts.

Implementation is easier with integers, since they can be configured and 
operated on directly. On the other hand, labels are more readable and make it 
easier for a user to submit an app and visualize it in the UI; the same holds 
for admins. But as you mentioned, this comes with some extra complexity at the 
RM, which has to act as an interface to the scheduler.
The idea of labels was one of the initial ideas from when the design was 
formulated. Since [~vinodkv] also participated in the design, it would be good 
to hear his thoughts too.

I also feel that we still need PriorityLabelManager: it can store the data 
coming from various sources such as config files, admin commands, or even REST. 
It is better to take such complexity out of RMAppManager. For HA cases, 
PriorityLabelManager itself can help to restore the configuration loaded from 
memory/files. The scheduler then does not need to take on any extra load, and 
PriorityLabelManager can provide the priority for an app and its ACLs.


 Server side PB changes for Priority Label Manager and Admin CLI support
 ---

 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 
 0003-YARN-2896.patch, 0004-YARN-2896.patch


 Common changes:
  * PB support changes required for Admin APIs 
  * PB support for File System store (Priority Label Store)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Xia Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xia Hu updated YARN-3083:
-
Labels: UI fair  (was: patch)

 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fair

 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Xia Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xia Hu updated YARN-3083:
-
Labels: UI fairscheduler  (was: UI fair)

 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fairscheduler
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Xia Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xia Hu updated YARN-3083:
-
Description: 
In my fair scheduler web page, the resources shown of each queue is like this:  
 lt;memory:65536, vCores:0gt;
so obviously  lt; should be shown as , bet it isn't. After reading the 
codes, I suppose it's because method StringEscapeUtils.escapeHtml is called 
twice. But I only found one place. Anyway, I modify the code, and this problem 
seems to be solved. 

  was:
In my fair scheduler web page, the resources shown of each queue is like this:  
 lt;memory:65536, vCores:0gt;
so obviously lt; should be shown as , bet it isn't. After reading the 
codes, I suppose it's because method StringEscapeUtils.escapeHtml is called 
twice. But I only found one place. Anyway, I modify the code, and this problem 
seems to be solved. 


 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fairscheduler
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Xia Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xia Hu updated YARN-3083:
-
Description: 
In my fair scheduler web page, the resources shown of each queue is like this:  
 lt;memory:65536, vCores:0gt;
so obviously lt; should be shown as , bet it isn't. After reading the 
codes, I suppose it's because method StringEscapeUtils.escapeHtml is called 
twice. But I only found one place. Anyway, I modify the code, and this problem 
seems to be solved. 

  was:
In my fair scheduler web page, the resources shown of each queue is like this:  
 lt;memory:65536, vCores:0gt;
so obviously lt; should be shown as , bet it isn't. After reading the 
codes, I suppose it's because method StringEscapeUtils.escapeHtml is called 
twice. But I only found one place. Anyway, I modify the code, and this problem 
seems to be solved. 


 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fairscheduler
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2015-01-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287174#comment-14287174
 ] 

Tsuyoshi OZAWA commented on YARN-2800:
--

I am referring to the following line in CommonNodeLabelsManager:
{code}
  protected boolean nodeLabelsEnabled = false;
{code}

 Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
 feature
 

 Key: YARN-2800
 URL: https://issues.apache.org/jira/browse/YARN-2800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
 YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
 YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, 
 YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch


 In the past, we had a MemoryNodeLabelsStore, mostly so that users could try 
 this feature without configuring where to store node labels on the file 
 system. It seems convenient, but it actually leads to a bad user experience: 
 a user may add/remove labels and edit capacity-scheduler.xml, but after an RM 
 restart the labels are gone (they were only stored in memory), and the RM 
 cannot start if some queue uses labels that no longer exist in the cluster.
 As discussed, we should have an explicit way for the user to specify whether 
 they want this feature. If node labels are disabled, any operation trying to 
 modify/use node labels will throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2015-01-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287173#comment-14287173
 ] 

Tsuyoshi OZAWA commented on YARN-2800:
--

[~leftnoteasy], the patch looks good to me overall, but it is outdated. Could 
you rebase the patch to include the changes against 
TestCapacitySchedulerNodeLabelUpdate?

Additionally, I think nodeLabelsEnabled should be defined as a volatile variable, 
since it is accessed from multiple threads. Could you update it?
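
(For context, a minimal sketch of the visibility problem volatile addresses here; 
illustrative only, not the CommonNodeLabelsManager code.)

{code}
public class VolatileFlagDemo {
  // Without volatile, the reader thread may never observe the write below,
  // since the value can be cached per thread or hoisted by the JIT.
  private static volatile boolean nodeLabelsEnabled = false;

  public static void main(String[] args) throws InterruptedException {
    Thread reader = new Thread(() -> {
      while (!nodeLabelsEnabled) {
        // spin until the flag change becomes visible
      }
      System.out.println("node labels enabled observed");
    });
    reader.start();

    Thread.sleep(100);        // give the reader a head start
    nodeLabelsEnabled = true; // volatile write: guaranteed visible to the reader
    reader.join();
  }
}
{code}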


 Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
 feature
 

 Key: YARN-2800
 URL: https://issues.apache.org/jira/browse/YARN-2800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
 YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
 YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, 
 YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch


 In the past, we had a MemoryNodeLabelsStore, mostly so that users could try 
 this feature without configuring where to store node labels on the file 
 system. It seems convenient, but it actually leads to a bad user experience: 
 a user may add/remove labels and edit capacity-scheduler.xml, but after an RM 
 restart the labels are gone (they were only stored in memory), and the RM 
 cannot start if some queue uses labels that no longer exist in the cluster.
 As discussed, we should have an explicit way for the user to specify whether 
 they want this feature. If node labels are disabled, any operation trying to 
 modify/use node labels will throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Xia Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xia Hu updated YARN-3083:
-
Description: 
In my fair scheduler web page, the resources shown of each queue is like this:  
  lt;memory:65536, vCores:0 gt;
so obviously  lt; should be shown as , bet it isn't. After reading the 
codes, I suppose it's because method StringEscapeUtils.escapeHtml is called 
twice. But I only found one place. Anyway, I modify the code, and this problem 
seems to be solved. 

  was:
In my fair scheduler web page, the resources shown of each queue is like this:  
 lt;memory:65536, vCores:0gt;
so obviously  lt; should be shown as , bet it isn't. After reading the 
codes, I suppose it's because method StringEscapeUtils.escapeHtml is called 
twice. But I only found one place. Anyway, I modify the code, and this problem 
seems to be solved. 


 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fairscheduler
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2015-01-22 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2800:
-
Attachment: YARN-2800-20141205-1.patch

Reattaching the patch by Wangda to kick Jenkins.

 Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
 feature
 

 Key: YARN-2800
 URL: https://issues.apache.org/jira/browse/YARN-2800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
 YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
 YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, 
 YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch


 In the past, we had a MemoryNodeLabelsStore, mostly so that users could try 
 this feature without configuring where to store node labels on the file 
 system. It seems convenient, but it actually leads to a bad user experience: 
 a user may add/remove labels and edit capacity-scheduler.xml, but after an RM 
 restart the labels are gone (they were only stored in memory), and the RM 
 cannot start if some queue uses labels that no longer exist in the cluster.
 As discussed, we should have an explicit way for the user to specify whether 
 they want this feature. If node labels are disabled, any operation trying to 
 modify/use node labels will throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Xia Hu (JIRA)
Xia Hu created YARN-3083:


 Summary: Resource format isn't correct in Fair Scheduler web page
 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu


In my Fair Scheduler web page, the resources shown for each queue look like 
this: &lt;memory:65536, vCores:0&gt;
Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
but I only found one place where it is called. Anyway, I modified the code, and 
this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Xia Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xia Hu updated YARN-3083:
-
Attachment: fairscheduler-ui-format.patch

 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fair
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287176#comment-14287176
 ] 

Hadoop QA commented on YARN-2800:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12693838/YARN-2800-20141205-1.patch
  against trunk revision 786dbdf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6386//console

This message is automatically generated.

 Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
 feature
 

 Key: YARN-2800
 URL: https://issues.apache.org/jira/browse/YARN-2800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
 YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
 YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, 
 YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch


 In the past, we had a MemoryNodeLabelsStore, mostly so that users could try 
 this feature without configuring where to store node labels on the file 
 system. It seems convenient, but it actually leads to a bad user experience: 
 a user may add/remove labels and edit capacity-scheduler.xml, but after an RM 
 restart the labels are gone (they were only stored in memory), and the RM 
 cannot start if some queue uses labels that no longer exist in the cluster.
 As discussed, we should have an explicit way for the user to specify whether 
 they want this feature. If node labels are disabled, any operation trying to 
 modify/use node labels will throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3083) Resource format isn't correct in Fair Scheduler web page

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287181#comment-14287181
 ] 

Hadoop QA commented on YARN-3083:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12693842/fairscheduler-ui-format.patch
  against trunk revision 786dbdf.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6387//console

This message is automatically generated.

 Resource format isn't correct in Fair Scheduler web page
 

 Key: YARN-3083
 URL: https://issues.apache.org/jira/browse/YARN-3083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.3.0
Reporter: Xia Hu
  Labels: UI, fairscheduler
 Attachments: fairscheduler-ui-format.patch


 In my Fair Scheduler web page, the resources shown for each queue look like 
 this: &lt;memory:65536, vCores:0&gt;
 Obviously &lt; should be shown as <, but it isn't. After reading the code, I 
 suppose it's because the method StringEscapeUtils.escapeHtml is called twice, 
 but I only found one place where it is called. Anyway, I modified the code, and 
 this problem seems to be solved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-01-22 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-3024:

Attachment: YARN-3024.04.patch

Modified the test to verify the changed logic.

 LocalizerRunner should give DIE action when all resources are localized
 ---

 Key: YARN-3024
 URL: https://issues.apache.org/jira/browse/YARN-3024
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
 YARN-3024.03.patch, YARN-3024.04.patch


 We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
 end of the localization process.
 The problem is that {{findNextResource()}} can return null even when {{pending}} 
 was not empty prior to the call. This method removes already-localized resources 
 from {{pending}}; therefore we should check the return value and give a DIE 
 action when it returns null.
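
 A simplified sketch of the proposed control flow (illustrative names only; the 
 real logic lives in ResourceLocalizationService.LocalizerRunner):

{code}
import java.util.ArrayDeque;
import java.util.Queue;

public class LocalizerHeartbeatSketch {

  enum Action { LIVE, DIE } // stand-ins for the NM localizer heartbeat actions

  // Stand-in for the pending resource list kept by LocalizerRunner.
  private final Queue<String> pending = new ArrayDeque<>();

  // Returns the next resource to localize, or null when nothing is left;
  // mirrors findNextResource(), which drops already-localized entries from pending.
  private String findNextResource() {
    return pending.poll();
  }

  // Proposed behaviour: answer LIVE only while there is real work left,
  // otherwise tell the ContainerLocalizer to DIE instead of keeping it alive.
  Action processHeartbeat() {
    String next = findNextResource();
    if (next == null) {
      return Action.DIE; // nothing pending any more
    }
    // ... hand `next` to the localizer ...
    return Action.LIVE;
  }

  public static void main(String[] args) {
    LocalizerHeartbeatSketch runner = new LocalizerHeartbeatSketch();
    runner.pending.add("hdfs:///apps/example/job.jar");
    System.out.println(runner.processHeartbeat()); // LIVE: one resource still pending
    System.out.println(runner.processHeartbeat()); // DIE: everything localized
  }
}
{code}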



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run

2015-01-22 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HADOOP-11504 to YARN-3084:
---

 Tags:   (was: HADOOP MAPREDUCE YARN REST API SUBMIT)
  Component/s: (was: documentation)
   documentation
 Target Version/s:   (was: 2.6.0)
Affects Version/s: (was: 2.6.0)
   2.6.0
  Key: YARN-3084  (was: HADOOP-11504)
  Project: Hadoop YARN  (was: Hadoop Common)

 YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes 
 to run
 

 Key: YARN-3084
 URL: https://issues.apache.org/jira/browse/YARN-3084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
 Environment: Using eclipse on windows 7 (client)to run the map reduce 
 job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 
 6.0.2 build-1744117)
Reporter: Michael Br
Priority: Minor

 Hello,
 1.I want to run the simple Map Reduce job example (with the REST API 2.6 
 for yarn applications) and to calculate PI… for now it doesn’t work.
 When I use the command in the hortonworks terminal it works: “hadoop jar 
 /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
  pi 10 10”.
 But I want to submit the job with the REST API and not in the terminal as a 
 command line. 
 [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
 2.I do succeed with other REST API requests: get state, get new 
 application id and even kill(change state), but when I try to submit my 
 example, the response is:
 --
 --
 The Response Header:
 Key : null ,Value : [HTTP/1.1 202 Accepted]
 Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 
 GMT]
 Key : Content-Length ,Value : [0]
 Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 
 07:47:24 GMT]
 Key : Location ,Value : [http://[my 
 port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
 Key : Content-Type ,Value : [application/json]
 Key : Server ,Value : [Jetty(6.1.26.hwx)]
 Key : Pragma ,Value : [no-cache, no-cache]
 Key : Cache-Control ,Value : [no-cache]
 The Response Body:
 Null (No Response)
 --
 --
 3.I need help with the http request body filling. I am doing a POST http 
 request and I know that I am doing it right (in java).
 4.I think the problem is in the request body.
 5.I used this guy’s answer to help me build my map reduce example xml but 
 it does not work: 
 [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api].
 6.What am I missing? (the description is not clear to me in the submit 
 section of the rest api 2.6)
 7.Does someone have an xml example for using a simple MR job?
 8.Thanks! Here is the XML file I am using for the request body:
 --
 --
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <application-submission-context>
   <application-id>application_1421661392788_0038</application-id>
   <application-name>test_21_1</application-name>
   <queue>default</queue>
   <priority>3</priority>
   <am-container-spec>
     <environment>
       <entry>
         <key>CLASSPATH</key>
         <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf</value>
       </entry>
     </environment>
     <commands>
       <command>hadoop jar 
 

[jira] [Updated] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run

2015-01-22 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3084:
-
Component/s: (was: documentation)
 webapp
 resourcemanager

 YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes 
 to run
 

 Key: YARN-3084
 URL: https://issues.apache.org/jira/browse/YARN-3084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Using eclipse on windows 7 (client)to run the map reduce 
 job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 
 6.0.2 build-1744117)
Reporter: Michael Br
Priority: Minor

 Hello,
 1.I want to run the simple Map Reduce job example (with the REST API 2.6 
 for yarn applications) and to calculate PI… for now it doesn’t work.
 When I use the command in the hortonworks terminal it works: “hadoop jar 
 /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
  pi 10 10”.
 But I want to submit the job with the REST API and not in the terminal as a 
 command line. 
 [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
 2.I do succeed with other REST API requests: get state, get new 
 application id and even kill(change state), but when I try to submit my 
 example, the response is:
 --
 --
 The Response Header:
 Key : null ,Value : [HTTP/1.1 202 Accepted]
 Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 
 GMT]
 Key : Content-Length ,Value : [0]
 Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 
 07:47:24 GMT]
 Key : Location ,Value : [http://[my 
 port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
 Key : Content-Type ,Value : [application/json]
 Key : Server ,Value : [Jetty(6.1.26.hwx)]
 Key : Pragma ,Value : [no-cache, no-cache]
 Key : Cache-Control ,Value : [no-cache]
 The Response Body:
 Null (No Response)
 --
 --
 3.I need help with the http request body filling. I am doing a POST http 
 request and I know that I am doing it right (in java).
 4.I think the problem is in the request body.
 5.I used this guy’s answer to help me build my map reduce example xml but 
 it does not work: 
 [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api].
 6.What am I missing? (the description is not clear to me in the submit 
 section of the rest api 2.6)
 7.Does someone have an xml example for using a simple MR job?
 8.Thanks! Here is the XML file I am using for the request body:
 --
 --
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <application-submission-context>
   <application-id>application_1421661392788_0038</application-id>
   <application-name>test_21_1</application-name>
   <queue>default</queue>
   <priority>3</priority>
   <am-container-spec>
     <environment>
       <entry>
         <key>CLASSPATH</key>
         <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf</value>
       </entry>
     </environment>
     <commands>
       <command>hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar pi 10 10</command>
     </commands>
   </am-container-spec>
   <unmanaged-AM>false</unmanaged-AM>
   <max-app-attempts>2</max-app-attempts>
   <resource>
     <memory>1024</memory>
     <vCores>1</vCores>
   </resource>
   

[jira] [Commented] (YARN-3084) YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes to run

2015-01-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287214#comment-14287214
 ] 

Steve Loughran commented on YARN-3084:
--

202, accepted, looks like the RM accepted it.

# does it appear in the queue of job submissions??
# what does the RM log say?

 YARN REST API 2.6 - can't submit simple job in hortonworks-allways job failes 
 to run
 

 Key: YARN-3084
 URL: https://issues.apache.org/jira/browse/YARN-3084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Using eclipse on windows 7 (client)to run the map reduce 
 job on the host of Hortonworks HDP 2.2 (hortonworks is on vmware version 
 6.0.2 build-1744117)
Reporter: Michael Br
Priority: Minor

 Hello,
 1.I want to run the simple Map Reduce job example (with the REST API 2.6 
 for yarn applications) and to calculate PI… for now it doesn’t work.
 When I use the command in the hortonworks terminal it works: “hadoop jar 
 /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar
  pi 10 10”.
 But I want to submit the job with the REST API and not in the terminal as a 
 command line. 
 [http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application]
 2.I do succeed with other REST API requests: get state, get new 
 application id and even kill(change state), but when I try to submit my 
 example, the response is:
 --
 --
 The Response Header:
 Key : null ,Value : [HTTP/1.1 202 Accepted]
 Key : Date ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 07:47:24 
 GMT]
 Key : Content-Length ,Value : [0]
 Key : Expires ,Value : [Thu, 22 Jan 2015 07:47:24 GMT, Thu, 22 Jan 2015 
 07:47:24 GMT]
 Key : Location ,Value : [http://[my 
 port]:8088/ws/v1/cluster/apps/application_1421661392788_0038]
 Key : Content-Type ,Value : [application/json]
 Key : Server ,Value : [Jetty(6.1.26.hwx)]
 Key : Pragma ,Value : [no-cache, no-cache]
 Key : Cache-Control ,Value : [no-cache]
 The Response Body:
 Null (No Response)
 --
 --
 3.I need help with the http request body filling. I am doing a POST http 
 request and I know that I am doing it right (in java).
 4.I think the problem is in the request body.
 5.I used this guy’s answer to help me build my map reduce example xml but 
 it does not work: 
 [http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api].
 6.What am I missing? (the description is not clear to me in the submit 
 section of the rest api 2.6)
 7.Does someone have an xml example for using a simple MR job?
 8.Thanks! Here is the XML file I am using for the request body:
 --
 --
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <application-submission-context>
   <application-id>application_1421661392788_0038</application-id>
   <application-name>test_21_1</application-name>
   <queue>default</queue>
   <priority>3</priority>
   <am-container-spec>
     <environment>
       <entry>
         <key>CLASSPATH</key>
         <value>/usr/hdp/2.2.0.0-2041/hadoop/conf&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*&lt;CPS&gt;&lt;CPS&gt;/usr/share/java/mysql-connector-java-5.1.17.jar&lt;CPS&gt;/usr/share/java/mysql-connector-java.jar&lt;CPS&gt;/usr/hdp/current/hadoop-mapreduce-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/*&lt;CPS&gt;/usr/hdp/current/tez-client/lib/*&lt;CPS&gt;/etc/tez/conf/&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/*&lt;CPS&gt;/usr/hdp/2.2.0.0-2041/tez/lib/*&lt;CPS&gt;/etc/tez/conf/</value>
       </entry>
     </environment>
     <commands>
       <command>hadoop jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0.2.2.0.0-2041.jar pi 10 10</command>
     </commands>
   </am-container-spec>
   <unmanaged-AM>false</unmanaged-AM>
   <max-app-attempts>2</max-app-attempts>
   <resource>
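 For reference, a minimal Java sketch (an illustration only; the file name, host 
 name, and client approach are placeholders, using plain HttpURLConnection rather 
 than any specific client library) of how the POST from point 3 could be issued 
 against the Cluster Applications API:
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SubmitAppSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical file name; the body is the application-submission-context XML above.
    byte[] body = Files.readAllBytes(Paths.get("submit-app.xml"));
    // Hypothetical RM host; 8088 is the usual RM web port.
    URL url = new URL("http://resourcemanager-host:8088/ws/v1/cluster/apps");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/xml");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    // The response dump above (202 Accepted, empty body, Location header pointing
    // at the new application) is consistent with an accepted submission.
    System.out.println("HTTP " + conn.getResponseCode());
    System.out.println("Location: " + conn.getHeaderField("Location"));
  }
}
{code}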
 

[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287220#comment-14287220
 ] 

Hadoop QA commented on YARN-3024:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12693845/YARN-3024.04.patch
  against trunk revision 786dbdf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6388//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6388//console

This message is automatically generated.

 LocalizerRunner should give DIE action when all resources are localized
 ---

 Key: YARN-3024
 URL: https://issues.apache.org/jira/browse/YARN-3024
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
 YARN-3024.03.patch, YARN-3024.04.patch


 We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
 end of the localization process.
 The problem is that {{findNextResource()}} can return null even when {{pending}} 
 was not empty prior to the call. This method removes already-localized resources 
 from {{pending}}, so we should check the return value and give a DIE action 
 when it returns null.
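 To make the intended change concrete, here is a minimal, hypothetical Java 
 sketch (all names except the role of findNextResource() are invented; this is 
 not the actual patch) of the check described above:
{code}
import java.util.ArrayDeque;
import java.util.Queue;

public class LocalizerActionSketch {
  enum LocalizerAction { LIVE, DIE }

  private final Queue<String> pending = new ArrayDeque<>();

  // Stand-in for LocalizerRunner#findNextResource(): it may return null even if
  // 'pending' was non-empty before the call, because it drops resources that
  // turn out to be localized already.
  private String findNextResource() {
    while (!pending.isEmpty()) {
      String r = pending.poll();
      if (!isAlreadyLocalized(r)) {
        return r;
      }
    }
    return null;
  }

  private boolean isAlreadyLocalized(String r) {
    return true; // placeholder; the real check consults the localized-resource tracker
  }

  // Heartbeat handling: check the return value and answer DIE when it is null.
  public LocalizerAction onHeartbeat() {
    String next = findNextResource();
    return (next == null) ? LocalizerAction.DIE : LocalizerAction.LIVE;
  }

  public static void main(String[] args) {
    LocalizerActionSketch s = new LocalizerActionSketch();
    s.pending.add("resource-1");
    System.out.println(s.onHeartbeat()); // DIE: everything in 'pending' is already localized
  }
}
{code}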



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'

2015-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287237#comment-14287237
 ] 

Hudson commented on YARN-3078:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #81 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/81/])
YARN-3078. LogCLIHelpers lacks of a blank space before string 'does not exist'. 
Contributed by Sam Liu. (ozawa: rev 5712c9f96a2cf4ff63d36906ab3876444c0cddec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


 LogCLIHelpers lacks of a blank space before string 'does not exist'
 ---

 Key: YARN-3078
 URL: https://issues.apache.org/jira/browse/YARN-3078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: sam liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3078.001.patch, YARN-3078.002.patch


 LogCLIHelpers lacks a blank space before the string 'does not exist', which 
 produces an incorrect return message.
 For example, I ran the command 'yarn logs -applicationId 
 application_1421742816585_0003', and the return message includes 
 'logs/application_1421742816585_0003does not exist'. 
 Obviously this is incorrect; the correct return message should be 
 'logs/application_1421742816585_0003 does not exist'
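 For illustration only (not the actual LogCLIHelpers code), the kind of 
 one-character fix involved:
{code}
public class MissingSpaceExample {
  public static void main(String[] args) {
    String dir = "logs/application_1421742816585_0003";
    System.out.println(dir + "does not exist.");  // broken: "...0003does not exist."
    System.out.println(dir + " does not exist."); // fixed:  "...0003 does not exist."
  }
}
{code}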



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'

2015-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287254#comment-14287254
 ] 

Hudson commented on YARN-3078:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #815 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/815/])
YARN-3078. LogCLIHelpers lacks of a blank space before string 'does not exist'. 
Contributed by Sam Liu. (ozawa: rev 5712c9f96a2cf4ff63d36906ab3876444c0cddec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


 LogCLIHelpers lacks of a blank space before string 'does not exist'
 ---

 Key: YARN-3078
 URL: https://issues.apache.org/jira/browse/YARN-3078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: sam liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3078.001.patch, YARN-3078.002.patch


 LogCLIHelpers lacks a blank space before the string 'does not exist', which 
 produces an incorrect return message.
 For example, I ran the command 'yarn logs -applicationId 
 application_1421742816585_0003', and the return message includes 
 'logs/application_1421742816585_0003does not exist'. 
 Obviously this is incorrect; the correct return message should be 
 'logs/application_1421742816585_0003 does not exist'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers

2015-01-22 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287931#comment-14287931
 ] 

Chen He commented on YARN-2466:
---

This is a good point. Regarding the isolation concern, I don't think we need to 
inform the RM about this, since it is just one type of ContainerExecutor; the RM 
should treat it as a general container like the others (default, lxc, etc.). But 
as [~eronwright] mentioned, we should find a way to avoid containers being killed 
because of a timeout.

 Umbrella issue for Yarn launched Docker Containers
 --

 Key: YARN-2466
 URL: https://issues.apache.org/jira/browse/YARN-2466
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Abin Shahab
Assignee: Abin Shahab

 Docker (https://www.docker.io/) is an increasingly popular container 
 technology.
 In the context of YARN, support for Docker will provide a very elegant 
 solution that allows applications to package their software into a Docker 
 container (an entire Linux file system, incl. custom versions of Perl, Python, 
 etc.) and use it as a blueprint to launch all their YARN containers with the 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).
 In addition to the software isolation mentioned above, Docker containers will 
 provide resource, network, and user-namespace isolation. 
 Docker provides resource isolation through cgroups, similar to the 
 LinuxContainerExecutor. This prevents one job from taking other jobs' 
 resources (memory and CPU) on the same Hadoop cluster. 
 User-namespace isolation will ensure that root inside the container is mapped 
 to an unprivileged user on the host. This is currently being added to Docker.
 Network isolation will ensure that one user's network traffic is completely 
 isolated from another user's network traffic. 
 Last but not least, the interaction of Docker and Kerberos will have to 
 be worked out. These Docker containers must work in a secure Hadoop 
 environment.
 Additional details are here: 
 https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers

2015-01-22 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287941#comment-14287941
 ] 

Chen He commented on YARN-2466:
---

Hi [~ashahab], if you don't mind, I will create a sub-task JIRA to track this 
issue. 

 Umbrella issue for Yarn launched Docker Containers
 --

 Key: YARN-2466
 URL: https://issues.apache.org/jira/browse/YARN-2466
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Abin Shahab
Assignee: Abin Shahab

 Docker (https://www.docker.io/) is an increasingly popular container 
 technology.
 In the context of YARN, support for Docker will provide a very elegant 
 solution that allows applications to package their software into a Docker 
 container (an entire Linux file system, incl. custom versions of Perl, Python, 
 etc.) and use it as a blueprint to launch all their YARN containers with the 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).
 In addition to the software isolation mentioned above, Docker containers will 
 provide resource, network, and user-namespace isolation. 
 Docker provides resource isolation through cgroups, similar to the 
 LinuxContainerExecutor. This prevents one job from taking other jobs' 
 resources (memory and CPU) on the same Hadoop cluster. 
 User-namespace isolation will ensure that root inside the container is mapped 
 to an unprivileged user on the host. This is currently being added to Docker.
 Network isolation will ensure that one user's network traffic is completely 
 isolated from another user's network traffic. 
 Last but not least, the interaction of Docker and Kerberos will have to 
 be worked out. These Docker containers must work in a secure Hadoop 
 environment.
 Additional details are here: 
 https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-01-22 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3087:
-

 Summary: the REST server (web server) for per-node aggregator does 
not work if it runs inside node manager
 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee


This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator 
and the associated REST server. It runs fine as a standalone process, but does 
not work if it runs inside the node manager due to possible collisions of 
servlet mapping.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2015-01-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287645#comment-14287645
 ] 

Wei Yan commented on YARN-2194:
---

Sure, will update with a new patch combining [~bcwalrus]'s comments.

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2194-1.patch


In previous versions of RedHat, we can build custom cgroup hierarchies 
 with use of the cgconfig command from the libcgroup package. From RedHat 7, 
 package libcgroup is deprecated and it is not recommended to use it since it 
 can easily create conflicts with the default cgroup hierarchy. The systemd is 
 provided and recommended for cgroup management. We need to add support for 
 this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2015-01-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287644#comment-14287644
 ] 

Wei Yan commented on YARN-2194:
---

Sure, will update with a new patch combining [~bcwalrus]'s comments.

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2194-1.patch


In previous versions of RedHat, we can build custom cgroup hierarchies 
 with use of the cgconfig command from the libcgroup package. From RedHat 7, 
 package libcgroup is deprecated and it is not recommended to use it since it 
 can easily create conflicts with the default cgroup hierarchy. The systemd is 
 provided and recommended for cgroup management. We need to add support for 
 this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'

2015-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287601#comment-14287601
 ] 

Hudson commented on YARN-3078:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2032 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2032/])
YARN-3078. LogCLIHelpers lacks of a blank space before string 'does not exist'. 
Contributed by Sam Liu. (ozawa: rev 5712c9f96a2cf4ff63d36906ab3876444c0cddec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* hadoop-yarn-project/CHANGES.txt


 LogCLIHelpers lacks of a blank space before string 'does not exist'
 ---

 Key: YARN-3078
 URL: https://issues.apache.org/jira/browse/YARN-3078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: sam liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3078.001.patch, YARN-3078.002.patch


 LogCLIHelpers lacks a blank space before the string 'does not exist', which 
 produces an incorrect return message.
 For example, I ran the command 'yarn logs -applicationId 
 application_1421742816585_0003', and the return message includes 
 'logs/application_1421742816585_0003does not exist'. 
 Obviously this is incorrect; the correct return message should be 
 'logs/application_1421742816585_0003 does not exist'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-01-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3087:
--
Description: 
This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator 
and the associated REST server. It runs fine as a standalone process, but does 
not work if it runs inside the node manager due to possible collisions of 
servlet mapping.

Exception:
{noformat}
org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 
not found
at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
...
{noformat}

  was:This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
aggregator and the associated REST server. It runs fine as a standalone 
process, but does not work if it runs inside the node manager due to possible 
collisions of servlet mapping.


 the REST server (web server) for per-node aggregator does not work if it runs 
 inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee

 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288324#comment-14288324
 ] 

Wangda Tan commented on YARN-3079:
--

Hi [~zxu],
Thanks for taking up this task. I just reviewed your patch; some comments:
1) I suggest changing the signature of updateMaximumAllocation(SchedulerNode, bool) 
to updateMaximumAllocation(Resource nodeResource, bool), since we only use 
nodeResource here.
2) Changing the resource for an NM is equivalent to 
{{updateMaximumAllocation(oldNodeResource, false)}} plus 
{{updateMaximumAllocation(newNodeResource, true)}}. We can avoid some 
duplicated logic that way.
3) I suggest renaming updateMaximumAllocation(void) to refreshMaximumAllocation() 
or another name that reflects the behavior: scan all cluster nodes and get the 
maximum allocation.
4) Not related to this fix -- I found that only max allocation is protected by 
the R/W lock, which doesn't seem correct to me; I think we should address it in 
a separate JIRA. Will file a ticket later.

Wangda
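
A rough, hypothetical Java sketch of the restructuring suggested in 1)-3) above 
(simplified stand-in types and invented names, not the actual scheduler code):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MaxAllocationTrackerSketch {
  // Simplified stand-in for org.apache.hadoop.yarn.api.records.Resource.
  static final class Resource {
    final int memoryMb, vcores;
    Resource(int memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
  }

  private final Map<String, Resource> nodeResources = new ConcurrentHashMap<>();
  private volatile Resource maximumAllocation = new Resource(0, 0);

  // 1) Take the node's Resource directly instead of the whole SchedulerNode.
  synchronized void updateMaximumAllocation(Resource nodeResource, boolean added) {
    if (added) {
      maximumAllocation = max(maximumAllocation, nodeResource);
    } else {
      refreshMaximumAllocation(); // a removed node may have defined the maximum
    }
  }

  // 2) A resource change is "remove the old value, add the new one".
  synchronized void updateNodeResource(String nodeId, Resource newResource) {
    Resource old = nodeResources.put(nodeId, newResource);
    if (old != null) {
      updateMaximumAllocation(old, false);
    }
    updateMaximumAllocation(newResource, true);
  }

  // 3) Named to reflect the behavior: scan all nodes and recompute the maximum.
  synchronized void refreshMaximumAllocation() {
    Resource max = new Resource(0, 0);
    for (Resource r : nodeResources.values()) {
      max = max(max, r);
    }
    maximumAllocation = max;
  }

  private static Resource max(Resource a, Resource b) {
    return new Resource(Math.max(a.memoryMb, b.memoryMb), Math.max(a.vcores, b.vcores));
  }
}
{code}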

 Scheduler should also update maximumAllocation when updateNodeResource.
 ---

 Key: YARN-3079
 URL: https://issues.apache.org/jira/browse/YARN-3079
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3079.000.patch, YARN-3079.001.patch


 The scheduler should also update maximumAllocation on updateNodeResource. 
 Otherwise, even if the node resource is changed by 
 AdminService#updateNodeResource, maximumAllocation won't be changed.
 Note that an RMNodeReconnectEvent raised from 
 ResourceTrackerService#registerNodeManager will also trigger 
 AbstractYarnScheduler#updateNodeResource being called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3028) Better syntax for replace label CLI

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288273#comment-14288273
 ] 

Wangda Tan commented on YARN-3028:
--

[~rohithsharma],
I see, just re-reviewed, my bad. Yes, 1#/2# are all addressed. 

A nit for the test:
I suggest merging {{testReplaceLabelsOnNodeWithPort}} into 
{{testReplaceLabelsOnNode}}; there's no need to split them.
And a case for = without a port should be added.

Nits for the help message:
1) The port should be optional; you can make a small change to the help message:
\[node1:port=label1,label2 node2:port=label1,label2\] should be 
\[node1\[:port\]=label1...\]

2) {{printHelp}} should be updated as well.

Also, I still suggest adding a small comment before
{code}
  String[] splits = nodeToLabels.split("=");
{code}
to explicitly indicate that we support , for compatibility.

Thanks,
Wangda
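
A small, hypothetical sketch (not the actual RMAdminCLI code; the names are 
illustrative) of parsing one node-to-labels argument with the new = syntax, an 
optional port, and the old , syntax kept for compatibility:
{code}
import java.util.Arrays;
import java.util.List;

public class ReplaceLabelsArgSketch {
  static void parse(String nodeToLabels) {
    // Split on "=" first; fall back to "," for the old syntax (compatibility).
    String[] splits = nodeToLabels.split("=");
    String node;
    List<String> labels;
    if (splits.length == 2) {
      node = splits[0];
      labels = Arrays.asList(splits[1].split(","));
    } else {
      String[] old = nodeToLabels.split(",");
      node = old[0];
      labels = Arrays.asList(old).subList(1, old.length);
    }
    // The port is optional: both "host" and "host:port" are accepted as the node part.
    System.out.println(node + " -> " + labels);
  }

  public static void main(String[] args) {
    parse("node1:8041=label1,label2"); // new syntax, with port
    parse("node2=label1");             // new syntax, without port
    parse("node3,label1,label2");      // old syntax, kept for compatibility
  }
}
{code}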




 Better syntax for replace label CLI
 ---

 Key: YARN-3028
 URL: https://issues.apache.org/jira/browse/YARN-3028
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Jian He
Assignee: Rohith
 Attachments: 0001-YARN-3028.patch


 The current command to replace labels is:
 {code}
 yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 
 node2:port,label1,label2]
 {code}
 Instead of {code} node1:port,label1,label2 {code} I think it's better to say 
 {code} node1:port=label1,label2 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3078) LogCLIHelpers lacks of a blank space before string 'does not exist'

2015-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287568#comment-14287568
 ] 

Hudson commented on YARN-3078:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #82 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/82/])
YARN-3078. LogCLIHelpers lacks of a blank space before string 'does not exist'. 
Contributed by Sam Liu. (ozawa: rev 5712c9f96a2cf4ff63d36906ab3876444c0cddec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* hadoop-yarn-project/CHANGES.txt


 LogCLIHelpers lacks of a blank space before string 'does not exist'
 ---

 Key: YARN-3078
 URL: https://issues.apache.org/jira/browse/YARN-3078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: sam liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3078.001.patch, YARN-3078.002.patch


 LogCLIHelpers lacks a blank space before the string 'does not exist', which 
 produces an incorrect return message.
 For example, I ran the command 'yarn logs -applicationId 
 application_1421742816585_0003', and the return message includes 
 'logs/application_1421742816585_0003does not exist'. 
 Obviously this is incorrect; the correct return message should be 
 'logs/application_1421742816585_0003 does not exist'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3085) Application summary should include the application type

2015-01-22 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3085:


 Summary: Application summary should include the application type
 Key: YARN-3085
 URL: https://issues.apache.org/jira/browse/YARN-3085
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Jason Lowe


Adding the application type to the RM application summary log makes it easier 
to audit the number of applications from various app frameworks that are 
running on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3085) Application summary should include the application type

2015-01-22 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-3085:


Assignee: Rohith

 Application summary should include the application type
 ---

 Key: YARN-3085
 URL: https://issues.apache.org/jira/browse/YARN-3085
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Jason Lowe
Assignee: Rohith

 Adding the application type to the RM application summary log makes it easier 
 to audit the number of applications from various app frameworks that are 
 running on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3091:
-
Target Version/s: 2.7.0

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288446#comment-14288446
 ] 

Wangda Tan commented on YARN-3091:
--

I think maybe it can be part of a separate 
fine-grained-lock-enhancement-for-FairScheduler task (if there are other similar 
fine-grained changes needed)? To keep every patch easier to review, the 
{{AbstractYarnScheduler -> CapacityScheduler -> FairScheduler}} one could address 
only the general synchronized lock -> r/w lock conversion.
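
For illustration, a minimal, hypothetical example of the synchronized -> 
read/write lock conversion being discussed (the class and fields are invented, 
not actual scheduler code):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SchedulerLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long clusterMemoryMb;

  // Before: public synchronized long getClusterMemoryMb() { return clusterMemoryMb; }
  public long getClusterMemoryMb() {
    lock.readLock().lock();   // many readers can proceed concurrently
    try {
      return clusterMemoryMb;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Before: public synchronized void addNode(long memoryMb) { clusterMemoryMb += memoryMb; }
  public void addNode(long memoryMb) {
    lock.writeLock().lock();  // writers remain exclusive
    try {
      clusterMemoryMb += memoryMb;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}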

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator

2015-01-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1393:
---
Attachment: YARN-1393-1.patch

Thanks for the docs, Wei. Made minor changes - removed potentially redundant 
information.

If you think the patch is good, I'll go ahead and commit it. Thanks. 

 Add how-to-use instruction in README for Yarn Scheduler Load Simulator
 --

 Key: YARN-1393
 URL: https://issues.apache.org/jira/browse/YARN-1393
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-1393-1.patch, YARN-1393.patch


 The instructions are put in the .pdf document and site page. The README needs 
 to include a simple instruction for users to quickly pick up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1393) Add how-to-use instruction in README for Yarn Scheduler Load Simulator

2015-01-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1393:
--
Attachment: YARN-1393-2.patch

Thanks, [~kasha], I uploaded a new patch clearing the file path.

 Add how-to-use instruction in README for Yarn Scheduler Load Simulator
 --

 Key: YARN-1393
 URL: https://issues.apache.org/jira/browse/YARN-1393
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-1393-1.patch, YARN-1393-2.patch, YARN-1393.patch


 The instructions are put in the .pdf document and site page. The README needs 
 to include a simple instruction for users to quickly pick up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers

2015-01-22 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288608#comment-14288608
 ] 

Leitao Guo commented on YARN-2466:
--

Currently, if I want to use DCE in my cluster, all applications have to run in 
DCE, which is not practical in our cluster. Can 
yarn.nodemanager.container-executor.class be made configurable per 
application? That way, we can use DCE for some applications while others can 
still use LCE.

 Umbrella issue for Yarn launched Docker Containers
 --

 Key: YARN-2466
 URL: https://issues.apache.org/jira/browse/YARN-2466
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Abin Shahab
Assignee: Abin Shahab

 Docker (https://www.docker.io/) is an increasingly popular container 
 technology.
 In the context of YARN, support for Docker will provide a very elegant 
 solution that allows applications to package their software into a Docker 
 container (an entire Linux file system, incl. custom versions of Perl, Python, 
 etc.) and use it as a blueprint to launch all their YARN containers with the 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).
 In addition to the software isolation mentioned above, Docker containers will 
 provide resource, network, and user-namespace isolation. 
 Docker provides resource isolation through cgroups, similar to the 
 LinuxContainerExecutor. This prevents one job from taking other jobs' 
 resources (memory and CPU) on the same Hadoop cluster. 
 User-namespace isolation will ensure that root inside the container is mapped 
 to an unprivileged user on the host. This is currently being added to Docker.
 Network isolation will ensure that one user's network traffic is completely 
 isolated from another user's network traffic. 
 Last but not least, the interaction of Docker and Kerberos will have to 
 be worked out. These Docker containers must work in a secure Hadoop 
 environment.
 Additional details are here: 
 https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers

2015-01-22 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288615#comment-14288615
 ] 

Beckham007 commented on YARN-2466:
--

We use YARN-2718 for this.

 Umbrella issue for Yarn launched Docker Containers
 --

 Key: YARN-2466
 URL: https://issues.apache.org/jira/browse/YARN-2466
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Abin Shahab
Assignee: Abin Shahab

 Docker (https://www.docker.io/) is an increasingly popular container 
 technology.
 In the context of YARN, support for Docker will provide a very elegant 
 solution that allows applications to package their software into a Docker 
 container (an entire Linux file system, incl. custom versions of Perl, Python, 
 etc.) and use it as a blueprint to launch all their YARN containers with the 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).
 In addition to the software isolation mentioned above, Docker containers will 
 provide resource, network, and user-namespace isolation. 
 Docker provides resource isolation through cgroups, similar to the 
 LinuxContainerExecutor. This prevents one job from taking other jobs' 
 resources (memory and CPU) on the same Hadoop cluster. 
 User-namespace isolation will ensure that root inside the container is mapped 
 to an unprivileged user on the host. This is currently being added to Docker.
 Network isolation will ensure that one user's network traffic is completely 
 isolated from another user's network traffic. 
 Last but not least, the interaction of Docker and Kerberos will have to 
 be worked out. These Docker containers must work in a secure Hadoop 
 environment.
 Additional details are here: 
 https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288633#comment-14288633
 ] 

Li Lu commented on YARN-3091:
-

Maybe we want to tweak the wording/organization of this JIRA a little bit? In 
the description of this JIRA, two major points are raised:

bq. Many unnecessary synchronized locks, we have seen several cases recently 
that too frequent access of scheduler makes scheduler hang. Which could be 
addressed by using read/write lock. Components include scheduler, CS queues, 
apps
I agree that readers-writer lock is a viable approach for many synchronization 
performance issues, but other synchronization mechanisms (such as concurrent 
data structures) may also be our options. 

bq. Some fields not properly locked (Like clusterResource)
Improperly synchronized accesses may cause data races, and are generally 
considered as bugs in Java programs (even though the Java memory model provides 
some sort of guarantee on racy programs). To me, it would be better if the 
second point could be categorized as bug fixes, rather than improvements, for 
the RM scheduler code. 

Therefore, maybe we want to solve the problem in two steps: a) fixing 
improperly synchronized data accesses in the RM scheduler (correctness) and b) 
improving synchronization performance for the RM scheduler code (performance)? 
I'm not sure whether there should be two separate JIRAs to track this, or 
whether we can combine both in one giant JIRA. 

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager

2015-01-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288644#comment-14288644
 ] 

Junping Du commented on YARN-914:
-

Sorry for replying late. These are all good points, a couple of comments:

bq. Sounds like we need a new state for NM, called decommission_in_progress 
when NM is draining the containers.
Agreed. We need a dedicated state for the NM in this situation, and both the AM 
and RM should be aware of it so they can handle it properly.  

bq. To clarify my early comment all its map output are fetched or until all 
the applications the node touches have completed, the question is when YARN 
can declare a node's state has been gracefully drained and thus the node 
gracefully decommissioned ( admins can shutdown the whole machine without any 
impact on jobs ). For MR, the state could be running tasks/containers or mapper 
outputs. Say we have timeout of 30 minutes for decommission, it takes 3 minutes 
to finish the mappers on the node, another 5 minutes for the job to finish, 
then YARN can declare the node gracefully decommissioned in 8 minutes, instead 
of waiting for 30 minutes. RM knows all applications on any given NM. So if all 
applications on any given node have completed, RM can mark the node 
decommissioned.
As a first step I was thinking of keeping the NM running in a low-resource mode 
after graceful decommission - no running containers, no new containers spawned, 
no obvious resource consumption, etc., much like putting these nodes into a 
maintenance mode. The timeout value there is used to kill unfinished containers 
to release resources. I'm not quite sure we have to terminate the NM after the 
timeout, but I would like to understand your use case here.

bq. Yes, I meant long running services. If YARN just kills the containers upon 
decommission request, the impact could vary. Some services might not have 
states to drain. Or maybe the services can handle the state migration on their 
own without YARN's help. For such services, maybe we can just use 
ResourceOption's timeout for that; set timeout to 0 and NM will just kill the 
containers.
I believe most of these services already take care of losing nodes, since no 
node in a YARN cluster can be relied on to stay up. However, I am not sure 
whether they can handle migrating state to a new node ahead of a predictable 
node loss, or whether being more or less stateless makes more sense here. If we 
have an example application that could easily migrate a node's state to another 
node, then we can discuss how to provide some rudimentary support here.   

bq. Given we don't plan to have applications checkpoint and migrate states, it 
doesn't seem to be necessary to have YARN notify applications upon decommission 
requests. Just to call it out.
These notifications may still be necessary, so the AM won't add these nodes to 
its blacklist if containers get killed afterwards. Thoughts?

bq. It might be useful to have a new state called decommissioned_timeout, so 
that admins know the node has been gracefully decommissioned or not.
As in my comments above, we can see whether we have to terminate the NM. If not, 
I prefer to use a maintenance state, and the admin can decide whether to fully 
decommission it later. Again, we should discuss your scenarios here. 
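
Purely for illustration, a hypothetical sketch of what such a dedicated draining 
state could look like (invented names; this is not YARN's actual NodeState enum):
{code}
enum SketchNodeState {
  NEW, RUNNING,
  DECOMMISSION_IN_PROGRESS, // draining: no new containers, existing ones allowed to finish
  DECOMMISSIONED,           // drained (or timed out) and removed from scheduling
  LOST, UNHEALTHY, REBOOTED
}
{code}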

 Support graceful decommission of nodemanager
 

 Key: YARN-914
 URL: https://issues.apache.org/jira/browse/YARN-914
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Luke Lu
Assignee: Junping Du

 When NMs are decommissioned for non-fault reasons (capacity changes, etc.), 
 it's desirable to minimize the impact on running applications.
 Currently, if an NM is decommissioned, all running containers on the NM need to 
 be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
 map outputs have not been fetched by the reducers of the job, these map tasks 
 will need to be rerun as well.
 We propose to introduce a mechanism to optionally decommission a node manager 
 gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288746#comment-14288746
 ] 

Rohith commented on YARN-3091:
--

bq. fixing improperly synchronized data accesses in RM scheduler (correctness)
currently findbug exclude xml mask these warnings like IS2_INCONSISTENT_SYNC. I 
believe these exclude lists are reviewed and now assumptions like a class 
expected to be thread-safe. Recently had discussion on this in community 
[Discussion 
thread|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201412.mbox/%3CCALwhT97BqK_zjQ=MCO_c=Y=7r9ewLN2Ab_qm=vqekvxgzrq...@mail.gmail.com%3E]

For identifying 1st level of problems, I think enabling these findbug type 
would help in better way.

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288755#comment-14288755
 ] 

Li Lu commented on YARN-3091:
-

I agree we should review the exclude list for potential synchronization 
problems. However, note that findbugs uses static analysis to analyze Java 
source code, which may introduce both false positives and false negatives when 
detecting concurrency-related bugs. In the long term we may want to consider 
other tools to help detect improper synchronization (although a perfect 
solution would be hard). For the short term, I think [~leftnoteasy] raised a 
very valid point (this JIRA), so let's get the problems solved. 

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288348#comment-14288348
 ] 

Hadoop QA commented on YARN-2828:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12693988/YARN-2828.001.patch
  against trunk revision 825923f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6392//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6392//console

This message is automatically generated.

 Enable auto refresh of web pages (using http parameter)
 ---

 Key: YARN-2828
 URL: https://issues.apache.org/jira/browse/YARN-2828
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Tim Robertson
Assignee: Vijay Bhat
Priority: Minor
 Attachments: YARN-2828.001.patch


 The MR1 Job Tracker had a useful HTTP parameter, e.g. refresh=3, that 
 could be appended to URLs to enable a periodic page reload.  This was very 
 useful when developing MapReduce jobs, especially to watch counters changing.  
 This is lost in the YARN interface.
 It could be implemented as a page element (e.g. a drop-down), but I'd 
 recommend not cluttering the page further and simply bringing back the 
 optional refresh HTTP param.  It worked really nicely.
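 A minimal, hypothetical servlet-style sketch of the requested behavior (an 
 optional refresh=N query parameter turned into an HTTP Refresh header); this is 
 not the actual YARN webapp framework code:
{code}
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

final class AutoRefreshSketch {
  // Honor an optional refresh=N query parameter by asking the browser
  // to reload the page every N seconds.
  static void applyRefreshParam(HttpServletRequest req, HttpServletResponse resp) {
    String refresh = req.getParameter("refresh");
    if (refresh == null) {
      return; // no parameter: render the page without auto refresh
    }
    try {
      int seconds = Integer.parseInt(refresh);
      if (seconds > 0) {
        resp.setHeader("Refresh", String.valueOf(seconds));
      }
    } catch (NumberFormatException ignored) {
      // malformed value: ignore it and render the page normally
    }
  }
}
{code}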



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288400#comment-14288400
 ] 

Wangda Tan commented on YARN-3091:
--

Since some class hierarchies span modules (like AbstractYarnScheduler, 
inherited by CS and Fair), I suggest making the sub-tasks class-family-wise. 
What I propose for the sub-tasks is:
# AbstractYarnScheduler -> CapacityScheduler -> FairScheduler
# SchedulerApplicationAttempt -> FiCaSchedulerApp -> FSAppAttempt
# AbstractCSQueue -> ParentQueue -> LeafQueue
# AppSchedulingInfo

Hope to get your thoughts on this, if you agree, I will go ahead and create 
sub-tickets.

Thanks,
Wangda

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288449#comment-14288449
 ] 

Wangda Tan commented on YARN-2896:
--

+1 for moving it to YARN-1963

 Server side PB changes for Priority Label Manager and Admin CLI support
 ---

 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 
 0003-YARN-2896.patch, 0004-YARN-2896.patch


 Common changes:
  * PB support changes required for Admin APIs 
  * PB support for File System store (Priority Label Store)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288460#comment-14288460
 ] 

Varun Saxena commented on YARN-3091:


Ok...

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288411#comment-14288411
 ] 

Varun Saxena commented on YARN-3091:


Similar to YARN-3008 ? Maybe that can be linked to this.

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288421#comment-14288421
 ] 

Varun Saxena commented on YARN-3091:


Yeah. I meant YARN-3008 can probably be made a subtask of this.
It can address this part:
AbstractYarnScheduler -> CapacityScheduler -> FairScheduler

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time

2015-01-22 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288431#comment-14288431
 ] 

Anubhav Dhoot commented on YARN-2868:
-

volatile cannot be used in place of AtomicLong for thread synchronization. Let's 
revert those changes back to the previous patch.
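
One way to read the comment above, illustrated with a small hypothetical example 
(not the patch code): a volatile long makes reads and writes visible across 
threads, but a read-modify-write update on it is still not atomic, whereas 
AtomicLong's is.
{code}
import java.util.concurrent.atomic.AtomicLong;

public class VolatileVsAtomicExample {
  // Visible to all threads, but "count++" is read-modify-write and can lose updates.
  private volatile long volatileCount = 0;
  // incrementAndGet() is an atomic read-modify-write.
  private final AtomicLong atomicCount = new AtomicLong();

  void recordUnsafe() { volatileCount++; }               // racy under contention
  void recordSafe()   { atomicCount.incrementAndGet(); } // safe under contention

  public static void main(String[] args) throws InterruptedException {
    VolatileVsAtomicExample e = new VolatileVsAtomicExample();
    Runnable r = () -> {
      for (int i = 0; i < 100_000; i++) { e.recordUnsafe(); e.recordSafe(); }
    };
    Thread t1 = new Thread(r), t2 = new Thread(r);
    t1.start(); t2.start(); t1.join(); t2.join();
    // atomicCount is exactly 200000; volatileCount is typically smaller.
    System.out.println(e.volatileCount + " vs " + e.atomicCount.get());
  }
}
{code}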

 Add metric for initial container launch time
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch


 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2015-01-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288445#comment-14288445
 ] 

Eric Payne commented on YARN-2896:
--

[~sunilg], [~leftnoteasy], and [~vinodkv], can we move this discussion to 
YARN-1963 in order to achieve higher visibility?

 Server side PB changes for Priority Label Manager and Admin CLI support
 ---

 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 
 0003-YARN-2896.patch, 0004-YARN-2896.patch


 Common changes:
  * PB support changes required for Admin APIs 
  * PB support for File System store (Priority Label Store)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler

2015-01-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3092:


 Summary: Create common resource usage class to track labeled 
resource/capacity in Capacity Scheduler
 Key: YARN-3092
 URL: https://issues.apache.org/jira/browse/YARN-3092
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


Since we have labels on nodes, we need to track resource usage *by label*, 
including:
- AM resource (to enforce max-am-resource-by-label after YARN-2637)
- Used resource (includes AM resource usage)
- Reserved resource
- Pending resource
- Headroom

The benefits of having such a common class are:
- Reuse of lots of code in different places (Queue/App/User), for better 
maintainability and readability.
- It enables fine-grained locking (e.g. accessing the used resource of a queue 
doesn't need to lock the queue)
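
A rough, hypothetical sketch of such a common per-label usage class (names and 
structure are illustrative only, not the committed implementation):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LabeledResourceUsageSketch {
  // Simplified stand-in for a YARN Resource (memory only, for brevity).
  static final class Res { long memoryMb; Res(long m) { memoryMb = m; } }

  // Per-label buckets for the quantities listed above.
  static final class Usage {
    Res amUsed = new Res(0), used = new Res(0), reserved = new Res(0),
        pending = new Res(0), headroom = new Res(0);
  }

  private final Map<String, Usage> usageByLabel = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public void incUsed(String label, long memoryMb) {
    lock.writeLock().lock();
    try {
      usageByLabel.computeIfAbsent(label, l -> new Usage()).used.memoryMb += memoryMb;
    } finally {
      lock.writeLock().unlock();
    }
  }

  public long getUsedMemory(String label) {
    lock.readLock().lock(); // readers don't block each other
    try {
      Usage u = usageByLabel.get(label);
      return u == null ? 0 : u.used.memoryMb;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}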



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3091:


 Summary: [Umbrella] Improve locks of RM scheduler
 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
scheduler
Reporter: Wangda Tan


In the existing YARN RM scheduler, there are some issues with how locks are used. 
For example:
- Many unnecessary synchronized locks; we have seen several cases recently where 
too frequent access to the scheduler makes the scheduler hang. This could be 
addressed by using read/write locks. Components include the scheduler, CS queues, 
and apps.
- Some fields are not properly locked (like clusterResource).

We can address them together in this ticket.

(More details in the comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288412#comment-14288412
 ] 

Varun Saxena commented on YARN-3091:


That one is for FairScheduler

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288417#comment-14288417
 ] 

Wangda Tan commented on YARN-3091:
--

Thanks [~varun_saxena] for pointing this out. I think such a fine-grained locking 
enhancement should also be included in this umbrella ticket. This JIRA is 
intended to track scheduler lock improvements in general, not only for one 
specific scheduler type.

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number

2015-01-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288514#comment-14288514
 ] 

Naganarasimha G R commented on YARN-3009:
-

Hi [~cwensel],
   From the above discussion I conclude that the other options for overcoming 
this issue are not currently possible (due to the impacts specified by 
[~zjshen]) and there is no proper workaround either, so I plan to close this 
issue as Won't Fix. Please let me know if there are any concerns.


 TimelineWebServices always parses primary and secondary filters as numbers if 
 first char is a number
 

 Key: YARN-3009
 URL: https://issues.apache.org/jira/browse/YARN-3009
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Chris K Wensel
Assignee: Naganarasimha G R
 Attachments: YARN-3009.20150108-1.patch, YARN-3009.20150111-1.patch


 If you pass a filter value that starts with a number (7CCA...), the filter 
 value will be parsed into the Number '7' causing the filter to fail the 
 search.
 Should be noted the actual value as stored via a PUT operation is properly 
 parsed and stored as a String.
 This manifests as a very hard to identify issue with DAGClient in Apache Tez 
 and naming dags/vertices with alphanumeric guid values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster

2015-01-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288803#comment-14288803
 ] 

Tsuyoshi OZAWA commented on YARN-3086:
--

The following is a suggestion from [~rmetzger] on the yarn-dev mailing list:
{quote}
If you want to maintain the current value of 4GB as the default value, I
probably need to introduce a new configuration value, similar to:

public static final String YARN_MINICLUSTER_CONTROL_RESOURCE_MONITORING =
YARN_PREFIX + "minicluster.control-resource-monitoring";

Or is there another approach you would prefer?

I'll add a patch to the JIRA once I've found time to fix it.
{quote}


 Make NodeManager memory configurable in MiniYARNCluster
 ---

 Key: YARN-3086
 URL: https://issues.apache.org/jira/browse/YARN-3086
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: Robert Metzger
Priority: Minor

 Apache Flink has a build-in YARN client to deploy it to YARN clusters.
 Recently, we added more tests for the client, using the MiniYARNCluster.
 One of the tests is requesting more containers than available. This test 
 works well on machines with enough memory, but on travis-ci (our test 
 environment), the available main memory is limited to 3 GB. 
 Therefore, I want to set custom amount of memory for each NodeManager.
 Right now, the NodeManager memory is hardcoded to 4GB.
 As discussed on the yarn-dev list, I'm going to create a patch for this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2015-01-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288808#comment-14288808
 ] 

Sunil G commented on YARN-2896:
---

Yes. +1.

I will move this to the parent JIRA for better visibility. Thank you Eric and 
Wangda for the comments.

 Server side PB changes for Priority Label Manager and Admin CLI support
 ---

 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 
 0003-YARN-2896.patch, 0004-YARN-2896.patch


 Common changes:
  * PB support changes required for Admin APIs 
  * PB support for File System store (Priority Label Store)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster

2015-01-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288817#comment-14288817
 ] 

Tsuyoshi OZAWA commented on YARN-3086:
--

[~rmetzger] Yeah, the approach you suggested basically looks good to me. We 
already have YARN_MC_PREFIX, so please use it.

{code}
  public static final String YARN_MC_PREFIX = YARN_PREFIX + "minicluster.";
{code}

I think the new configuration name should simply be YARN_MINICLUSTER_NM_PMEM_MB.
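
A small sketch of how that could be wired up, assuming the constant name above and 
the current 4 GB default; the key and default below are assumptions, not the 
committed change:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical wiring following the YARN_MC_PREFIX convention; the constant
// name and default value are assumptions for illustration only.
public class MiniClusterMemoryConfigSketch {
  public static final String YARN_MINICLUSTER_NM_PMEM_MB =
      YarnConfiguration.YARN_MC_PREFIX + "yarn.nodemanager.resource.memory-mb";
  public static final int DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB = 4096;

  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // What MiniYARNCluster could do instead of hard-coding 4096:
    conf.setInt(YarnConfiguration.NM_PMEM_MB,
        conf.getInt(YARN_MINICLUSTER_NM_PMEM_MB,
            DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB));
    System.out.println("NM memory (MB): "
        + conf.getInt(YarnConfiguration.NM_PMEM_MB, -1));
  }
}
{code}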

 Make NodeManager memory configurable in MiniYARNCluster
 ---

 Key: YARN-3086
 URL: https://issues.apache.org/jira/browse/YARN-3086
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: Robert Metzger
Priority: Minor

 Apache Flink has a build-in YARN client to deploy it to YARN clusters.
 Recently, we added more tests for the client, using the MiniYARNCluster.
 One of the tests is requesting more containers than available. This test 
 works well on machines with enough memory, but on travis-ci (our test 
 environment), the available main memory is limited to 3 GB. 
 Therefore, I want to set custom amount of memory for each NodeManager.
 Right now, the NodeManager memory is hardcoded to 4GB.
 As discussed on the yarn-dev list, I'm going to create a patch for this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3028) Better syntax for replace label CLI

2015-01-22 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3028:
-
Attachment: 0002-YARN-3028.patch

 Better syntax for replace label CLI
 ---

 Key: YARN-3028
 URL: https://issues.apache.org/jira/browse/YARN-3028
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Jian He
Assignee: Rohith
 Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch


 The command to replace label now is such:
 {code}
 yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 
 node2:port,label1,label2]
 {code}
 Instead of {code} node1:port,label1,label2 {code} I think it's better to say 
 {code} node1:port=label1,label2 {code}
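
For illustration only (not the attached patch), parsing the proposed form could 
look roughly like this:

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Purely illustrative parser for "node1:port=label1,label2 node2:port=labelA".
public class ReplaceLabelsArgParserSketch {
  public static Map<String, List<String>> parse(String arg) {
    Map<String, List<String>> nodeToLabels = new HashMap<String, List<String>>();
    for (String entry : arg.trim().split("\\s+")) {
      String[] parts = entry.split("=", 2);
      String node = parts[0];
      List<String> labels =
          (parts.length > 1 && !parts[1].isEmpty())
              ? Arrays.asList(parts[1].split(","))
              : Collections.<String>emptyList(); // empty list clears labels
      nodeToLabels.put(node, labels);
    }
    return nodeToLabels;
  }
}
{code}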



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3028) Better syntax for replace label CLI

2015-01-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288840#comment-14288840
 ] 

Rohith commented on YARN-3028:
--

Kindly review the updated patch

 Better syntax for replace label CLI
 ---

 Key: YARN-3028
 URL: https://issues.apache.org/jira/browse/YARN-3028
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Jian He
Assignee: Rohith
 Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch


 The command to replace label now is such:
 {code}
 yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 
 node2:port,label1,label2]
 {code}
 Instead of {code} node1:port,label1,label2 {code} I think it's better to say 
 {code} node1:port=label1,label2 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288856#comment-14288856
 ] 

Sunil G commented on YARN-3091:
---

Through code review we may find more issues in the areas mentioned by 
[~leftnoteasy].
I feel we can start a task to run JCarder to clearly pinpoint the locking 
problems. That would help us design these subtasks with more clarity and 
may also be helpful in verifying the changes afterwards.




 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In existing YARN RM scheduler, there're some issues of using locks. For 
 example:
 - Many unnecessary synchronized locks, we have seen several cases recently 
 that too frequent access of scheduler makes scheduler hang. Which could be 
 addressed by using read/write lock. Components include scheduler, CS queues, 
 apps
 - Some fields not properly locked (Like clusterResource)
 We can address them together in this ticket.
 (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2015-01-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2800:
-
Attachment: YARN-2800-20150122-1.patch

Updated patch.
I just checked the code: only RMNodeLabelsManager can possibly have multi-threaded 
access to NodeLabelsManager, and {{nodeLabelsEnabled}} will be protected by the 
write lock of CommonNodeLabelsManager. So I think we don't need to add volatile to 
it. In addition, it is only used in CommonNodeLabelsManager, so I made it private. 

Please kindly review.
Thanks,

 Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
 feature
 

 Key: YARN-2800
 URL: https://issues.apache.org/jira/browse/YARN-2800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
 YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
 YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, 
 YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, 
 YARN-2800-20150122-1.patch


 In the past, we have a MemoryNodeLabelStore, mostly for user to try this 
 feature without configuring where to store node labels on file system. It 
 seems convenient for user to try this, but actually it causes some bad use 
 experience. User may add/remove labels, and edit capacity-scheduler.xml. 
 After RM restart, labels will gone, (we store it in mem). And RM cannot get 
 started if we have some queue uses labels, and the labels don't exist in 
 cluster.
 As what we discussed, we should have an explicitly way to let user specify if 
 he/she wants this feature or not. If node label is disabled, any operations 
 trying to modify/use node labels will throw exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY

2015-01-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2694:
-
Attachment: YARN-2694-20150122-1.patch

Updated patch addressing the test failures.

 Ensure only single node labels specified in resource request / host, and node 
 label expression only specified when resourceName=ANY
 ---

 Key: YARN-2694
 URL: https://issues.apache.org/jira/browse/YARN-2694
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, 
 YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, 
 YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, 
 YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch


 Currently, node label expression supporting in capacity scheduler is partial 
 completed. Now node label expression specified in Resource Request will only 
 respected when it specified at ANY level. And a ResourceRequest/host with 
 multiple node labels will make user limit, etc. computation becomes more 
 tricky.
 Now we need temporarily disable them, changes include,
 - AMRMClient
 - ApplicationMasterService
 - RMAdminCLI
 - CommonNodeLabelsManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288082#comment-14288082
 ] 

Wangda Tan commented on YARN-2896:
--

Thanks [~sunilg]. I agree with having a store to simplify configuration (no need to 
specify highest-priority for each user under each queue); it can all be done via the 
REST API or CLI. For the other part, we can wait for [~vinodkv]'s feedback.

 Server side PB changes for Priority Label Manager and Admin CLI support
 ---

 Key: YARN-2896
 URL: https://issues.apache.org/jira/browse/YARN-2896
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 
 0003-YARN-2896.patch, 0004-YARN-2896.patch


 Common changes:
  * PB support changes required for Admin APIs 
  * PB support for File System store (Priority Label Store)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing

2015-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288099#comment-14288099
 ] 

Hadoop QA commented on YARN-3082:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12693959/YARN-3082.002.patch
  against trunk revision 825923f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6390//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6390//console

This message is automatically generated.

 Non thread safe access to systemCredentials in NodeHeartbeatResponse 
 processing
 ---

 Key: YARN-3082
 URL: https://issues.apache.org/jira/browse/YARN-3082
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3082.001.patch, YARN-3082.002.patch


 When you use system credentials via feature added in YARN-2704, the proto 
 conversion code throws exception in converting ByteBuffer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing

2015-01-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3082:

Attachment: YARN-3082.002.patch

Addressed feedback

 Non thread safe access to systemCredentials in NodeHeartbeatResponse 
 processing
 ---

 Key: YARN-3082
 URL: https://issues.apache.org/jira/browse/YARN-3082
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3082.001.patch, YARN-3082.002.patch


 When you use system credentials via feature added in YARN-2704, the proto 
 conversion code throws exception in converting ByteBuffer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2015-01-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288029#comment-14288029
 ] 

Varun Vasudev commented on YARN-160:


bq. HADOOP_HEAPSIZE_MAX in trunk. HADOOP_HEAPSIZE was deprecated.

Thanks for pointing this out Allen. I'll provide a patch for trunk and one for 
branch-2 when I address Vinod's comments.

bq. yarn.nodemanager.count-logical-processors-as-cores: Not sure of the use for 
this. On Linux, shouldn't we simply use the returned numCores if they are 
valid? And fall back to numProcessors?

Some people prefer to count hyperthreads as a CPU and some don't. This lets 
users choose.

bq. yarn.nodemanager.enable-hardware-capability-detection: I think specifying 
the capabilities to be -1 is already a way to trigger this automatic detection, 
let's simply drop the flag and assume it to be true all the time?

Junping felt we should add it to cover upgrade scenarios. What do you think?

bq. We already have resource.percentage-physical-cpu-limit for CPUs - 
YARN-2440. How about simply adding a resource.percentage-pmem-limit instead 
making it a magic number in the code? Of course, we can have a default reserved 
percentage.

I think resource.percentage-pmem-limit should be analogous to 
resource.percentage-physical-cpu-limit in that it sets the limit as a 
percentage of total memory. What about something like 
yarn.nodemanager.default-percentage-pmem-limit?
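
As a hedged illustration of the detection-plus-limit idea being discussed (the 
property name and default below are assumptions, not committed keys):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.util.ResourceCalculatorPlugin;

// Hypothetical helper: detect physical memory from the OS and apply a
// configurable percentage limit. The key and the 80% default are assumptions.
public class NodeMemoryDetectionSketch {
  public static int detectNodeMemoryMb(Configuration conf,
      ResourceCalculatorPlugin plugin) {
    int percent = conf.getInt(
        "yarn.nodemanager.resource.percentage-pmem-limit", 80);
    long physicalBytes = plugin.getPhysicalMemorySize();
    return (int) (physicalBytes / (1024 * 1024) * percent / 100);
  }
}
{code}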

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, 
 apache-yarn-160.2.patch, apache-yarn-160.3.patch


 As mentioned in YARN-2
 *NM memory and CPU configs*
 Currently these values are coming from the config of the NM, we should be 
 able to obtain those values from the OS (ie, in the case of Linux from 
 /proc/meminfo  /proc/cpuinfo). As this is highly OS dependent we should have 
 an interface that obtains this information. In addition implementations of 
 this interface should be able to specify a mem/cpu offset (amount of mem/cpu 
 not to be avail as YARN resource), this would allow to reserve mem/cpu for 
 the OS and other services outside of YARN containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3028) Better syntax for replace label CLI

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288069#comment-14288069
 ] 

Wangda Tan commented on YARN-3028:
--

[~rohithsharma],
Thanks for updating, but did you forget to attach the patch? :-)

 Better syntax for replace label CLI
 ---

 Key: YARN-3028
 URL: https://issues.apache.org/jira/browse/YARN-3028
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Jian He
Assignee: Rohith
 Attachments: 0001-YARN-3028.patch


 The command to replace label now is such:
 {code}
 yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 
 node2:port,label1,label2]
 {code}
 Instead of {code} node1:port,label1,label2 {code} I think it's better to say 
 {code} node1:port=label1,label2 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3049) implement existing ATS queries in the new ATS design

2015-01-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3049:
--
Assignee: Zhijie Shen  (was: Varun Saxena)

 implement existing ATS queries in the new ATS design
 

 Key: YARN-3049
 URL: https://issues.apache.org/jira/browse/YARN-3049
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen

 Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3040) implement client-side API for handling flows

2015-01-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3040:
--
Assignee: Robert Kanter  (was: Naganarasimha G R)

 implement client-side API for handling flows
 

 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter

 Per design in YARN-2928, implement client-side API for handling *flows*. 
 Frameworks should be able to define and pass in all attributes of flows and 
 flow runs to YARN, and they should be passed into ATS writers.
 YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-01-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287708#comment-14287708
 ] 

Sangjin Lee commented on YARN-2928:
---

Just a reminder that we have an IRC channel for quick discussions on this 
effort at ##hadoop-ats on irc.freenode.net. We also have regular Google hangout 
status calls. Email me if you'd like to participate in the status calls.

 Application Timeline Server (ATS) next gen: phase 1
 ---

 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf


 We have the application timeline server implemented in yarn per YARN-1530 and 
 YARN-321. Although it is a great feature, we have recognized several critical 
 issues and features that need to be addressed.
 This JIRA proposes the design and implementation changes to address those. 
 This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster

2015-01-22 Thread Robert Metzger (JIRA)
Robert Metzger created YARN-3086:


 Summary: Make NodeManager memory configurable in MiniYARNCluster
 Key: YARN-3086
 URL: https://issues.apache.org/jira/browse/YARN-3086
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: Robert Metzger
Priority: Minor


Apache Flink has a built-in YARN client to deploy it to YARN clusters.
Recently, we added more tests for the client, using the MiniYARNCluster.

One of the tests requests more containers than are available. This test works 
well on machines with enough memory, but on travis-ci (our test environment), 
the available main memory is limited to 3 GB. 
Therefore, I want to set a custom amount of memory for each NodeManager.
Right now, the NodeManager memory is hardcoded to 4 GB.

As discussed on the yarn-dev list, I'm going to create a patch for this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests

2015-01-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287871#comment-14287871
 ] 

Sandy Ryza commented on YARN-2990:
--

Other than the addition of the anyLocalRequests check here:
{code}
+  if (offSwitchRequest.getNumContainers() > 0 &&
+  (!anyLocalRequests(priority)
+  || allowedLocality.equals(NodeType.OFF_SWITCH))) {
{code}
are the other changes core to the fix?  If not, given that this is touchy code, 
can we leave things the way they are or make the changes in a separate cleanup 
JIRA?

Also, a couple nits:
* Need some extra indentation in the snippet above
* anyLocalRequests is kind of a confusing name for that method, because any 
often means off-switch when thinking about locality.  Maybe 
hasNodeOrRackRequests (a rough sketch follows below).
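
For what it's worth, a rough sketch of such a helper (the parameter is a stand-in 
for however the app attempt exposes its outstanding requests):

{code}
import java.util.Collection;

import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative helper only; not part of the attached patch.
public final class LocalityRequestSketch {
  static boolean hasNodeOrRackRequests(Collection<ResourceRequest> requests) {
    for (ResourceRequest rr : requests) {
      if (!ResourceRequest.ANY.equals(rr.getResourceName())
          && rr.getNumContainers() > 0) {
        return true; // at least one node-local or rack-local request outstanding
      }
    }
    return false;
  }
}
{code}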

 FairScheduler's delay-scheduling always waits for node-local and rack-local 
 delays, even for off-rack-only requests
 ---

 Key: YARN-2990
 URL: https://issues.apache.org/jira/browse/YARN-2990
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2990-0.patch, yarn-2990-1.patch, 
 yarn-2990-test.patch


 Looking at the FairScheduler, it appears the node/rack locality delays are 
 used for all requests, even those that are only off-rack. 
 More details in comments. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287921#comment-14287921
 ] 

Wangda Tan commented on YARN-2800:
--

Thanks [~ozawa] for the review! Rebasing and will upload soon.

 Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
 feature
 

 Key: YARN-2800
 URL: https://issues.apache.org/jira/browse/YARN-2800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
 YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
 YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, 
 YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch


 In the past, we have a MemoryNodeLabelStore, mostly for user to try this 
 feature without configuring where to store node labels on file system. It 
 seems convenient for user to try this, but actually it causes some bad use 
 experience. User may add/remove labels, and edit capacity-scheduler.xml. 
 After RM restart, labels will gone, (we store it in mem). And RM cannot get 
 started if we have some queue uses labels, and the labels don't exist in 
 cluster.
 As what we discussed, we should have an explicitly way to let user specify if 
 he/she wants this feature or not. If node label is disabled, any operations 
 trying to modify/use node labels will throw exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error

2015-01-22 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3088:


 Summary: LinuxContainerExecutor.deleteAsUser can throw NPE if 
native executor returns an error
 Key: YARN-3088
 URL: https://issues.apache.org/jira/browse/YARN-3088
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe


If the native executor returns an error when trying to delete a path as a particular 
user and dir==null, then the code can NPE while trying to build a log message for the 
error.  It blindly dereferences dir in the log message despite the code just 
above explicitly handling the cases when dir could be null.
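
A tiny self-contained sketch of the failure mode, with invented names (this is not 
the actual LinuxContainerExecutor code):

{code}
import java.io.File;

// Invented names; demonstrates the described pattern, not the real code.
public class DeleteAsUserNpeSketch {
  public static void main(String[] args) {
    File dir = null;        // a null dir is a legitimate, handled input...
    int exitCode = 1;       // ...but pretend the native call then failed

    // The null case is handled when building the request.
    String target = (dir == null) ? "<all user directories>" : dir.getPath();
    System.out.println("requesting deletion of " + target);

    if (exitCode != 0) {
      // The error path dereferences dir unconditionally: NPE when dir == null.
      System.err.println("deleteAsUser failed for " + dir.getPath()
          + " with exit code " + exitCode);
    }
  }
}
{code}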



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error

2015-01-22 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-3088:


Assignee: Eric Payne

 LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns 
 an error
 -

 Key: YARN-3088
 URL: https://issues.apache.org/jira/browse/YARN-3088
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Eric Payne

 If the native executor returns an error trying to delete a path as a 
 particular user when dir==null then the code can NPE trying to build a log 
 message for the error.  It blindly dereferences dir in the log message despite 
 the code just above explicitly handling the cases when dir could be null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser

2015-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288189#comment-14288189
 ] 

Jason Lowe commented on YARN-3089:
--

The failures are unfortunately not present in the NM log due to bug YARN-3088 
preventing the log message from being generated properly.  While debugging an 
instance of the nodemanager, I was able to see the error messages from the LCE 
executable, and they looked like the following:
{noformat}
Directory not found 
/somepath/application_1421927171686_0163/container_1421927171686_0163_01_09/syslog/
rmdir of 
/somepath/application_1421927171686_0163/container_1421927171686_0163_01_09/syslog/
 failed - Permission denied
Directory not found 
/somepath/application_1421927171686_0163/container_1421927171686_0163_01_09/stderr/
rmdir of 
/somepath/application_1421927171686_0163/container_1421927171686_0163_01_09/stderr/
 failed - Permission denied
Directory not found 
/somepath/application_1421927171686_0163/container_1421927171686_0163_01_09/stdout/
rmdir of 
/somepath/application_1421927171686_0163/container_1421927171686_0163_01_09/stdout/
 failed - Permission denied
{noformat}

 LinuxContainerExecutor does not handle file arguments to deleteAsUser
 -

 Key: YARN-3089
 URL: https://issues.apache.org/jira/browse/YARN-3089
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 YARN-2468 added the deletion of individual logs that are aggregated, but this 
 fails to delete log files when the LCE is being used.  The LCE native 
 executable assumes the paths being passed are paths and the delete fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser

2015-01-22 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-3089:


Assignee: Eric Payne

 LinuxContainerExecutor does not handle file arguments to deleteAsUser
 -

 Key: YARN-3089
 URL: https://issues.apache.org/jira/browse/YARN-3089
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Eric Payne
Priority: Blocker

 YARN-2468 added the deletion of individual logs that are aggregated, but this 
 fails to delete log files when the LCE is being used.  The LCE native 
 executable assumes the paths being passed are paths and the delete fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288175#comment-14288175
 ] 

Wangda Tan commented on YARN-2868:
--

[~rchiang],
Just reviewed the patch; I'm not sure if you misunderstood what Karthik and I 
meant. I agree with what you mentioned in 
https://issues.apache.org/jira/browse/YARN-2868?focusedCommentId=14274308page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14274308
 (comment#1) and also with Karthik's comment: 
https://issues.apache.org/jira/browse/YARN-2868?focusedCommentId=14274317page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14274317.
 It's better to keep the AtomicLong as you originally did. Locking the application 
from the caller is not clear to me.

Thanks,
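
For reference, a minimal sketch of the AtomicLong approach under discussion (names 
are illustrative, not the patch):

{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: records the delay between allocation start and the
// first allocated container without taking the application lock.
public class FirstContainerLatencySketch {
  private final AtomicLong allocationStartTime = new AtomicLong(-1);
  private final AtomicLong firstContainerDelayMs = new AtomicLong(-1);

  public void markAllocationStart(long nowMs) {
    allocationStartTime.compareAndSet(-1, nowMs);
  }

  public void markFirstContainerAllocated(long nowMs) {
    long start = allocationStartTime.get();
    if (start >= 0) {
      // compareAndSet ensures only the first allocation records the delay.
      firstContainerDelayMs.compareAndSet(-1, nowMs - start);
    }
  }

  public long getFirstContainerDelayMs() {
    return firstContainerDelayMs.get();
  }
}
{code}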

 Add metric for initial container launch time
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
 YARN-2868.006.patch


 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser

2015-01-22 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3089:


 Summary: LinuxContainerExecutor does not handle file arguments to 
deleteAsUser
 Key: YARN-3089
 URL: https://issues.apache.org/jira/browse/YARN-3089
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker


YARN-2468 added the deletion of individual logs that are aggregated, but this 
fails to delete log files when the LCE is being used.  The LCE native 
executable assumes the paths being passed are directories, so the delete fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3028) Better syntax for replace label CLI

2015-01-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288197#comment-14288197
 ] 

Rohith commented on YARN-3028:
--

bq. but did you forget to attach the patch?
 I meant the previously attached patch only.

 Better syntax for replace label CLI
 ---

 Key: YARN-3028
 URL: https://issues.apache.org/jira/browse/YARN-3028
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Jian He
Assignee: Rohith
 Attachments: 0001-YARN-3028.patch


 The command to replace label now is such:
 {code}
 yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 
 node2:port,label1,label2]
 {code}
 Instead of {code} node1:port,label1,label2 {code} I think it's better to say 
 {code} node1:port=label1,label2 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3090) DeletionService can silently ignore deletion task failures

2015-01-22 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3090:


 Summary: DeletionService can silently ignore deletion task failures
 Key: YARN-3090
 URL: https://issues.apache.org/jira/browse/YARN-3090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe


If a non-I/O exception occurs while the DeletionService is executing a deletion 
task then it will be silently ignored.  The exception bubbles up to the thread 
workers of the ScheduledThreadPoolExecutor which simply attaches the throwable 
to the Future that was returned when the task was scheduled.  However the 
thread pool is used as a fire-and-forget pool, so nothing ever looks at the 
Future and therefore the exception is never logged.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures

2015-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288218#comment-14288218
 ] 

Jason Lowe commented on YARN-3090:
--

An easy way to at least log a message when something terrible happens to a 
deletion task is to use a derived ScheduledThreadPoolExecutor that overrides 
the afterExecute method to log any throwable associated with the task.
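
A sketch of that approach, following the pattern documented for 
ThreadPoolExecutor.afterExecute (class and log names are illustrative):

{code}
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledThreadPoolExecutor;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative class name; logs any throwable attached to a completed task
// instead of dropping it silently.
public class LoggingScheduledThreadPoolExecutor
    extends ScheduledThreadPoolExecutor {
  private static final Log LOG =
      LogFactory.getLog(LoggingScheduledThreadPoolExecutor.class);

  public LoggingScheduledThreadPoolExecutor(int corePoolSize) {
    super(corePoolSize);
  }

  @Override
  protected void afterExecute(Runnable r, Throwable t) {
    super.afterExecute(r, t);
    // For fire-and-forget submissions the throwable is attached to the Future,
    // so pull it out if the task has completed.
    if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
      try {
        ((Future<?>) r).get();
      } catch (CancellationException e) {
        // cancelled tasks are not failures
      } catch (ExecutionException e) {
        t = e.getCause();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    if (t != null) {
      LOG.error("Exception during execution of deletion task", t);
    }
  }
}
{code}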

 DeletionService can silently ignore deletion task failures
 --

 Key: YARN-3090
 URL: https://issues.apache.org/jira/browse/YARN-3090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe

 If a non-I/O exception occurs while the DeletionService is executing a 
 deletion task then it will be silently ignored.  The exception bubbles up to 
 the thread workers of the ScheduledThreadPoolExecutor which simply attaches 
 the throwable to the Future that was returned when the task was scheduled.  
 However the thread pool is used as a fire-and-forget pool, so nothing ever 
 looks at the Future and therefore the exception is never logged.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)

2015-01-22 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2828:
-
Attachment: YARN-2828.001.patch

 Enable auto refresh of web pages (using http parameter)
 ---

 Key: YARN-2828
 URL: https://issues.apache.org/jira/browse/YARN-2828
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Tim Robertson
Assignee: Vijay Bhat
Priority: Minor
 Attachments: YARN-2828.001.patch


 The MR1 Job Tracker had a useful HTTP parameter of e.g. refresh=3 that 
 could be appended to URLs which enabled a page reload.  This was very useful 
 when developing mapreduce jobs, especially to watch counters changing.  This 
 is lost in the Yarn interface.
 Could be implemented as a page element (e.g. drop down or so), but I'd 
 recommend that the page not be more cluttered, and simply bring back the 
 optional refresh HTTP param.  It worked really nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

