[jira] [Updated] (YARN-1027) Implement RMHAServiceProtocol
[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated YARN-1027: Assignee: Karthik Kambatla (was: nemon lou) Implement RMHAServiceProtocol - Key: YARN-1027 URL: https://issues.apache.org/jira/browse/YARN-1027 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Implement the existing HAServiceProtocol from Hadoop Common. This protocol is the single point of interaction between the RM and HA clients/services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
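The HAServiceProtocol interface in Hadoop Common defines monitorHealth(), transitionToActive()/transitionToStandby(), and getServiceStatus(). A minimal sketch of what an RM-side implementation might look like follows; the class name RMHAServiceImpl and the comments about which services get started or stopped are illustrative assumptions, not the committed patch.

{code}
import java.io.IOException;

import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceStatus;
import org.apache.hadoop.ha.HealthCheckFailedException;

// Hypothetical sketch of implementing HAServiceProtocol for the RM.
public class RMHAServiceImpl implements HAServiceProtocol {
  private volatile HAServiceState state = HAServiceState.INITIALIZING;

  @Override
  public synchronized void transitionToActive(StateChangeRequestInfo reqInfo)
      throws IOException {
    if (state == HAServiceState.ACTIVE) {
      return; // already active, nothing to do
    }
    // start the RM's "active" services (schedulers, client/AM/NM RPC servers, ...)
    state = HAServiceState.ACTIVE;
  }

  @Override
  public synchronized void transitionToStandby(StateChangeRequestInfo reqInfo)
      throws IOException {
    if (state == HAServiceState.STANDBY) {
      return; // already standby
    }
    // stop the "active" services, keeping only what a standby RM needs
    state = HAServiceState.STANDBY;
  }

  @Override
  public void monitorHealth() throws HealthCheckFailedException {
    // throw HealthCheckFailedException if essential RM services are unhealthy
  }

  @Override
  public HAServiceStatus getServiceStatus() {
    HAServiceStatus status = new HAServiceStatus(state);
    if (state == HAServiceState.ACTIVE || state == HAServiceState.STANDBY) {
      status.setReadyToBecomeActive();
    } else {
      status.setNotReadyToBecomeActive("RM is still initializing");
    }
    return status;
  }
}
{code}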
[jira] [Created] (YARN-1033) Expose RM active/standby state to web UI and metrics
nemon lou created YARN-1033: --- Summary: Expose RM active/standby state to web UI and metrics Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: nemon lou Both the active and the standby RM shall expose its web server and show its current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. RM web services shall refuse client requests unless the query is for the RM state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1033) Expose RM active/standby state to web UI and metrics
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated YARN-1033: Assignee: nemon lou Expose RM active/standby state to web UI and metrics Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: nemon lou Assignee: nemon lou Both the active and the standby RM shall expose its web server and show its current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. RM web services shall refuse client requests unless the query is for the RM state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1033) Expose RM active/standby state to web UI and metrics
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated YARN-1033: Description: Both the active and the standby RM shall expose its web server and show its current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. Standby RM web services shall refuse client requests unless the query is for the RM state. was: Both the active and the standby RM shall expose its web server and show its current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. RM web services shall refuse client requests unless the query is for the RM state. Expose RM active/standby state to web UI and metrics Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: nemon lou Assignee: nemon lou Both the active and the standby RM shall expose its web server and show its current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. Standby RM web services shall refuse client requests unless the query is for the RM state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1024) Define a virtual core unambiguously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730622#comment-13730622 ] Junping Du commented on YARN-1024: -- I would also prefer #1 for scheduling resources, as #2 is only meaningful for charging/billing, as [~philip] mentioned above. For #2, a simple calculation like ECU (released in 2006/2007, but unchanged over 7 years, which goes against Moore's law :)) has two commonly questioned scenarios: - Assigning multiple slow p-cores (4 x 1G) to a single-threaded task (1 x 4G) that asked for one fast core (mapped to multiple vcores) cannot help performance and wastes CPU: the unused cores still consume timer interrupts, the idle loop costs resources, and maintaining a consistent memory view among multiple vCPUs consumes resources as well. All of this is unnecessary. Another case is that it is possible for the OS CPU scheduler to migrate a single-threaded workload among multiple vCPUs, thereby losing cache locality. - Assigning a single faster p-core (1 x 4G) to a multi-threaded task asking for multiple slow cores (4 x 1G) will cause performance issues, as Steve mentioned above and in YARN-972: too much overhead from process context switches and cache misses. #1 sounds more reasonable; 1 vcore does not have to be 1 pcore, but could be mapped to 1 vCPU under virtualization and can be overcommitted later (with a configured ratio) by the virtualization platform. Define a virtual core unambiguously --- Key: YARN-1024 URL: https://issues.apache.org/jira/browse/YARN-1024 Project: Hadoop YARN Issue Type: Improvement Reporter: Arun C Murthy Assignee: Arun C Murthy We need to clearly define the meaning of a virtual core unambiguously so that it's easy to migrate applications between clusters. For example, here is Amazon EC2's definition of an ECU: http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it Essentially we need to clearly define a YARN Virtual Core (YVC). Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.* -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23
[ https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730708#comment-13730708 ] Hudson commented on YARN-1031: -- SUCCESS: Integrated in Hadoop-Hdfs-0.23-Build #691 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/691/]) YARN-1031. JQuery UI components reference external css in branch-23 (jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510775) * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery/themes-1.8.16 * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery/themes-1.8.16/base * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery/themes-1.8.16/base/jquery-ui.css JQuery UI components reference external css in branch-23 Key: YARN-1031 URL: https://issues.apache.org/jira/browse/YARN-1031 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 0.23.10 Attachments: YARN-1031-branch-0.23.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: (was: YARN-696.diff) Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Priority: Trivial Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, then multiple REST calls are needed (a maximum of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
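To make the proposal concrete, a single request such as http://<rm http address:port>/ws/v1/cluster/apps?states=ACCEPTED,RUNNING could return applications in several states at once. A small sketch of the server-side parsing is below; the parameter name "states" and the helper class are assumptions for illustration, not the attached patch.

{code}
import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Hypothetical helper: turn "accepted,running" into the set of states to filter on.
public final class AppStateFilter {
  private AppStateFilter() {
  }

  public static EnumSet<YarnApplicationState> parse(String statesParam) {
    if (statesParam == null || statesParam.isEmpty()) {
      // no filter requested: keep the current behavior and return every state
      return EnumSet.allOf(YarnApplicationState.class);
    }
    EnumSet<YarnApplicationState> states =
        EnumSet.noneOf(YarnApplicationState.class);
    for (String s : statesParam.split(",")) {
      // throws IllegalArgumentException for an unknown state name
      states.add(YarnApplicationState.valueOf(s.trim().toUpperCase()));
    }
    return states;
  }
}
{code}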
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Priority: Trivial Attachments: YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, then multiple REST calls are needed (a maximum of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730822#comment-13730822 ] Hadoop QA commented on YARN-696: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596352/YARN-696.diff against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1659//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1659//console This message is automatically generated. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Priority: Trivial Attachments: YARN-696.diff Within the YARN Resource Manager REST API, the GET call which returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, then multiple REST calls are needed (a maximum of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730850#comment-13730850 ] Ravi Prakash commented on YARN-90: -- Do we know what we need to do for this JIRA? I can see that in DirectoryCollection we need to be able to remove entries from failedDirs, and to recognize this fact in the LocalDirsHandler service. Would anything else need to be done? NodeManager should identify failed disks becoming good again - Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs a restart. This JIRA is to improve NodeManager to reuse good disks (which could have gone bad some time back). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
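As a rough sketch of the direction described (the class and method names below, such as checkFailedDirs, are hypothetical and do not match DirectoryCollection's actual fields), the idea is to periodically re-test the directories in failedDirs and move any that have recovered back to the good list, with LocalDirsHandlerService picking up the change:

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of re-admitting recovered disks; not the actual NM code.
class DirHealthChecker {
  private final List<String> localDirs = new ArrayList<String>();
  private final List<String> failedDirs = new ArrayList<String>();

  /** Re-test every failed dir and move the recovered ones back to the good list. */
  synchronized boolean checkFailedDirs() {
    boolean recoveredAny = false;
    List<String> stillFailed = new ArrayList<String>();
    for (String dir : failedDirs) {
      File f = new File(dir);
      // a very simple health test: the directory exists and is readable/writable again
      if (f.isDirectory() && f.canRead() && f.canWrite()) {
        localDirs.add(dir);
        recoveredAny = true;
      } else {
        stillFailed.add(dir);
      }
    }
    failedDirs.clear();
    failedDirs.addAll(stillFailed);
    // LocalDirsHandlerService would be told about recovered dirs so new containers can use them
    return recoveredAny;
  }
}
{code}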
[jira] [Commented] (YARN-1032) NPE in RackResolve
[ https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730895#comment-13730895 ] Hadoop QA commented on YARN-1032: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596360/YARN-1032.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1660//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1660//console This message is automatically generated. NPE in RackResolve -- Key: YARN-1032 URL: https://issues.apache.org/jira/browse/YARN-1032 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha Environment: linux Reporter: Lohit Vijayarenu Priority: Minor Attachments: YARN-1032.1.patch, YARN-1032.2.patch We found a case where our rack resolve script was not returning rack due to problem with resolving host address. This exception was see in RackResolver.java as NPE, ultimately caught in RMContainerAllocator. {noformat} 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.lang.NullPointerException at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99) at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243) at java.lang.Thread.run(Thread.java:722) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
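The NPE comes from dereferencing the list returned by the topology mapping when the script fails to resolve a host. A guard along the following lines (sketched here; the exact shape of RackResolver.coreResolve differs) would fall back to the default rack instead:

{code}
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.net.DNSToSwitchMapping;
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.net.NodeBase;

// Illustrative null-safe resolution; not the committed YARN-1032 patch.
final class SafeRackResolve {
  static Node resolve(DNSToSwitchMapping dnsToSwitchMapping, String hostName) {
    List<String> rNameList =
        dnsToSwitchMapping.resolve(Collections.singletonList(hostName));
    String rName;
    if (rNameList == null || rNameList.isEmpty() || rNameList.get(0) == null) {
      // the rack script returned nothing for this host; don't NPE, use the default rack
      rName = NetworkTopology.DEFAULT_RACK;
    } else {
      rName = rNameList.get(0);
    }
    return new NodeBase(hostName, rName);
  }
}
{code}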
[jira] [Commented] (YARN-1024) Define a virtual core unambiguously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730961#comment-13730961 ] Eli Collins commented on YARN-1024: --- bq. vcores are optional anyway (only used in DRF) Sandy corrected me offline that while this is true for the CS, it is not true for the FS, which by default (w/o DRF) will not schedule more containers' worth of vcores than the configured vcores (which seems like it could lead to under-utilization, given that the default resource calculator only uses memory and not every container needs a whole core). By default the # of vcores is the # of cores on the machine, and MR asks for containers w/ 1 vcore, so we effectively have vcore=pcore today as the default (reinforced by the decision to remove the notion of pcore in YARN-782). Define a virtual core unambiguously --- Key: YARN-1024 URL: https://issues.apache.org/jira/browse/YARN-1024 Project: Hadoop YARN Issue Type: Improvement Reporter: Arun C Murthy Assignee: Arun C Murthy We need to clearly define the meaning of a virtual core unambiguously so that it's easy to migrate applications between clusters. For example, here is Amazon EC2's definition of an ECU: http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it Essentially we need to clearly define a YARN Virtual Core (YVC). Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.* -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1034) Remove experimental in the Fair Scheduler documentation
Sandy Ryza created YARN-1034: Summary: Remove experimental in the Fair Scheduler documentation Key: YARN-1034 URL: https://issues.apache.org/jira/browse/YARN-1034 Project: Hadoop YARN Issue Type: Task Components: documentation, scheduler Affects Versions: 2.1.0-beta Environment: The YARN Fair Scheduler is largely stable now, and should no longer be declared experimental. Reporter: Sandy Ryza Assignee: Karthik Kambatla -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-291) Dynamic resource configuration
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731037#comment-13731037 ] Alejandro Abdelnur commented on YARN-291: - Are we talking about an admin call to the RM that would set a resource correction on a per-node basis, with the RM adjusting the NM-reported resource capacity based on that correction? This would not require changes in the NMs. And potentially the correction could be applied on the node update event before it reaches the scheduler impl, thus staying transparent to the scheduler impl. And if we want to persist these corrections, this could be done by the RM itself. If I got things right, I'm OK with the approach. Dynamic resource configuration -- Key: YARN-291 URL: https://issues.apache.org/jira/browse/YARN-291 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: Elastic Resources for YARN-v0.2.pdf, YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch, YARN-291-JMXInterfaceOnNM-02.patch, YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, YARN-291-YARNClientCommandline-04.patch The current Hadoop YARN resource management logic assumes per-node resources are static during the lifetime of the NM process. Allowing run-time configuration of per-node resources will give us finer-grained resource elasticity. This allows Hadoop workloads to coexist with other workloads on the same hardware efficiently, whether or not the environment is virtualized. More background and design details can be found in the attached proposal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
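To make the approach concrete, here is a hypothetical illustration of "apply a per-node correction inside the RM before the scheduler sees the node"; none of the names below come from the attached patches.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical sketch: an admin-populated table of per-node resource overrides that
// the RM consults when processing a node update, so schedulers need no changes.
class NodeResourceOverrides {
  private final Map<NodeId, Resource> overrides =
      new ConcurrentHashMap<NodeId, Resource>();

  /** Called from an admin RPC/CLI to change what the RM treats as the node's capacity. */
  void setOverride(NodeId nodeId, Resource newCapacity) {
    overrides.put(nodeId, newCapacity);
  }

  /** Applied to the NM-reported capability before the scheduler sees the node update. */
  Resource effectiveCapacity(NodeId nodeId, Resource reported) {
    Resource override = overrides.get(nodeId);
    return override != null ? override : reported;
  }
}
{code}

Persisting such an overrides table in the RM, as suggested above, would keep the corrections across RM restarts without involving the NMs.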
[jira] [Assigned] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned YARN-160: --- Assignee: (was: Alejandro Abdelnur) I have my hands full at the moment; I won't be able to take this one on for a while. Making it unassigned in case somebody wants to take a stab at it. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Fix For: 2.1.0-beta As mentioned in YARN-2 *NM memory and CPU configs*: currently these values come from the NM's config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be made available as a YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
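A sketch of the kind of interface described above (the names NodeResourceCalculator, getAvailableMemoryMB, etc. are made up for illustration; the offset parameters are the configurable reservation mentioned in the description):

{code}
// Hypothetical interface for OS-derived node resources; not an existing YARN API.
interface NodeResourceCalculator {
  /** Physical memory in MB, minus a configured offset reserved for the OS/other services. */
  int getAvailableMemoryMB();

  /** Number of cores, minus a configured offset reserved for the OS/other services. */
  int getAvailableVCores();
}

// A Linux implementation would parse /proc/meminfo and /proc/cpuinfo, for example:
class LinuxNodeResourceCalculator implements NodeResourceCalculator {
  private final int memOffsetMB;
  private final int cpuOffset;

  LinuxNodeResourceCalculator(int memOffsetMB, int cpuOffset) {
    this.memOffsetMB = memOffsetMB;
    this.cpuOffset = cpuOffset;
  }

  @Override
  public int getAvailableMemoryMB() {
    // MemTotal is reported in kB in /proc/meminfo
    long memTotalKb = readMemTotalKb("/proc/meminfo");
    return (int) Math.max(0, memTotalKb / 1024 - memOffsetMB);
  }

  @Override
  public int getAvailableVCores() {
    return Math.max(0, Runtime.getRuntime().availableProcessors() - cpuOffset);
  }

  private long readMemTotalKb(String path) {
    try (java.io.BufferedReader r =
        new java.io.BufferedReader(new java.io.FileReader(path))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          // e.g. "MemTotal:       16384516 kB" -> 16384516
          return Long.parseLong(line.replaceAll("[^0-9]", ""));
        }
      }
    } catch (java.io.IOException e) {
      // fall through to a conservative default
    }
    return 0;
  }
}
{code}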
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731053#comment-13731053 ] Joseph Kniest commented on YARN-1019: - Hi, new to yarn, where do I look in the code base for this? YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths... such as local-dirs, log-dirs.. Our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup . i.e. before we actually startup...( i.e. directory handler creating directories). 2) Also for all the parameters using hostname:port unless we are ok with default port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731060#comment-13731060 ] Omkar Vinit Joshi commented on YARN-1019: - Hi, welcome to the YARN group. You can probably get started from here: [Checkout Code|http://wiki.apache.org/hadoop/HowToContribute]. Subscribe to the user/dev mailing lists and ask questions there (general questions such as how to check out the code or issues running it); here we usually discuss problems related to the current issue. To get started, run YARN and a simple MapReduce program. Once you are familiar with that, you can take up one of the tickets marked as newbie and start working on it. YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths... such as local-dirs, log-dirs.. Our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup . i.e. before we actually startup...( i.e. directory handler creating directories). 2) Also for all the parameters using hostname:port unless we are ok with default port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1035) NPE when trying to create an error message response of RPC
[ https://issues.apache.org/jira/browse/YARN-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731063#comment-13731063 ] Steve Loughran commented on YARN-1035: -- {code} 8]] INFO DataNode.clienttrace (BlockSender.java:sendBlock(695)) - src: /127.0.0.1:58247, dest: /127.0.0.1:58308, bytes: 5439, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-539248485_697, offset: 0, srvID: DS-502087106-10.11.3.237-58247-1375813762260, blockid: BP-384257351-10.11.3.237-1375813760919:blk_1073741832_1008, duration: 293000 2013-08-06 11:29:30,802 [IPC Server handler 1 on 58224] INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://localhost:58246/user/stevel/.hoya/cluster/TestLiveRegionService/generated/hbase-env.sh transitioned from DOWNLOADING to LOCALIZED 2013-08-06 11:29:30,802 [AsyncDispatcher event handler] INFO container.Container (ContainerImpl.java:handle(860)) - Container container_1375813755119_0001_01_02 transitioned from LOCALIZING to LOCALIZED 2013-08-06 11:29:30,921 [AsyncDispatcher event handler] INFO container.Container (ContainerImpl.java:handle(860)) - Container container_1375813755119_0001_01_02 transitioned from LOCALIZED to RUNNING 2013-08-06 11:29:31,140 [ContainersLauncher #0] INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(189)) - launchContainer: [nice, -n, 0, bash, -c, /Users/stevel/Projects/Hortonworks/Projects/hoya/target/TestLiveRegionService/TestLiveRegionService-localDir-nm-0_0/usercache/stevel/appcache/application_1375813755119_0001/container_1375813755119_0001_01_02/default_container_executor.sh] 2013-08-06 11:29:31,169 [ProcessThread(sid:0 cport:-1):] INFO server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(627)) - Got user-level KeeperException when processing sessionid:0x14054e3f67f0001 type:delete cxid:0x13 zxid:0xc txntype:-1 reqpath:n/a Error Path:/yarnapps_hoya_stevel_TestLiveRegionService/backup-masters/10.11.3.237,58296,1375813768541 Error:KeeperErrorCode = NoNode for /yarnapps_hoya_stevel_TestLiveRegionService/backup-masters/10.11.3.237,58296,1375813768541 2013-08-06 11:29:31,713 [Socket Reader #1 for port 58246] INFO ipc.Server (Server.java:doRead(800)) - IPC Server listener on 58246: readAndProcess from client 127.0.0.1 threw exception [java.lang.NullPointerException] java.lang.NullPointerException at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$Builder.setErrorMsg(RpcHeaderProtos.java:1843) at org.apache.hadoop.ipc.Server.setupResponse(Server.java:2330) at org.apache.hadoop.ipc.Server.access$2900(Server.java:121) at org.apache.hadoop.ipc.Server$Connection.doSaslReply(Server.java:1430) at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1548) at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1507) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:791) at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:590) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:565) 2013-08-06 11:29:31,729 [Socket Reader #1 for port 58246] INFO ipc.Server (Server.java:doRead(800)) - IPC Server listener on 58246: readAndProcess from client 127.0.0.1 threw exception [java.lang.NullPointerException] java.lang.NullPointerException at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$Builder.setErrorMsg(RpcHeaderProtos.java:1843) at org.apache.hadoop.ipc.Server.setupResponse(Server.java:2330) at 
org.apache.hadoop.ipc.Server.access$2900(Server.java:121) at org.apache.hadoop.ipc.Server$Connection.doSaslReply(Server.java:1430) at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1548) at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1507) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:791) at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:590) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:565) 2013-08-06 11:29:32,070 [ProcessThread(sid:0 cport:-1):] INFO server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest2Txn(476)) - Processed session termination for sessionid: 0x14054e3f67f0001 {code} NPE when trying to create an error message response of RPC -- Key: YARN-1035 URL: https://issues.apache.org/jira/browse/YARN-1035 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Steve Loughran I'm seeing an NPE which is raised when the server is trying to create an error response to send back to the caller and there is no error text. The root
[jira] [Created] (YARN-1035) NPE when trying to create an error message response of RPC
Steve Loughran created YARN-1035: Summary: NPE when trying to create an error message response of RPC Key: YARN-1035 URL: https://issues.apache.org/jira/browse/YARN-1035 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Steve Loughran I'm seeing an NPE which is raised when the server is trying to create an error response to send back to the caller and there is no error text. The root cause is probably somewhere in SASL, but sending something back to the caller would seem preferable to NPE-ing server-side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1035) NPE when trying to create an error message response of RPC
[ https://issues.apache.org/jira/browse/YARN-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731064#comment-13731064 ] Steve Loughran commented on YARN-1035: -- Looking up the stack, it's in {code} private void doSaslReply(Exception ioe) throws IOException { setupResponse(authFailedResponse, authFailedCall, RpcStatusProto.FATAL, RpcErrorCodeProto.FATAL_UNAUTHORIZED, null, ioe.getClass().getName(), ioe.getLocalizedMessage()); responder.doRespond(authFailedCall); } {code} This code assumes that the {{ioe.getLocalizedMessage()}} always returns a non-null string. Some exceptions do return null. For a robust response, {{ioe.toString()}} should be used. NPE when trying to create an error message response of RPC -- Key: YARN-1035 URL: https://issues.apache.org/jira/browse/YARN-1035 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Steve Loughran I'm seeing an NPE which is raised when the server is trying to create an error response to send back to the caller and there is no error text. The root cause is probably somewhere in SASL, but sending something back to the caller would seem preferable to NPE-ing server-side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
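A minimal illustration of the defensive change being suggested (the helper below is made up for the sketch; the real fix would go in Server.setupResponse / doSaslReply):

{code}
// Sketch only: prefer toString(), which is never null, over getLocalizedMessage().
final class ErrorText {
  private ErrorText() {
  }

  static String of(Throwable t) {
    String msg = t.getLocalizedMessage();
    return msg != null ? msg : t.toString();
  }
}
{code}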
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731106#comment-13731106 ] Joseph Kniest commented on YARN-1019: - Thanks, I've done all that: built the latest from source and kicked off a sample MapReduce job. Now I'm looking for where this is handled in the code. YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths... such as local-dirs, log-dirs.. Our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup . i.e. before we actually startup...( i.e. directory handler creating directories). 2) Also for all the parameters using hostname:port unless we are ok with default port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-985) Nodemanager should log where a resource was localized
[ https://issues.apache.org/jira/browse/YARN-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-985: -- Attachment: YARN-985.branch-0.23.patch For branch-0.23 Nodemanager should log where a resource was localized - Key: YARN-985 URL: https://issues.apache.org/jira/browse/YARN-985 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-985.branch-0.23.patch, YARN-985.patch, YARN-985.patch When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-985) Nodemanager should log where a resource was localized
[ https://issues.apache.org/jira/browse/YARN-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-985: -- Attachment: YARN-985.patch This is for trunk. I've incorporated Omkar's suggestion now Nodemanager should log where a resource was localized - Key: YARN-985 URL: https://issues.apache.org/jira/browse/YARN-985 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-985.branch-0.23.patch, YARN-985.patch, YARN-985.patch When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731120#comment-13731120 ] Omkar Vinit Joshi commented on YARN-1019: - Start with YarnConfiguration.java, track all the places where it is used, and add the path-related and host:port checks. Once done, upload a patch; someone will take a look at it. Make sure your patch file follows a format like jira-number-date-in-yyyy-mm-dd.number.patch; it will help reviewers. Also make sure your code is well formatted and your changes are as minimal as possible. You are set then. Start contributing!! YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths... such as local-dirs, log-dirs.. Our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup . i.e. before we actually startup...( i.e. directory handler creating directories). 2) Also for all the parameters using hostname:port unless we are ok with default port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
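For orientation, the two checks being asked for could look roughly like this (the class and method names are assumptions, not existing YarnConfiguration methods; the real patch would hook such checks into NM/RM startup):

{code}
import java.io.File;

import org.apache.hadoop.conf.Configuration;

// Hypothetical validation helpers sketched for YARN-1019.
final class YarnConfigValidator {
  private YarnConfigValidator() {
  }

  /** 1) Path-type keys (local-dirs, log-dirs, ...) must contain absolute paths. */
  static void validateAbsolutePaths(Configuration conf, String key) {
    for (String dir : conf.getTrimmedStrings(key)) {
      if (!new File(dir).isAbsolute()) {
        throw new IllegalArgumentException(
            key + " must contain absolute paths, but got: " + dir);
      }
    }
  }

  /** 2) Address-type keys should be host:port unless relying on a known default port. */
  static void validateHostPort(Configuration conf, String key, int defaultPort) {
    String addr = conf.get(key);
    if (addr != null && !addr.contains(":") && defaultPort <= 0) {
      throw new IllegalArgumentException(
          key + " should be of the form host:port, but got: " + addr);
    }
  }
}
{code}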
[jira] [Commented] (YARN-985) Nodemanager should log where a resource was localized
[ https://issues.apache.org/jira/browse/YARN-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731125#comment-13731125 ] Omkar Vinit Joshi commented on YARN-985: +1 Nodemanager should log where a resource was localized - Key: YARN-985 URL: https://issues.apache.org/jira/browse/YARN-985 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-985.branch-0.23.patch, YARN-985.patch, YARN-985.patch When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731136#comment-13731136 ] Alejandro Abdelnur commented on YARN-1008: -- [~vinodkv], I don't think the change should go beyond the minicluster, for the following reason: in a real cluster there is one NM per node. That said, maybe what we should do is let AMs specify a HOST:PORT (which typically will be the DN HOST:PORT); in the case of the minicluster, we would need a mapping from DN HOST:PORT to NM HOST:PORT when processing the resource request. We should also support HOST:PORT directly, without the mapping, for cases where MiniHDFS is not there. [~ojoshi], multiple NMs register with their nodeIds, which contain HOST:PORT, so you do have multiple nodes in the minicluster. But the scheduler logic, in all schedulers, uses node.getHost() to do the scheduling; that is why you see it working fine: all nodes report the same host. The problem is that you have no control over which NM you get. The challenge is how we get this to work nicely in minicluster and real setups without disruption. MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations --- Key: YARN-1008 URL: https://issues.apache.org/jira/browse/YARN-1008 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur While the NMs are keyed using the NodeId, the allocation is done based on the hostname. This makes the different nodes indistinguishable to the scheduler. There should be an option to enable host:port instead of just port for allocations. The nodes reported to the AM should report the 'key' (host or host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-985) Nodemanager should log where a resource was localized
[ https://issues.apache.org/jira/browse/YARN-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731166#comment-13731166 ] Jonathan Eagles commented on YARN-985: -- Looks like we are all happy. Putting this in. Thanks, everybody. Nodemanager should log where a resource was localized - Key: YARN-985 URL: https://issues.apache.org/jira/browse/YARN-985 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-985.branch-0.23.patch, YARN-985.patch, YARN-985.patch When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-985) Nodemanager should log where a resource was localized
[ https://issues.apache.org/jira/browse/YARN-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731212#comment-13731212 ] Hudson commented on YARN-985: - SUCCESS: Integrated in Hadoop-trunk-Commit #4221 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4221/]) YARN-985. Nodemanager should log where a resource was localized (Ravi Prakash via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1511100) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java Nodemanager should log where a resource was localized - Key: YARN-985 URL: https://issues.apache.org/jira/browse/YARN-985 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 3.0.0, 2.3.0, 0.23.10 Attachments: YARN-985.branch-0.23.patch, YARN-985.patch, YARN-985.patch When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731249#comment-13731249 ] Alejandro Abdelnur commented on YARN-1004: -- bq. Isn't it simpler for FS to ignore the existing configs? It is simpler, but it is not correct. it will create confusion due to misconfigurations when moving from one scheduler to another (either way). yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler Key: YARN-1004 URL: https://issues.apache.org/jira/browse/YARN-1004 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Priority: Blocker Attachments: YARN-1004.patch As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific configuration, and functions differently for the Fair and Capacity schedulers, it would be less confusing for the config names to include the scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, yarn.scheduler.capacity.minimum-allocation-mb, and yarn.scheduler.fifo.minimum-allocation-mb. The same goes for yarn.scheduler.increment-allocation-mb, which only exists for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for consistency. If we wish to preserve backwards compatibility, we can deprecate the old configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-589) Expose a REST API for monitoring the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-589: Attachment: YARN-589-2.patch Expose a REST API for monitoring the fair scheduler --- Key: YARN-589 URL: https://issues.apache.org/jira/browse/YARN-589 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: fairscheduler.xml, YARN-589-1.patch, YARN-589-2.patch, YARN-589.patch The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: -- Attachment: YARN-1021-images.tar.gz YARN-1021-demo.tar.gz YARN-1021.pdf YARN-1021.pdf: simulator documentation. YARN-1021-demo.tar.gz: configuration (for YARN) and data used for a demo running. YARN-1021-images.tar.gz: images used by simulator site document. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with reasonable amount of confidence, there-by aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM. To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs turn around time, throughput, fairness, capacity guarantee, etc). * Several key metrics of scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: -- Description: The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with reasonable amount of confidence, there-by aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM. To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs turn around time, throughput, fairness, capacity guarantee, etc). * Several key metrics of scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use simulator to simulate Fair Scheduler and Capacity Scheduler. was: The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. 
This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with reasonable amount of confidence, there-by aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM. To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs turn around time, throughput, fairness, capacity guarantee, etc). * Several key metrics of scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. Yarn Scheduler Load Simulator
[jira] [Commented] (YARN-589) Expose a REST API for monitoring the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731436#comment-13731436 ] Hadoop QA commented on YARN-589: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596445/YARN-589-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1662//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1662//console This message is automatically generated. Expose a REST API for monitoring the fair scheduler --- Key: YARN-589 URL: https://issues.apache.org/jira/browse/YARN-589 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: fairscheduler.xml, YARN-589-1.patch, YARN-589-2.patch, YARN-589.patch The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731462#comment-13731462 ] Joseph Kniest commented on YARN-1019: - Ok, so for this module YarnConfiguration: do other portions of the codebase access it for config info like directories, and do I need to find all those places? How does that information get passed to this object? Ultimately, we want to find where this object gets instantiated and ensure that it doesn't get relative paths, correct? What exactly do we want for item 2 of this issue? I'm confused about that one. YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler actually creates the directories. 2) Similarly, validate all parameters of the form hostname:port, unless we are ok with the default port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
Ravi Prakash created YARN-1036: -- Summary: Distributed Cache gives inconsistent result if cache files get deleted from task tracker Key: YARN-1036 URL: https://issues.apache.org/jira/browse/YARN-1036 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-1036: --- Attachment: YARN-1036.branch-0.23.patch This is exactly the same patch as MAPREDUCE-4342. Distributed Cache gives inconsistent result if cache files get deleted from task tracker - Key: YARN-1036 URL: https://issues.apache.org/jira/browse/YARN-1036 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-1036.branch-0.23.patch This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1010) FairScheduler: decouple container scheduling from nodemanager heartbeats
[ https://issues.apache.org/jira/browse/YARN-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-1010: - Assignee: Wei Yan (was: Alejandro Abdelnur) FairScheduler: decouple container scheduling from nodemanager heartbeats Key: YARN-1010 URL: https://issues.apache.org/jira/browse/YARN-1010 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Wei Yan Priority: Critical Currently, scheduling for a node is done when that node heartbeats. For large clusters where the heartbeat interval is set to several seconds, this delays scheduling of incoming allocations significantly. We could instead have a continuous loop that scans all nodes and does scheduling; if resources are available, AMs will get the allocation in the next heartbeat after the one that placed the request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
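As a hedged sketch of the continuous-scheduling idea described in YARN-1010 above (not the actual FairScheduler patch), the loop below scans every registered node on a fixed interval and asks the scheduler to place pending requests. Node, Scheduler, and attemptScheduling are placeholder names invented for this example, not real YARN classes.
{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative only: a background loop that attempts scheduling on every node
// instead of waiting for that node's heartbeat.
public class ContinuousSchedulingLoop implements Runnable {
  interface Node { }                      // placeholder for a scheduler node
  interface Scheduler {                   // placeholder for the real scheduler
    void attemptScheduling(Node node);    // hypothetical per-node scheduling hook
  }

  private final List<Node> nodes = new CopyOnWriteArrayList<>();
  private final Scheduler scheduler;
  private final long intervalMs;
  private volatile boolean running = true;

  ContinuousSchedulingLoop(Scheduler scheduler, long intervalMs) {
    this.scheduler = scheduler;
    this.intervalMs = intervalMs;
  }

  public void addNode(Node node) { nodes.add(node); }
  public void stop() { running = false; }

  @Override
  public void run() {
    while (running) {
      // Scan all nodes and try to place pending requests, independent of heartbeats.
      for (Node node : nodes) {
        scheduler.attemptScheduling(node);
      }
      try {
        Thread.sleep(intervalMs);         // short pause between scan passes
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        running = false;
      }
    }
  }
}
{code}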
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731470#comment-13731470 ] Hadoop QA commented on YARN-1036: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596459/YARN-1036.branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1664//console This message is automatically generated. Distributed Cache gives inconsistent result if cache files get deleted from task tracker - Key: YARN-1036 URL: https://issues.apache.org/jira/browse/YARN-1036 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-1036.branch-0.23.patch This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731474#comment-13731474 ] Hadoop QA commented on YARN-1021: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596449/YARN-1021.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1163 javac compiler warnings (more than the trunk's current 1147 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 28 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 7 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1663//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1663//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1663//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-sls.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1663//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1663//console This message is automatically generated. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm thoroughly before we deploy it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm: evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm works for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation.
The simulator will exercise the real Yarn ResourceManager while removing the network factor, by simulating NodeManagers and ApplicationMasters and handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real-time metrics while executing, including: * Resource usage for the whole cluster and each queue, which can be used to configure cluster and queue capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc.). * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which can be used by Hadoop developers to find hot spots and scalability limits. The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
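As a rough illustration of the "scheduler wrapper" mentioned in the YARN-1021 description above, the sketch below delegates a scheduler operation to the real implementation while recording its time cost; the class and method names are invented for this example and are not the actual SLS code.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the wrapper idea: run the real operation, accumulate simple
// latency metrics. The Operation interface stands in for real scheduler
// calls such as allocate() or handle().
public class TimedSchedulerWrapper {
  public interface Operation<T> {
    T invoke();
  }

  private final AtomicLong allocateCount = new AtomicLong();
  private final AtomicLong allocateTotalNanos = new AtomicLong();

  // Times one scheduler operation and accumulates the metrics.
  public <T> T timeAllocate(Operation<T> op) {
    long start = System.nanoTime();
    try {
      return op.invoke();
    } finally {
      allocateCount.incrementAndGet();
      allocateTotalNanos.addAndGet(System.nanoTime() - start);
    }
  }

  // Average allocate latency in milliseconds, one of the real-time metrics.
  public double avgAllocateMillis() {
    long n = allocateCount.get();
    return n == 0 ? 0.0 : (allocateTotalNanos.get() / 1e6) / n;
  }
}
{code}
The same pattern can wrap handle() and other operations, feeding the averages into the real-time charts the description mentions.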
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731473#comment-13731473 ] Omkar Vinit Joshi commented on YARN-1036: - Thanks [~raviprak]. Probably we need to isolate the logic for the LOCALIZED and REQUEST scenarios? Thoughts? {code} + if (rsrc != null && (!isResourcePresent(rsrc))) { + LOG.info("Resource " + rsrc.getLocalPath() + " is missing, localizing it again"); + localrsrc.remove(req); + rsrc = null; + } {code} This code does not need to be executed when a resource is getting LOCALIZED; in trunk we have isolated the two cases. Since branch-0.23 doesn't have anything like LocalCacheDirectoryManager, it probably makes sense to just keep the break and do nothing when it is LOCALIZED? {code} case LOCALIZED: break; case REQUEST: + if (rsrc != null && (!isResourcePresent(rsrc))) { + LOG.info("Resource " + rsrc.getLocalPath() + " is missing, localizing it again"); + localrsrc.remove(req); + rsrc = null; + } {code} I didn't review the test code. Distributed Cache gives inconsistent result if cache files get deleted from task tracker - Key: YARN-1036 URL: https://issues.apache.org/jira/browse/YARN-1036 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.9 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: YARN-1036.branch-0.23.patch This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
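For readers unfamiliar with the isResourcePresent call referenced in the YARN-1036 snippets above, the following minimal sketch shows what such a check typically boils down to: verify that the previously localized file still exists on local disk before reusing the cache entry. This is an assumption about the helper's intent, not the branch-0.23 code itself.
{code}
import java.io.File;
import org.apache.hadoop.fs.Path;

// Hedged sketch: a missing localized file means the resource must be
// localized again, which is what the patch above triggers.
public final class ResourcePresenceCheck {
  private ResourcePresenceCheck() { }

  public static boolean isResourcePresent(Path localPath) {
    if (localPath == null) {
      return false;                     // never localized, nothing to verify
    }
    File onDisk = new File(localPath.toUri().getPath());
    return onDisk.exists();             // missing file => localize again
  }
}
{code}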
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-353: -- Attachment: YARN-353.11.patch Manually inspected the fields findbugs is complaining about - I don't see any particular issues or additional need for synchronization. Uploading a patch that adds exclusions for the two fields in question. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731480#comment-13731480 ] Karthik Kambatla commented on YARN-353: --- YARN-353.11.patch is the patch with findbugs exclusions. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731528#comment-13731528 ] Hadoop QA commented on YARN-353: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596465/YARN-353.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1665//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1665//console This message is automatically generated. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch Add a store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1019) YarnConfiguration validation for local disk path and http addresses.
[ https://issues.apache.org/jira/browse/YARN-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731551#comment-13731551 ] Omkar Vinit Joshi commented on YARN-1019: - [~josephkniest] To give you more insight into how it is used (general configuration reading in Hadoop): bq. Ok, so for this module YarnConfiguration: do other portions of the codebase access it for config info like directories, and do I need to find all those places? Probably not. If you are using Eclipse for Hadoop development, just open the call hierarchy for the variable under consideration, say YarnConfiguration#RM_ADDRESS. You will see where it is used, which narrows your search. You can ignore places where it is used inside test code; you don't need to validate test code, but you will have to add a unit test case later to verify your changes. bq. How does that information get passed to this object? You probably don't need to worry about this. You can trace the {code} new YarnConfiguration() {code} call. It reads from the configuration files: yarn-site.xml for YARN, hdfs-site.xml for HDFS, core-site.xml for core. bq. Ultimately, we want to find where this object gets instantiated and ensure that it doesn't get relative paths, correct? Yes, for all the places where we read file paths we need to ensure this. Make sure it is not OS specific, i.e. it works on Windows/Linux/Mac. bq. What exactly do we want for item 2 of this issue? I'm confused about that one. When we expect, for example, RM_ADDRESS, we expect it to be host:port; just validate that. Finally, once you have made the changes, create a patch and upload it via More Actions -- Attach Files and then Submit Patch. YarnConfiguration validation for local disk path and http addresses. Key: YARN-1019 URL: https://issues.apache.org/jira/browse/YARN-1019 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi Priority: Minor Labels: newbie Today we are not validating certain configuration parameters set in yarn-site.xml. 1) Configurations related to paths, such as local-dirs and log-dirs: our NM crashes during startup if they are set to relative paths rather than absolute paths. To avoid such failures we can enforce checks (absolute paths) before startup, i.e. before the directory handler actually creates the directories. 2) Similarly, validate all parameters of the form hostname:port, unless we are ok with the default port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
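To make the YARN-1019 guidance above concrete, here is a minimal sketch of the two startup-time checks being discussed: absolute-path validation for local-dirs and host:port validation for an address such as RM_ADDRESS. The class and method names are illustrative only; the eventual patch may hook validation in differently.
{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: validate a loaded configuration before the daemons start.
public class YarnConfigSanityCheck {
  public static void validate(Configuration conf) {
    // 1) Path-type settings such as local-dirs/log-dirs must be absolute.
    for (String dir : conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)) {
      if (!new File(dir).isAbsolute()) {
        throw new IllegalArgumentException(
            YarnConfiguration.NM_LOCAL_DIRS + " must use absolute paths: " + dir);
      }
    }
    // 2) host:port settings must parse to a usable socket address.
    String rmAddress = conf.get(YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS);
    try {
      NetUtils.createSocketAddr(rmAddress);
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          YarnConfiguration.RM_ADDRESS + " is not a valid host:port: " + rmAddress, e);
    }
  }

  public static void main(String[] args) {
    validate(new YarnConfiguration());    // reads yarn-site.xml as described above
  }
}
{code}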
[jira] [Created] (YARN-1037) Create a helper function to create a local resource object given a path to file
Hitesh Shah created YARN-1037: - Summary: Create a helper function to create a local resource object given a path to file Key: YARN-1037 URL: https://issues.apache.org/jira/browse/YARN-1037 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah A helper function that, given either a qualified or non-qualified path, constructs a local resource object. It should be available in one of the client library layers for developers to write against. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
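One plausible shape for the YARN-1037 helper described above, sketched under the assumption that it qualifies the path against the default FileSystem and fills in a LocalResource record; the class and method names are invented here, and a real client-library version would likely let callers choose the resource type and visibility.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

// Sketch of a helper that turns a (possibly non-qualified) path into a
// LocalResource suitable for a container launch context.
public final class LocalResourceHelper {
  private LocalResourceHelper() { }

  public static LocalResource fromPath(Configuration conf, Path file)
      throws java.io.IOException {
    FileSystem fs = FileSystem.get(conf);
    Path qualified = fs.makeQualified(file);     // handles non-qualified input
    FileStatus stat = fs.getFileStatus(qualified);

    LocalResource rsrc = Records.newRecord(LocalResource.class);
    rsrc.setResource(ConverterUtils.getYarnUrlFromPath(qualified));
    rsrc.setSize(stat.getLen());
    rsrc.setTimestamp(stat.getModificationTime());
    rsrc.setType(LocalResourceType.FILE);
    rsrc.setVisibility(LocalResourceVisibility.APPLICATION);
    return rsrc;
  }
}
{code}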
[jira] [Commented] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731602#comment-13731602 ] Xuan Gong commented on YARN-899: Create a QueueACLsManager to store the mapping from ApplicationId to CSQueue. Whenever users try to get an application report, list applications, or kill applications through the command line, web service, or UI, the QueueACLsManager will check the user's permission. Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
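As a hedged sketch of the QueueACLsManager idea in the comment above (not the YARN-899 patch itself), the class below remembers which queue each application belongs to and consults a scheduler-provided ACL check before report/list/kill requests are served; the AclChecker interface is a stand-in for the real scheduler's queue ACL check, which the actual patch would call instead.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.QueueACL;

// Sketch only: track app -> queue and delegate ACL decisions to the scheduler.
public class QueueACLsManagerSketch {
  public interface AclChecker {
    boolean checkAccess(UserGroupInformation user, QueueACL acl, String queue);
  }

  private final Map<ApplicationId, String> appToQueue = new ConcurrentHashMap<>();
  private final AclChecker scheduler;

  public QueueACLsManagerSketch(AclChecker scheduler) {
    this.scheduler = scheduler;
  }

  // Record the queue at submission time.
  public void addApplication(ApplicationId appId, String queue) {
    appToQueue.put(appId, queue);
  }

  // Called from getApplicationReport/list/kill paths before answering.
  public boolean checkAccess(UserGroupInformation caller, QueueACL acl,
      ApplicationId appId) {
    String queue = appToQueue.get(appId);
    // Unknown application: deny here and let the caller surface an error.
    return queue != null && scheduler.checkAccess(caller, acl, queue);
  }
}
{code}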
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-899: --- Attachment: YARN-899.1.patch Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira