[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729294#comment-13729294
 ] 

Karthik Kambatla commented on YARN-1027:


[~nemon], if you haven't started work on this already, do you mind if I take 
this up? I have been discussing this with Bikas on YARN-149 and offline, and 
have started working on it.

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: nemon lou

 Implement the existing HAServiceProtocol from Hadoop Common. This protocol is 
 the single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-1026) Test and verify ACL based ZKRMStateStore fencing for RM State Store

2013-08-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-1026:
--

Assignee: Karthik Kambatla

 Test and verify ACL based ZKRMStateStore fencing for RM State Store
 ---

 Key: YARN-1026
 URL: https://issues.apache.org/jira/browse/YARN-1026
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla

 ZooKeeper allows create/delete ACLs for immediate children of a znode. It 
 also has admin ACLs on a znode that allow changing the create/delete ACLs 
 on that znode. RM instances could share the admin ACLs on the state store 
 root znode. When an RM transitions to active, it can use the shared admin 
 ACLs to give itself exclusive create/delete permissions on the children of 
 the root znode. If all ZK state store operations are atomic and involve 
 creating or deleting a znode, then the above effectively fences other RM 
 instances from modifying the store. This ACL change is only allowed when 
 transitioning to active.
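The fencing scheme described above can be sketched with a toy in-memory model. This is only an illustration of the ACL logic (class and method names here are made up, not the actual ZKRMStateStore code, which talks to a real ZooKeeper ensemble):

```python
class Znode:
    """Minimal model of a znode with admin and create/delete ACLs."""
    def __init__(self, admin=None):
        self.admin = set(admin or [])         # who may change the ACLs
        self.create_delete = set(self.admin)  # who may add/remove children
        self.children = {}

    def set_exclusive_writer(self, caller, rm_id):
        # Only holders of the shared admin ACL may re-ACL the root.
        if caller not in self.admin:
            raise PermissionError("not an admin")
        self.create_delete = {rm_id}          # fence out the other RMs

    def create_child(self, caller, name):
        if caller not in self.create_delete:
            raise PermissionError("fenced")
        self.children[name] = Znode()

# Both RMs share the admin ACL on the store root.
root = Znode(admin={"rm1", "rm2"})
root.set_exclusive_writer("rm1", "rm1")   # rm1 transitions to active
root.create_child("rm1", "app_0001")      # allowed: rm1 holds create/delete
try:
    root.create_child("rm2", "app_0002")  # rm2 is fenced out
except PermissionError:
    pass
```

In the real store the create/delete ACL would be tied to per-RM ZooKeeper credentials, and the state operations would be atomic znode creates/deletes as the description requires.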



[jira] [Assigned] (YARN-1029) Allow embedding leader election into the RM

2013-08-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-1029:
--

Assignee: Karthik Kambatla

 Allow embedding leader election into the RM
 ---

 Key: YARN-1029
 URL: https://issues.apache.org/jira/browse/YARN-1029
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla

 It should be possible to embed the common ActiveStandbyElector into the RM 
 such that ZooKeeper-based leader election and notification are built in. In 
 conjunction with a ZK state store, this configuration will be a simple 
 deployment option.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-05 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729348#comment-13729348
 ] 

nemon lou commented on YARN-1027:
-

I had also started working on this, since it was unassigned.
It's ok to take it up; I will review the patch :)


 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: nemon lou

 Implement the existing HAServiceProtocol from Hadoop Common. This protocol is 
 the single point of interaction between the RM and HA clients/services.



[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2013-08-05 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729530#comment-13729530
 ] 

Timothy St. Clair commented on YARN-160:


I think the prudent approach would be to evaluate hwloc and its community, and 
determine if it meets the internal needs of YARN.  For risk mitigation 
purposes, I think having a plugin abstraction layer as a fallback would also be 
wise. 

I did notice there are also Java bindings for hwloc 
(https://launchpad.net/jhwloc/).

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.1.0-beta


 As mentioned in YARN-2
 *NM memory and CPU configs*
 Currently these values come from the NM's config. We should be able to obtain 
 them from the OS (i.e., in the case of Linux, from /proc/meminfo and 
 /proc/cpuinfo). As this is highly OS dependent, we should have an interface 
 that obtains this information. In addition, implementations of this interface 
 should be able to specify a mem/cpu offset (an amount of mem/cpu not to be 
 made available as YARN resources), which would allow reserving mem/cpu for 
 the OS and other services outside of YARN containers.
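Such an interface might look like the following sketch. The class and method names are hypothetical (the real plugin would be Java code in the NodeManager); it parses /proc/meminfo-style text and applies the reserved offset:

```python
import re

class LinuxResourceCalculator:
    """Reports total memory from /proc/meminfo-style text, minus a
    configured offset reserved for the OS and other services."""
    def __init__(self, meminfo_text, mem_offset_mb=0):
        self.meminfo_text = meminfo_text
        self.mem_offset_mb = mem_offset_mb

    def available_memory_mb(self):
        # /proc/meminfo reports kB, e.g. "MemTotal:       16384000 kB"
        m = re.search(r"^MemTotal:\s+(\d+)\s+kB", self.meminfo_text, re.M)
        total_mb = int(m.group(1)) // 1024
        return max(total_mb - self.mem_offset_mb, 0)

sample = "MemTotal:       16384000 kB\nMemFree:  123456 kB\n"
calc = LinuxResourceCalculator(sample, mem_offset_mb=2048)
print(calc.available_memory_mb())  # 16384000 kB = 16000 MB, minus 2048 -> 13952
```

A CPU-side method would similarly count processor entries in /proc/cpuinfo and subtract a vcore offset.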



[jira] [Commented] (YARN-1025) NodeManager does not propagate java.library.path to launched child containers on Windows

2013-08-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729549#comment-13729549
 ] 

Kihwal Lee commented on YARN-1025:
--

On Linux, use of LD_LIBRARY_PATH is better because it's easier to manipulate 
(e.g. path munging) and offers better search and error handling. When 
java.library.path is set, the JVM tries to load the first match. If that 
fails, the failure is permanent, i.e. no further search is done. This is 
unacceptable if the search paths contain libraries for multiple architectures 
(e.g. 32-bit and 64-bit). When LD_LIBRARY_PATH is used exclusively, the 
system loader is in charge, and it does a much better job.  I believe the 
behavior is similar on Windows with PATH.
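The "path munging" part is straightforward with an environment variable: a container launcher can simply prepend native-library directories and let the system loader keep searching past bad matches. A hedged sketch (the helper name and directories are illustrative, not NodeManager code):

```python
import os

def prepend_library_path(env, new_dirs, var="LD_LIBRARY_PATH"):
    """Return a copy of env with new_dirs prepended to the loader search
    path. On Windows, PATH plays this role instead of LD_LIBRARY_PATH."""
    existing = env.get(var, "")
    parts = list(new_dirs) + ([existing] if existing else [])
    env = dict(env)                    # don't mutate the caller's env
    env[var] = os.pathsep.join(parts)
    return env

child_env = prepend_library_path(
    {"LD_LIBRARY_PATH": "/usr/lib"},
    ["/opt/hadoop/lib/native/amd64", "/opt/hadoop/lib/native/i386"],
)
print(child_env["LD_LIBRARY_PATH"])
# on Linux: /opt/hadoop/lib/native/amd64:/opt/hadoop/lib/native/i386:/usr/lib
```

With java.library.path, by contrast, the JVM stops at the first matching filename, so a 32-bit library earlier in the list permanently blocks a 64-bit one later in the list.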

 NodeManager does not propagate java.library.path to launched child containers 
 on Windows
 

 Key: YARN-1025
 URL: https://issues.apache.org/jira/browse/YARN-1025
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Chris Nauroth

 Neither the NodeManager process itself nor the child container processes that 
 it launches have the correct setting for java.library.path on Windows.  This 
 prevents the processes from loading native code from hadoop.dll.  The native 
 code is required for correct functioning on Windows (not optional), so this 
 ultimately can cause failures in MapReduce jobs.



[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator

2013-08-05 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1021:
--

Description: 
The Yarn Scheduler is a fertile area of interest with different 
implementations, e.g., Fifo, Capacity and Fair  schedulers. Meanwhile, several 
optimizations are also made to improve scheduler performance for different 
scenarios and workload. Each scheduler algorithm has its own set of features, 
and drives scheduling decisions by many factors, such as fairness, capacity 
guarantee, resource availability, etc. It is very important to evaluate a 
scheduler algorithm well before deploying it in a production cluster. 
Unfortunately, it is currently non-trivial to evaluate a scheduling 
algorithm: evaluating in a real cluster is time- and cost-consuming, and it 
is also very hard to find a large-enough cluster. Hence, a simulator that can 
predict how well a scheduler algorithm works for some specific workload would 
be quite useful.

We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
clusters and application loads in a single machine. This would be invaluable in 
furthering Yarn by providing a tool for researchers and developers to prototype 
new scheduler features and predict their behavior and performance with 
a reasonable amount of confidence, thereby aiding rapid innovation.

The simulator will exercise the real Yarn ResourceManager removing the network 
factor by simulating NodeManagers and ApplicationMasters via handling and 
dispatching NM/AMs heartbeat events from within the same JVM.

To keep track of scheduler behavior and performance, a scheduler wrapper 
will wrap the real scheduler.

The simulator will produce real time metrics while executing, including:

* Resource usages for whole cluster and each queue, which can be utilized to 
configure cluster and queue's capacity.
* The detailed application execution trace (recorded in relation to simulated 
time), which can be analyzed to understand/validate the  scheduler behavior 
(individual jobs turn around time, throughput, fairness, capacity guarantee, 
etc).
* Several key metrics of the scheduler algorithm, such as the time cost of 
each scheduler operation (allocate, handle, etc.), which can be utilized by 
Hadoop developers to find hot spots and scalability limits.

The simulator will provide real time charts showing the behavior of the 
scheduler and its performance.
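The scheduler-wrapper idea above can be sketched in a few lines. This is a toy in Python for illustration only (the actual simulator wraps the real Java scheduler classes inside the RM):

```python
import time
from collections import defaultdict

class SchedulerWrapper:
    """Wraps a scheduler and records the time cost of each operation,
    mirroring the per-operation metrics the simulator is meant to emit."""
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.timings = defaultdict(list)   # operation name -> seconds per call

    def allocate(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return self.scheduler.allocate(*args, **kwargs)
        finally:
            self.timings["allocate"].append(time.perf_counter() - start)

class FakeScheduler:
    """Stand-in for the real scheduler under test."""
    def allocate(self, app, ask):
        return {"app": app, "granted": ask}

wrapped = SchedulerWrapper(FakeScheduler())
wrapped.allocate("app_1", 4)
print(len(wrapped.timings["allocate"]))  # 1
```

From the recorded timings the simulator can derive the per-operation cost distributions described above, without touching the scheduler being measured.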

  was:
The Yarn Scheduler is a fertile area of interest with different 
implementations, e.g., Fifo, Capacity and Fair  schedulers. Meanwhile, several 
optimizations are also made to improve scheduler performance for different 
scenarios and workload. Each scheduler algorithm has its own set of features, 
and drives scheduling decisions by many factors, such as fairness, capacity 
guarantee, resource availability, etc. It is very important to evaluate a 
scheduler algorithm very well before we deploy it in a production cluster. 
Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. 
Evaluating in a real cluster is always time and cost consuming, and it is also 
very hard to find a large-enough cluster. Hence, a simulator which can predict 
how well a scheduler algorithm for some specific workload would be quite useful.

We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
clusters and application loads in a single machine. This would be invaluable in 
furthering Yarn by providing a tool for researchers and developers to prototype 
new scheduler features and predict their behavior and performance with 
reasonable amount of confidence, there-by aiding rapid innovation.

The simulator will exercise the real Yarn ResourceManager removing the network 
factor by simulating NodeManagers and ApplicationManagers via handling and 
dispatching NM/AMs heartbeat events from within the same JVM.

To keep tracking of scheduler behavior and performance, a scheduler wrapper 
will wrap the real scheduler.

The simulator will produce real time metrics while executing, including:

* Resource usages for whole cluster and each queue, which can be utilized to 
configure cluster and queue's capacity.
* The detailed application execution trace (recorded in relation to simulated 
time), which can be analyzed to understand/validate the  scheduler behavior 
(individual jobs turn around time, throughput, fairness, capacity guarantee, 
etc).
* Several key metrics of scheduler algorithm, such as time cost of each 
scheduler operation (allocate, handle, etc), which can be utilized by Hadoop 
developers to find the code spots and scalability limits.

The simulator will provide real time charts showing the behavior of the 
scheduler and its performance.


 Yarn Scheduler Load Simulator
 -

 Key: YARN-1021
 URL: https://issues.apache.org/jira/browse/YARN-1021
 Project: Hadoop YARN

[jira] [Updated] (YARN-975) Adding HDFS implementation for grouped reading and writing interfaces of history storage

2013-08-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-975:
-

Attachment: YARN-975.4.patch

Updated the patch according to YARN-1007

 Adding HDFS implementation for grouped reading and writing interfaces of 
 history storage
 

 Key: YARN-975
 URL: https://issues.apache.org/jira/browse/YARN-975
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, 
 YARN-975.4.patch


 HDFS implementation should be a standard persistence strategy of history 
 storage



[jira] [Commented] (YARN-1024) Define a virtual core unambiguously

2013-08-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729724#comment-13729724
 ] 

Arun C Murthy commented on YARN-1024:
-

bq. If I were to package my simulator and give it to other people on other 
clusters, it would still be true that it spins one CPU. Its runtime, however, 
would vary depending on the horsepower.

I don't see the conflict.

If you don't care about predictable runtime, you could still say you want to 
run on 1 virtual core. Given that non-requirement on predictability, whether 
it's 1 (virtual) core out of 16 physical cores or 1024 virtual cores is 
immaterial, isn't it? And yes, you still get only 1 physical core, since the 
virtual core is mapped to a single physical core.

The point about specifying a virtual core is that you get predictable 
performance when you migrate your application between clusters and other 
goodness.

What am I missing here?

 Define a virtual core unambiguously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*



[jira] [Commented] (YARN-975) Adding HDFS implementation for grouped reading and writing interfaces of history storage

2013-08-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729730#comment-13729730
 ] 

Hadoop QA commented on YARN-975:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12596159/YARN-975.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1656//console

This message is automatically generated.

 Adding HDFS implementation for grouped reading and writing interfaces of 
 history storage
 

 Key: YARN-975
 URL: https://issues.apache.org/jira/browse/YARN-975
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, 
 YARN-975.4.patch


 HDFS implementation should be a standard persistence strategy of history 
 storage



[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call

2013-08-05 Thread Trevor Lorimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Lorimer updated YARN-696:


Attachment: YARN-696.diff

 Enable multiple states to be specified in Resource Manager apps REST call
 

 Key: YARN-696
 URL: https://issues.apache.org/jira/browse/YARN-696
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Trevor Lorimer
Assignee: Trevor Lorimer
Priority: Trivial
 Attachments: YARN-696.diff


 Within the YARN Resource Manager REST API the GET call which returns all 
 Applications can be filtered by a single State query parameter (http://rm 
 http address:port/ws/v1/cluster/apps). 
 There are 8 possible states (New, Submitted, Accepted, Running, Finishing, 
 Finished, Failed, Killed), if no state parameter is specified all states are 
 returned, however if a sub-set of states is required then multiple REST calls 
 are required (max. of 7).
 The proposal is to be able to specify multiple states in a single REST call.
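A sketch of the proposed multi-state filter. The comma-separated "states" parameter and the helper below are assumptions about how the proposal might look, not the existing API:

```python
# The 8 application states exposed by the RM apps REST resource.
VALID = {"NEW", "SUBMITTED", "ACCEPTED", "RUNNING",
         "FINISHING", "FINISHED", "FAILED", "KILLED"}

def filter_apps(apps, states_param=None):
    """apps: list of dicts with a 'state' key.
    states_param: e.g. 'RUNNING,FINISHED'; None/empty means all states,
    matching today's behavior when no state parameter is given."""
    if not states_param:
        return list(apps)
    wanted = {s.strip().upper() for s in states_param.split(",")}
    unknown = wanted - VALID
    if unknown:
        raise ValueError("invalid states: %s" % ", ".join(sorted(unknown)))
    return [a for a in apps if a["state"] in wanted]

apps = [{"id": 1, "state": "RUNNING"}, {"id": 2, "state": "KILLED"},
        {"id": 3, "state": "FINISHED"}]
print([a["id"] for a in filter_apps(apps, "RUNNING,FINISHED")])  # [1, 3]
```

With this, a sub-set of states needs one call such as /ws/v1/cluster/apps?states=RUNNING,FINISHED instead of up to 7 separate single-state calls.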



[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call

2013-08-05 Thread Trevor Lorimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Lorimer updated YARN-696:


Attachment: (was: 0001-YARN-696.patch)

 Enable multiple states to be specified in Resource Manager apps REST call
 

 Key: YARN-696
 URL: https://issues.apache.org/jira/browse/YARN-696
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Trevor Lorimer
Assignee: Trevor Lorimer
Priority: Trivial
 Attachments: YARN-696.diff


 Within the YARN Resource Manager REST API the GET call which returns all 
 Applications can be filtered by a single State query parameter (http://rm 
 http address:port/ws/v1/cluster/apps). 
 There are 8 possible states (New, Submitted, Accepted, Running, Finishing, 
 Finished, Failed, Killed), if no state parameter is specified all states are 
 returned, however if a sub-set of states is required then multiple REST calls 
 are required (max. of 7).
 The proposal is to be able to specify multiple states in a single REST call.



[jira] [Updated] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data

2013-08-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-953:
-

Attachment: YARN-953.3.patch

Updated the patch against the latest branch code, and added code to init/start 
and stop the writer in case it is a service.

 [YARN-321] Change ResourceManager to use HistoryStorage to log history data
 ---

 Key: YARN-953
 URL: https://issues.apache.org/jira/browse/YARN-953
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-953.1.patch, YARN-953.2.patch, YARN-953.3.patch






[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call

2013-08-05 Thread Trevor Lorimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Lorimer updated YARN-696:


Attachment: YARN-696.diff

 Enable multiple states to be specified in Resource Manager apps REST call
 

 Key: YARN-696
 URL: https://issues.apache.org/jira/browse/YARN-696
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Trevor Lorimer
Assignee: Trevor Lorimer
Priority: Trivial
 Attachments: YARN-696.diff


 Within the YARN Resource Manager REST API the GET call which returns all 
 Applications can be filtered by a single State query parameter (http://rm 
 http address:port/ws/v1/cluster/apps). 
 There are 8 possible states (New, Submitted, Accepted, Running, Finishing, 
 Finished, Failed, Killed), if no state parameter is specified all states are 
 returned, however if a sub-set of states is required then multiple REST calls 
 are required (max. of 7).
 The proposal is to be able to specify multiple states in a single REST call.



[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call

2013-08-05 Thread Trevor Lorimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Lorimer updated YARN-696:


Attachment: (was: YARN-696.diff)

 Enable multiple states to be specified in Resource Manager apps REST call
 

 Key: YARN-696
 URL: https://issues.apache.org/jira/browse/YARN-696
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Trevor Lorimer
Assignee: Trevor Lorimer
Priority: Trivial
 Attachments: YARN-696.diff


 Within the YARN Resource Manager REST API the GET call which returns all 
 Applications can be filtered by a single State query parameter (http://rm 
 http address:port/ws/v1/cluster/apps). 
 There are 8 possible states (New, Submitted, Accepted, Running, Finishing, 
 Finished, Failed, Killed), if no state parameter is specified all states are 
 returned, however if a sub-set of states is required then multiple REST calls 
 are required (max. of 7).
 The proposal is to be able to specify multiple states in a single REST call.



[jira] [Commented] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data

2013-08-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729785#comment-13729785
 ] 

Hadoop QA commented on YARN-953:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12596169/YARN-953.3.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1658//console

This message is automatically generated.

 [YARN-321] Change ResourceManager to use HistoryStorage to log history data
 ---

 Key: YARN-953
 URL: https://issues.apache.org/jira/browse/YARN-953
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-953.1.patch, YARN-953.2.patch, YARN-953.3.patch






[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-08-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729801#comment-13729801
 ] 

Zhijie Shen commented on YARN-978:
--

+1 LGTM. The patch should be clean for trunk.

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Xuan Gong
 Fix For: YARN-321

 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch


 We don't have an ApplicationAttemptReport and Protobuf implementation.
 Adding that.
 Thanks,
 Mayank



[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call

2013-08-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729825#comment-13729825
 ] 

Hadoop QA commented on YARN-696:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12596170/YARN-696.diff
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1657//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1657//console

This message is automatically generated.

 Enable multiple states to be specified in Resource Manager apps REST call
 

 Key: YARN-696
 URL: https://issues.apache.org/jira/browse/YARN-696
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Trevor Lorimer
Assignee: Trevor Lorimer
Priority: Trivial
 Attachments: YARN-696.diff


 Within the YARN Resource Manager REST API the GET call which returns all 
 Applications can be filtered by a single State query parameter (http://rm 
 http address:port/ws/v1/cluster/apps). 
 There are 8 possible states (New, Submitted, Accepted, Running, Finishing, 
 Finished, Failed, Killed), if no state parameter is specified all states are 
 returned, however if a sub-set of states is required then multiple REST calls 
 are required (max. of 7).
 The proposal is to be able to specify multiple states in a single REST call.



[jira] [Commented] (YARN-1024) Define a virtual core unambiguously

2013-08-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729833#comment-13729833
 ] 

Steve Loughran commented on YARN-1024:
--

I was the one trying to convince Sandy that a uniform core metric is 
dangerous; it's like when a MIP was a VAX-equivalent million instructions.

# different parts have different performance in terms of FPU and memory IO 
bandwidth, even if the integer perf is the same. (hence people like to get 
Intel parts over AMD parts on EC2 allocations). 
# there's also the hyperthreading issue: is an HT core the equivalent of a 
real core? (No, but Linux treats them the same, AFAIK.)
# over time, as 2007 gets further away, the metric becomes less relevant.
# EC2 also includes RAM in its sizing (e.g. m1.small has the same CPU as 
m1.medium, only less RAM; AWS considers medium as having 2x the ECUs).

One thing I was arguing against in YARN-972 is allocating fractions of a real 
core: if I say 1 core, I get a single core, irrespective of performance. If 
ECUs are used, and I ask for 1 ECU, does that mean I get 0.50 of a bigger 
core, or a free upgrade?

I'm happy if I ask for 8 ECUs and get a guarantee of not being on a CPU with 
fewer than 8 ECUs, making it a minimum requirement on CPU performance.


 Define a virtual core unambiguously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*



[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-08-05 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729832#comment-13729832
 ] 

Mayank Bansal commented on YARN-978:


+1

Thanks,
Mayank

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Xuan Gong
 Fix For: YARN-321

 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch


 We dont have ApplicationAttemptReport and Protobuf implementation.
 Adding that.
 Thanks,
 Mayank



[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729839#comment-13729839
 ] 

Sandy Ryza commented on YARN-1024:
--

If I am used to running my single-threaded task on a fast core (let's say rated 
at 250 YVCs), and then I migrate it to another cluster with slower cores (let's 
say rated at 150 YVCs), and still request 250 YVCs, my task will run no faster 
than if I had requested it with 150 YVCs.  I won't get predictable performance, 
and, from a scheduling perspective, I'd be better off requesting 150 YVCs on 
the slower cluster.

In a single pcore-to-vcore world, if I know that my task is CPU-bound and uses 
X threads, I know that each vcore I ask for up to X vcores will predictably 
improve its performance, whatever cluster I am running on.  In a world where 
different cores have different YVCs, I don't get a clear concept of when I 
should increase my YVCs requested, and the advantage of doing so depends mostly 
on the cluster I am running on.

A virtual core definition based on processing power masks the fact that two 1.5 
GHz cores mean something very different from three 1.0 GHz cores, and makes it 
very hard to reason about how many virtual cores to request.


 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*



[jira] [Created] (YARN-1030) Adding AHS as service of RM

2013-08-05 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1030:
-

 Summary: Adding AHS as service of RM
 Key: YARN-1030
 URL: https://issues.apache.org/jira/browse/YARN-1030
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen






[jira] [Created] (YARN-1031) JQuery UI components reference external css in branch-23

2013-08-05 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-1031:
-

 Summary: JQuery UI components reference external css in branch-23
 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles






[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-08-05 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729865#comment-13729865
 ] 

Jonathan Eagles commented on YARN-1031:
---

This issue is not present in branch-2 or trunk. Thinking that this should be 
just a minor fix, instead of bringing the entire jQuery themes back into the 
source base in branch-0.23.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles





[jira] [Updated] (YARN-1031) JQuery UI components reference external css in branch-23

2013-08-05 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1031:
--

Attachment: YARN-1031-branch-0.23.patch

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-branch-0.23.patch






[jira] [Commented] (YARN-1018) prereq check for AMRMClient.ContainerRequest relaxLocality flag wrong

2013-08-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729890#comment-13729890
 ] 

Steve Loughran commented on YARN-1018:
--

OK, but there's a risk that an empty array has come from some feature (like the 
list of past containers), and that if the list is empty then that's because 
there were no past containers.

If the request is rejected when the node list is empty, then you may end up coding
{code}
boolean strict = nodes.length != 0;
new AMRMClient.ContainerRequest(capability, nodes, null, 0, !strict);
{code}

 prereq check for AMRMClient.ContainerRequest relaxLocality flag wrong
 -

 Key: YARN-1018
 URL: https://issues.apache.org/jira/browse/YARN-1018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran
Priority: Minor

 Trying to create a container request with no racks/nodes and no relaxed 
 priority fails
 {code}
 new AMRMClient.ContainerRequest(capability, null, null, 0, false);
 {code}
 expected: a container request.
 actual: stack trace saying I can't relax node locality.



[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729895#comment-13729895
 ] 

Jason Lowe commented on YARN-1024:
--

Agree that the example posed by [~sandyr] shows that a single unit in the 
request cannot properly convey the ask.  Chatted briefly about this offline 
with [~revans2] and [~nroberts] and we think in general there needs to be a way 
to show the parallelism needed along with some performance guarantee from those 
threads.  That basically leads us to a path where in the generalized case we're 
asking for a list of vcore units, where the number of entries in the list 
represents the desired hardware parallelism and the value of each entry 
represents the performance needed for that execution thread.

Using this with Sandy's example, asking for a single unit of 250 YVCs means it 
would not be allocated on the node with three cores each rated at 150 YVCs 
because none of the cores meets the single-threaded performance needed by the 
container.  If another job came along and asked for three cores each at 100 
YVCs, that could still run on a node that only has a single core rated at 500 
YVCs because that core likely has enough horsepower to multitask the three 
threads and get them each the required performance.

I understand where [~ste...@apache.org] is coming from re: dangers of 
developing one unit to rule them all, but I also think there needs to be 
*some* way to convey performance requirements.  Sandy's example shows that just 
because a job ran fine with one core on some box doesn't mean the job is going 
to run fine with one core on another.  We will not be able to develop a metric 
that will cover all the hardware architecture differences, but if a metric 
works in the vast majority of cases then I think that's a net win over no 
metric.

The APIs are already set for 2.1, and I believe the common case will be jobs 
where a single thread dominates the overall CPU request of the container.  In 
that sense, we can map the existing API call to a single vcore ask and add 
another API where the ask can be a list/array of vcore asks.  This could get 
complicated in the scheduler for an architecture where the effective vcore 
rating of the processors is not homogeneous (brings up the spectre of 
processor-pinning and per-processor scheduling), but I don't think this will be 
a common architecture.
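To make the "list of vcore units" idea above concrete, here is a minimal, purely illustrative sketch (not YARN code; the class and method names are hypothetical) of how a scheduler might check whether such a request fits on a node, using first-fit-decreasing so that several small asks can share one high-rated core, as in the 3x100-YVC-on-a-500-YVC-core example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative sketch: check whether a request expressed as a list of
 * vcore units (one entry per desired thread, value = required per-thread
 * performance) fits on a node with the given per-core ratings. Multiple
 * request units may share a core as long as their combined rating does
 * not exceed that core's rating.
 */
public class VcoreVectorFit {

  static boolean fits(List<Integer> requestUnits, List<Integer> coreRatings) {
    List<Integer> remaining = new ArrayList<>(coreRatings);
    List<Integer> asks = new ArrayList<>(requestUnits);
    asks.sort(Collections.reverseOrder());      // place biggest asks first
    for (int ask : asks) {
      boolean placed = false;
      for (int i = 0; i < remaining.size(); i++) {
        if (remaining.get(i) >= ask) {          // core still has headroom
          remaining.set(i, remaining.get(i) - ask);
          placed = true;
          break;
        }
      }
      if (!placed) {
        return false;                           // no core can host this unit
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // A single 250-YVC ask does not fit on three 150-YVC cores.
    System.out.println(fits(List.of(250), List.of(150, 150, 150)));  // false
    // Three 100-YVC asks can share a single 500-YVC core.
    System.out.println(fits(List.of(100, 100, 100), List.of(500)));  // true
  }
}
```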

 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*



[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-08-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729922#comment-13729922
 ] 

Jason Lowe commented on YARN-1031:
--

+1, lgtm.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-branch-0.23.patch






[jira] [Created] (YARN-1032) NPE in RackResolve

2013-08-05 Thread Lohit Vijayarenu (JIRA)
Lohit Vijayarenu created YARN-1032:
--

 Summary: NPE in RackResolve
 Key: YARN-1032
 URL: https://issues.apache.org/jira/browse/YARN-1032
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
 Environment: linux
Reporter: Lohit Vijayarenu
Priority: Minor


We found a case where our rack resolve script was not returning a rack due to a 
problem resolving the host address. This exception surfaced in 
RackResolver.java as an NPE, ultimately caught in RMContainerAllocator. 

{noformat}
2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING 
RM. 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
at 
org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
at java.lang.Thread.run(Thread.java:722)

{noformat}



[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2013-08-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729969#comment-13729969
 ] 

Aaron T. Myers commented on YARN-1029:
--

Just to be completely explicit, this is being presented as an alternative to 
using a separate ZKFC daemon?

FWIW, in HDFS we deliberately opted to not do this so that the ZKFC could be 
completely logically separate from the NN, and so that the ZKFC could one day 
be made to monitor garbage collections and potentially not trigger a failover 
if one of those were going on. We have yet to get to the latter.

 Allow embedding leader election into the RM
 ---

 Key: YARN-1029
 URL: https://issues.apache.org/jira/browse/YARN-1029
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla

 It should be possible to embed the common ActiveStandbyElector into the RM such 
 that ZooKeeper based leader election and notification is in-built. In 
 conjunction with a ZK state store, this configuration will be a simple 
 deployment option.



[jira] [Commented] (YARN-1032) NPE in RackResolve

2013-08-05 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729977#comment-13729977
 ] 

Lohit Vijayarenu commented on YARN-1032:


Once we hit the exception in RackResolver, since it is not caught and no default 
rack is returned, we end up not releasing containers that could not be assigned 
in RMContainerAllocator.java:

{noformat}

  assignContainers(allocatedContainers);
   
  // release container if we could not assign it 
  it = allocatedContainers.iterator();
  while (it.hasNext()) {
Container allocated = it.next();
LOG.info("Releasing unassigned and invalid container " 
+ allocated + ". RM may have assignment issues");
containerNotAssigned(allocated);
  }
{noformat}

The AM would no longer ask for new containers since it thinks the containers are 
assigned, while the RM assumes the containers are allocated to the AM. The job 
ends up hanging forever without making any progress. Fixing the container 
release might be part of another JIRA; at a minimum we need to catch the 
exception and return the default rack in case of failure. 
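The minimal fix described above could look roughly like the following self-contained sketch (not the actual Hadoop RackResolver; the Mapping interface stands in for DNSToSwitchMapping, and the names here are illustrative):

```java
import java.util.List;

/**
 * Illustrative sketch: resolve a host to a rack, falling back to a
 * default rack whenever the underlying mapping fails or returns nothing,
 * instead of letting a NullPointerException propagate into the allocator.
 */
public class SafeRackResolver {

  static final String DEFAULT_RACK = "/default-rack";

  /** Stand-in for DNSToSwitchMapping.resolve(); may return null on failure. */
  interface Mapping {
    List<String> resolve(List<String> names);
  }

  static String coreResolve(Mapping mapping, String host) {
    try {
      List<String> racks = mapping.resolve(List.of(host));
      if (racks == null || racks.isEmpty() || racks.get(0) == null) {
        return DEFAULT_RACK;   // script failed or returned nothing
      }
      return racks.get(0);
    } catch (RuntimeException e) {
      return DEFAULT_RACK;     // never let resolution errors escape
    }
  }

  public static void main(String[] args) {
    Mapping broken = names -> null;                    // failing script
    Mapping ok = names -> List.of("/rack-7");
    System.out.println(coreResolve(broken, "host1"));  // /default-rack
    System.out.println(coreResolve(ok, "host1"));      // /rack-7
  }
}
```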

 NPE in RackResolve
 --

 Key: YARN-1032
 URL: https://issues.apache.org/jira/browse/YARN-1032
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
 Environment: linux
Reporter: Lohit Vijayarenu
Priority: Minor

 We found a case where our rack resolve script was not returning rack due to 
 problem with resolving host address. This exception was see in 
 RackResolver.java as NPE, ultimately caught in RMContainerAllocator. 
 {noformat}
 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
   at 
 org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}



[jira] [Updated] (YARN-1032) NPE in RackResolve

2013-08-05 Thread Lohit Vijayarenu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lohit Vijayarenu updated YARN-1032:
---

Attachment: YARN-1032.1.patch

Simple patch to catch the NPE and return the default rack. Since it just 
catches an NPE, I did not try to come up with a test case. Let me know if this 
looks good.

 NPE in RackResolve
 --

 Key: YARN-1032
 URL: https://issues.apache.org/jira/browse/YARN-1032
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
 Environment: linux
Reporter: Lohit Vijayarenu
Priority: Minor
 Attachments: YARN-1032.1.patch


 We found a case where our rack resolve script was not returning rack due to 
 problem with resolving host address. This exception was see in 
 RackResolver.java as NPE, ultimately caught in RMContainerAllocator. 
 {noformat}
 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
   at 
 org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}



[jira] [Resolved] (YARN-1031) JQuery UI components reference external css in branch-23

2013-08-05 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles resolved YARN-1031.
---

   Resolution: Fixed
Fix Version/s: 0.23.10

Thanks for the review, Jason.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Fix For: 0.23.10

 Attachments: YARN-1031-branch-0.23.patch






[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730074#comment-13730074
 ] 

Arun C Murthy commented on YARN-1024:
-

bq. If I am used to running my single-threaded task on a fast core (let's say 
rated at 250 YVCs), and then I migrate it to another cluster with slower cores 
(let's say rated at 150 YVCs), and still request 250 YVCs, my task will run no 
faster than if I had requested it with 150 YVCs.

[~sandyr] That is why you'd set a max-vcores in CS/FS of 150. This prevents 
users from falling into that trap. So, that should solve it - correct?

 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*



[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730077#comment-13730077
 ] 

Arun C Murthy commented on YARN-1024:
-

[~jlowe] Yep, it does make sense to talk about a more explicit 'vector of 
cores' model as we've discussed in past - that said, I agree it's too early.

 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*



[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730080#comment-13730080
 ] 

Arun C Murthy commented on YARN-1024:
-

Overall, yes, there are certainly issues with a strict definition vcore etc., 
but we need to do *just enough* for now - not solve all possible permutations.

Basic requirements are simplicity, predictability and consistency - in that 
order.


 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*



[jira] [Commented] (YARN-696) Enable multiple states to to be specified in Resource Manager apps REST call

2013-08-05 Thread Trevor Lorimer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730095#comment-13730095
 ] 

Trevor Lorimer commented on YARN-696:
-

In this patch I changed state to states and enabled comma separated state 
queries.

However I could not find a way to create tests where I can be sure multiple 
applications with different states exist at a specific time. Are there any 
examples where an application can be created with a predefined static state?
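For reference, the state-to-states change could be parsed along these lines; a minimal sketch, not the actual RMWebServices code (the class, enum, and method names here are hypothetical):

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Illustrative sketch: parse a comma-separated "states" query parameter
 * into a set of application states, so a single REST call can filter on
 * several states at once.
 */
public class StatesParam {

  enum AppState {
    NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHING, FINISHED, FAILED, KILLED
  }

  static Set<AppState> parseStates(String statesParam) {
    if (statesParam == null || statesParam.isEmpty()) {
      return EnumSet.allOf(AppState.class);   // no filter: match everything
    }
    EnumSet<AppState> states = EnumSet.noneOf(AppState.class);
    for (String s : statesParam.split(",")) {
      // valueOf throws IllegalArgumentException for unknown states
      states.add(AppState.valueOf(s.trim().toUpperCase()));
    }
    return states;
  }

  public static void main(String[] args) {
    // e.g. GET /ws/v1/cluster/apps?states=running,finished
    System.out.println(parseStates("running,finished"));
  }
}
```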

 Enable multiple states to to be specified in Resource Manager apps REST call
 

 Key: YARN-696
 URL: https://issues.apache.org/jira/browse/YARN-696
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Trevor Lorimer
Assignee: Trevor Lorimer
Priority: Trivial
 Attachments: YARN-696.diff


 Within the YARN Resource Manager REST API the GET call which returns all 
 Applications can be filtered by a single State query parameter (http://rm 
 http address:port/ws/v1/cluster/apps). 
 There are 8 possible states (New, Submitted, Accepted, Running, Finishing, 
 Finished, Failed, Killed), if no state parameter is specified all states are 
 returned, however if a sub-set of states is required then multiple REST calls 
 are required (max. of 7).
 The proposal is to be able to specify multiple states in a single REST call.



[jira] [Commented] (YARN-1032) NPE in RackResolve

2013-08-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730153#comment-13730153
 ] 

Zhijie Shen commented on YARN-1032:
---

{code}
rNameList == null
{code}

Looking at the DNSToSwitchMapping doc, resolve() does not seem to return null. 
Probably, you want to check
{code}
rNameList.size() == 0
{code}


Please add a test case in TestRackResolver.

 NPE in RackResolve
 --

 Key: YARN-1032
 URL: https://issues.apache.org/jira/browse/YARN-1032
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
 Environment: linux
Reporter: Lohit Vijayarenu
Priority: Minor
 Attachments: YARN-1032.1.patch


 We found a case where our rack resolve script was not returning rack due to 
 problem with resolving host address. This exception was see in 
 RackResolver.java as NPE, ultimately caught in RMContainerAllocator. 
 {noformat}
 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
   at 
 org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}



[jira] [Commented] (YARN-1032) NPE in RackResolve

2013-08-05 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730167#comment-13730167
 ] 

Lohit Vijayarenu commented on YARN-1032:


[~zjshen] Yes, the documentation does not mention resolve() returning null, but 
if you look into RawScriptBasedMapping::resolve(), a failure to resolve the rack 
can return null in at least two places, hence the null check. Thanks for 
pointing out TestRackResolver; I will try to add a test case.

 NPE in RackResolve
 --

 Key: YARN-1032
 URL: https://issues.apache.org/jira/browse/YARN-1032
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
 Environment: linux
Reporter: Lohit Vijayarenu
Priority: Minor
 Attachments: YARN-1032.1.patch


 We found a case where our rack resolve script was not returning rack due to 
 problem with resolving host address. This exception was see in 
 RackResolver.java as NPE, ultimately caught in RMContainerAllocator. 
 {noformat}
 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
   at 
 org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}



[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730195#comment-13730195
 ] 

Sandy Ryza commented on YARN-1024:
--

Jason, Steve, and Arun, you bring up good points that I think have helped me 
understand some of my assumptions.   I agree that simplicity, predictability, 
and consistency are our most important requirements.  I agree with Jason that 
at least two values -  processing power per core and # of cores - are required 
to fully express a request, and that, in spite of this, we should not use both 
and that a single value is better than nothing.

We have a tradeoff between
* A definition that offers some predictability between clusters, but only makes 
sense for requests for a single physical core or less per container.
* A definition that offers predictability only on homogeneous hardware, but 
that functions sensibly for requests for both more and less than a single 
physical core.

I thought that one of the exciting things about allowing requests for CPU would 
be that YARN would be able to better accommodate multi-threaded CPU-intensive 
frameworks like MPI and Storm.  Predictability between clusters seems to matter 
a lot less to me. A ton of other factors interfere with this kind of 
predictability.  The speed that hardware permits a task to read from disk or 
over the network can have just as large an impact on the processing power 
it consumes as whatever the task is doing.  I don't believe that we will be 
able to attain predictability to the degree that it will provide much value.


 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730197#comment-13730197
 ] 

Sandy Ryza commented on YARN-1024:
--

bq. The speed that hardware permits a task to read from disk or over the 
network has can have just as large an impact on the processing power it 
consumes as whatever the task is doing.
Meant: The speed that hardware permits a task to read from disk or over the 
network can have just as large an impact on the processing power it consumes as 
whatever the task is doing.

 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2013-08-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730209#comment-13730209
 ] 

Bikas Saha commented on YARN-1029:
--

Yes, that's correct. I am aware of the HDFS discussions. ZKFC is definitely 
going to be part of RM failover and supported. Given the RM's lower memory 
consumption and sane ZK timeout values, the GC problem may not be severe in 
the RM's case. On the other hand, with RM state also being stored in ZK, having 
an embedded FC may considerably simplify deployment and maintenance of RM 
failover. So it's not a bad option to have.
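
To make the embedding idea concrete, here is a hedged sketch of how an elector could drive the RM's active/standby transitions; the interface and class names are illustrative and do not match ActiveStandbyElector's real API:

```java
// Illustrative callback contract: the RM reacts to election outcomes.
interface ElectorCallback {
  void becomeActive();   // this RM won the ZooKeeper-based election
  void becomeStandby();  // leadership lost (e.g. ZK session expired)
}

public class EmbeddedElectorSketch {
  private final ElectorCallback rm;

  public EmbeddedElectorSketch(ElectorCallback rm) { this.rm = rm; }

  // A real elector reacts to znode creation/deletion events; here the
  // outcome is passed in directly to model the state transition.
  public void onElectionResult(boolean won) {
    if (won) rm.becomeActive(); else rm.becomeStandby();
  }

  public static void main(String[] args) {
    EmbeddedElectorSketch e = new EmbeddedElectorSketch(new ElectorCallback() {
      public void becomeActive()  { System.out.println("active"); }
      public void becomeStandby() { System.out.println("standby"); }
    });
    e.onElectionResult(true);
    e.onElectionResult(false);
  }
}
```

The appeal of embedding, as described above, is that the elector and the ZK state store can share one deployment rather than requiring a separate ZKFC process.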

 Allow embedding leader election into the RM
 ---

 Key: YARN-1029
 URL: https://issues.apache.org/jira/browse/YARN-1029
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla

 It should be possible to embed the common ActiveStandbyElector into the RM such 
 that ZooKeeper based leader election and notification is in-built. In 
 conjunction with a ZK state store, this configuration will be a simple 
 deployment option.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2013-08-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730224#comment-13730224
 ] 

Aaron T. Myers commented on YARN-1029:
--

Sounds good to me. I think we should seriously consider moving the ZKFC 
functionality into the NN as well, since in practice I don't think it's bought 
us much of anything and definitely complicates the deployment. But, that's 
another discussion for another day.

Thanks, Bikas.

 Allow embedding leader election into the RM
 ---

 Key: YARN-1029
 URL: https://issues.apache.org/jira/browse/YARN-1029
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla

 It should be possible to embed the common ActiveStandbyElector into the RM such 
 that ZooKeeper based leader election and notification is in-built. In 
 conjunction with a ZK state store, this configuration will be a simple 
 deployment option.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730301#comment-13730301
 ] 

Eli Collins commented on YARN-1024:
---

I agree we need to define the meaning of a virtual core unambiguously 
(otherwise we won't be able to support two different frameworks on the same 
cluster that may have differing ideas of what a vcore is). I also agree with 
Phil that there are essentially two high-level use cases:

1. Jobs that want to express how much CPU capacity the job needs. Real world 
example - a distcp job wants to express it needs 100 containers but only a 
fraction of a CPU for each since it will spend most of its time blocking on IO.

2. Services - ie long-lived frameworks (ie support 2-level scheduling) - that 
want to request cores on many machines on a cluster and want to express 
CPU-level parallelism and aggregate demand (because they will schedule 
fine-grain requests w/in their long-lived containers). Eg a framework should be 
able to ask for two containers on a host, each with one core, so it can get two 
containers that can execute in parallel on a full core. This is assuming we 
plan to support long-running services in Yarn (YARN-896), which is hopefully 
not controversial. Real world example is HBase which may want 2 guaranteed 
cores per host on a given set of hosts.

Seems like there are two high-level approaches:

1. Get rid of vcores. If we define 1vcore=1pcore (1vcore=1vcpu for virtual 
environments) and support fractional cores (YARN-972) then services can ask for 
1 or more vcores knowing they're getting real cores and jobs just ask for what 
fraction of a vcore they think they need. This is really abandoning the concept 
of a virtual core because it's actually expressing a physical requirement 
(like memory, we assume Yarn is not dramatically over-committing the host). We 
can handle heterogeneous CPUs via attributes (as discussed in other Yarn jiras) 
since most clusters in my experience don't have wildly different processors (eg 
1 or 2 generations is common), and attributes are sufficient to express 
policies like all my cores should have equal/comparable performance.

2. Keep going with vcores as a CPU unit of measurement. If we define 
1vcore=1ECU (works 1:1 for virtual environments) then services need to 
understand the power of a core so they can ask for that many vcores - 
essentially they are just undoing the virtualization. YARN would need to make 
sure two containers each with 1 pcore's worth of vcores does in fact give you 
two cores (just like hypervisors schedule vcpus for the same VM on different 
pcores to ensure parallelism), but there would be no guarantee that two 
containers on the same host each w/ one vcore would run in parallel. Jobs that 
want fractional cores would just express 1vcore per container and work their 
way up based on experience running on the cluster (or also undo the 
virtualization by calculating vcore/pcore if they know what fraction of a pcore 
they want). Heterogeneous CPUs do not fall out naturally (still need 
attributes) since there's no guarantee you can describe the difference between 
two CPUs as roughly 1 or more vcores (e.g. 2.4 vs 2.0 GHz is less than 1 ECU); 
however, there's no need for fractional vcores.

I think either is reasonable and can be made to work, though I think #1 is 
preferable because:
- Some frameworks want to express containers in physical resources (this is 
consistent with how YARN handles memory)
- You can support jobs that don't want a full core via fractional cores (or 
slightly over-committing cores)
- You can support heterogeneous cores via attributes (I want equivalent 
containers)
- vcores are optional anyway (only used in DRF) and therefore only need to be 
expressed if you care about physical cores because you need to reserve them or 
say you want a fraction of one

Either way I think vcore is the wrong name because in #1 1vcore=1pcore so 
there's no virtualization and in #2 1 vcore is not a virtualization of a core 
(10 vcores does not give me 10 levels of parallelism), it's _just a unit_ (like 
an ECU).
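
A small arithmetic sketch of "undoing the virtualization" under approach #2 (the method and names are illustrative, not a YARN API): a framework that knows the ECU rating of a physical core on the target cluster can compute its vcore request itself:

```java
// Hypothetical helper: convert a desired number of physical cores into a
// vcore request, given how many ECU-denominated vcores one pcore is worth
// on this cluster. Not part of any real YARN interface.
public class VcoreMath {
  static int vcoresFor(double pcoresWanted, double vcoresPerPcore) {
    // Round up so the request is never under-provisioned.
    return (int) Math.ceil(pcoresWanted * vcoresPerPcore);
  }

  public static void main(String[] args) {
    // On a cluster whose cores rate at 2.5 ECU each, half a physical
    // core requires ceil(0.5 * 2.5) = 2 vcores.
    System.out.println(vcoresFor(0.5, 2.5));
    // Two full physical cores require ceil(2.0 * 2.5) = 5 vcores.
    System.out.println(vcoresFor(2.0, 2.5));
  }
}
```

This is exactly the bookkeeping the comment argues approach #1 avoids, since there 1vcore=1pcore and no conversion is needed.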

 Define a virtual core unambigiously
 ---

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For e.g. here is Amazon EC2 definition of ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

[jira] [Comment Edited] (YARN-1024) Define a virtual core unambigiously

2013-08-05 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730301#comment-13730301
 ] 

Eli Collins edited comment on YARN-1024 at 8/6/13 3:09 AM:
---

I agree we need to define the meaning of a virtual core unambiguously 
(otherwise we won't be able to support two different frameworks on the same 
cluster that may have differing ideas of what a vcore is). I also agree with 
Phil that there are essentially two high-level use cases:

1. Jobs that want to express how much CPU capacity the job needs. Real world 
example - a distcp job wants to express it needs 100 containers but only a 
fraction of a CPU for each since it will spend most of its time blocking on IO.

2. Services - ie long-lived frameworks (ie support 2-level scheduling) - that 
want to request cores on many machines on a cluster and want to express 
CPU-level parallelism and aggregate demand (because they will schedule 
fine-grain requests w/in their long-lived containers). Eg a framework should be 
able to ask for two containers on a host, each with one core, so it can get two 
containers that can execute in parallel on a full core. This is assuming we 
plan to support long-running services in Yarn (YARN-896), which is hopefully 
not controversial. Real world example is HBase which may want 2 guaranteed 
cores per host on a given set of hosts.

Seems like there are two high-level approaches:

1. Get rid of vcores. If we define 1vcore=1pcore (1vcore=1vcpu for virtual 
environments) and support fractional cores (YARN-972) then services can ask for 
1 or more vcores knowing they're getting real cores and jobs just ask for what 
fraction of a vcore they think they need. This is really abandoning the concept 
of a virtual core because it's actually expressing a physical requirement 
(like memory, we assume Yarn is not dramatically over-committing the host). We 
can handle heterogeneous CPUs via attributes (as discussed in other Yarn jiras) 
since most clusters in my experience don't have wildly different processors (eg 
1 or 2 generations is common), and attributes are sufficient to express 
policies like all my cores should have equal/comparable performance.

2. Keep going with vcores as a CPU unit of measurement. If we define 
1vcore=1ECU (works 1:1 for virtual environments) then services need to 
understand the power of a core so they can ask for that many vcores - 
essentially they are just undoing the virtualization. YARN would need to make 
sure two containers each with 1 pcore's worth of vcores does in fact give you 
two cores (just like hypervisors schedule vcpus for the same VM on different 
pcores to ensure parallelism), but there would be no guarantee that two 
containers on the same host each w/ one vcore would run in parallel. Jobs that 
want fractional cores would just express 1vcore per container and work their 
way up based on experience running on the cluster (or also undo the 
virtualization by calculating vcore/pcore if they know what fraction of a pcore 
they want). Heterogeneous CPUs do not fall out naturally (still need 
attributes) since there's no guarantee you can describe the difference between 
two CPUs as roughly 1 or more vcores (e.g. 2.4 vs 2.0 GHz is less than 1 ECU); 
however, there's no need for fractional vcores.

I think either is reasonable and can be made to work, though I think #1 is 
preferable because:
- Some frameworks want to express containers in physical resources (this is 
consistent with how YARN handles memory)
- You can support jobs that don't want a full core via fractional cores (or 
slightly over-committing cores)
- You can support heterogeneous cores via attributes (I want equivalent 
containers)
- vcores are optional anyway (only used in DRF) and therefore only need to be 
expressed if you care about physical cores because you need to reserve them or 
say you want a fraction of one

Either way I think vcore is the wrong name because in #1 1vcore=1pcore so 
there's no virtualization and in #2 1 vcore is not a virtualization of a core 
(10 vcores does not give me 10 levels of parallelism), it's _just a unit_ (like 
an ECU).

  was (Author: eli):
I agree we need to define the meaning of a virtual core unambiguously 
(otherwise we won't be able to support two different frameworks on the same 
cluster that may have differing ideas of what a vcore is). I also agree with 
Phil that there are essentially two high-level use cases:

1. Jobs that want to express how much CPU capacity the job needs. Real world 
example - a distcp job wants to express it needs 100 containers but only a 
fraction of a CPU for each since it will spend most of its time blocking on IO.

2. Services - ie long-lived frameworks (ie support 2-level scheduling) - that 
want to request cores on many machines on a cluster and want to express 
CPU-level parallelism and aggregate 

[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator

2013-08-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730316#comment-13730316
 ] 

Bikas Saha commented on YARN-1021:
--

The idea and goals are very interesting. It would be great if there was a 
design description to initiate a discussion.

 Yarn Scheduler Load Simulator
 -

 Key: YARN-1021
 URL: https://issues.apache.org/jira/browse/YARN-1021
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan

 The Yarn Scheduler is a fertile area of interest with different 
 implementations, e.g., Fifo, Capacity and Fair  schedulers. Meanwhile, 
 several optimizations are also made to improve scheduler performance for 
 different scenarios and workload. Each scheduler algorithm has its own set of 
 features, and drives scheduling decisions by many factors, such as fairness, 
 capacity guarantee, resource availability, etc. It is very important to 
 evaluate a scheduler algorithm very well before we deploy it in a production 
 cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling 
 algorithm. Evaluating in a real cluster is always time- and cost-consuming, 
 and it is also very hard to find a large enough cluster. Hence, a simulator 
 that can predict how well a scheduler algorithm performs for some specific 
 workload would be quite useful.
 We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
 clusters and application loads in a single machine. This would be invaluable 
 in furthering Yarn by providing a tool for researchers and developers to 
 prototype new scheduler features and predict their behavior and performance 
 with a reasonable amount of confidence, thereby aiding rapid innovation.
 The simulator will exercise the real Yarn ResourceManager removing the 
 network factor by simulating NodeManagers and ApplicationMasters via handling 
 and dispatching NM/AMs heartbeat events from within the same JVM.
 To keep track of scheduler behavior and performance, a scheduler wrapper 
 will wrap the real scheduler.
 The simulator will produce real time metrics while executing, including:
 * Resource usages for whole cluster and each queue, which can be utilized to 
 configure cluster and queue's capacity.
 * The detailed application execution trace (recorded in relation to simulated 
 time), which can be analyzed to understand/validate the  scheduler behavior 
 (individual jobs turn around time, throughput, fairness, capacity guarantee, 
 etc).
 * Several key metrics of the scheduler algorithm, such as the time cost of 
 each scheduler operation (allocate, handle, etc.), which can be utilized by 
 Hadoop developers to find hot spots and scalability limits.
 The simulator will provide real time charts showing the behavior of the 
 scheduler and its performance.
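
The in-JVM heartbeat idea described above can be sketched minimally (this is an illustration of the approach, not the YARN-1021 patch): simulated NodeManagers advance a virtual clock and dispatch heartbeat events directly rather than over RPC:

```java
// Illustrative sketch: drive simulated NM heartbeats against a virtual
// clock inside one JVM, the way the proposed simulator removes the
// network factor. All names here are hypothetical.
public class SimulatorSketch {
  long nowMs = 0;      // simulated clock, advanced manually, not wall time
  int heartbeats = 0;  // count of dispatched NM heartbeat events

  // In the real simulator this would carry container statuses into the
  // scheduler's node-update handler; here it just records the event.
  void nmHeartbeat(String nodeId) {
    heartbeats++;
  }

  // Drive `nodes` simulated NodeManagers for `ticks` heartbeat intervals
  // of `intervalMs` simulated milliseconds each.
  void run(int nodes, int ticks, long intervalMs) {
    for (int t = 0; t < ticks; t++) {
      nowMs += intervalMs;
      for (int n = 0; n < nodes; n++) {
        nmHeartbeat("node-" + n);
      }
    }
  }

  public static void main(String[] args) {
    SimulatorSketch s = new SimulatorSketch();
    s.run(3, 2, 1000); // 3 nodes, 2 ticks, 1s interval
    System.out.println(s.heartbeats + " heartbeats in " + s.nowMs + " ms");
  }
}
```

Because everything runs on a controlled clock in one process, the wrapper around the real scheduler can attribute time costs to individual operations deterministically.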

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira