[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.5.patch

Enhance YARN service model
--
Key: YARN-117
URL: https://issues.apache.org/jira/browse/YARN-117
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.patch

Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover them, with solutions pushed out to separate JIRAs.

h2. State model prevents the stopped state being entered if the service could not be started successfully

In the current lifecycle you cannot stop a service unless it was successfully started, but
* {{init()}} may acquire resources that need to be explicitly released
* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources

*Fix:* make {{stop()}} a valid state transition from all states, and require the implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that {{stop()}} implementations assume all fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is independent of the rest of the issues in this doc, but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take-up; this can be done with issues linked to this one.

h2. AbstractService doesn't prevent duplicate state change requests

The {{ensureState()}} checks that verify whether a state transition is allowed from the current state are performed in the base {{AbstractService}} class, yet subclasses tend to call this *after* their own {{init()}}, {{start()}} and {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this.

This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods: {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}}, something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers).

h2. AbstractService state change doesn't defend against race conditions

There are no concurrency locks on the state transitions. Whatever fix is added for wrong-state calls should also prevent re-entrancy, such as {{stop()}} being called from two threads.

h2. Static methods to choreograph lifecycle operations

Helper methods to move things through lifecycles: init-then-start is common, stop-if-service-is-not-null is another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. They can be used by those services that wrap other services, and help manage more robust shutdowns.

h2. State transition failures are something that registered service listeners may wish to be informed of

When a state transition fails, a {{RuntimeException}} can be thrown, and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics.

*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targetedState, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing implementations of the interface.

h2. Service listener failures not handled

Is this an error or not? Log-and-ignore may not be what is desired.

*Proposed:* during {{stop()}}, any exception raised by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes.

h2. Support static listeners for all AbstractServices

Add support to {{AbstractService}} that allows callers to register listeners
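The final-method pattern from HADOOP-3128 described above can be sketched as follows. {{innerInit()}}/{{innerStart()}}/{{innerStop()}} are the hook names from the proposal; everything else ({{SketchService}}, the state enum) is a simplified illustration, not Hadoop's actual {{AbstractService}}:

```java
// A minimal sketch of the proposed lifecycle: final public methods perform the
// state checking, subclasses override the protected inner hooks.
abstract class SketchService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;

    // init()/start() are final: the state check cannot be skipped or
    // reordered by a subclass, and nothing happens if the check fails.
    public final synchronized void init() {
        if (state != State.NOTINITED) {
            throw new IllegalStateException("cannot init from " + state);
        }
        innerInit();
        state = State.INITED;
    }

    public final synchronized void start() {
        if (state != State.INITED) {
            throw new IllegalStateException("cannot start from " + state);
        }
        innerStart();
        state = State.STARTED;
    }

    // stop() is valid from every state, so it can release resources even
    // after a failed init() or start(); it is a no-op when already stopped,
    // which together with synchronized also guards against re-entrancy.
    public final synchronized void stop() {
        if (state == State.STOPPED) {
            return;
        }
        innerStop();
        state = State.STOPPED;
    }

    public final synchronized State getState() { return state; }

    // Subclasses override these; innerStop() must tolerate null fields.
    protected void innerInit() {}
    protected void innerStart() {}
    protected void innerStop() {}
}
```

With this shape, {{stop()}} is legal from every state and idempotent, which also addresses the duplicate-request and race-condition concerns via the synchronized final methods.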
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631364#comment-13631364 ] Steve Loughran commented on YARN-117: - Updated patch, in which {{TestNodeStatusUpdater.testNMConnectionToRM()}} should not fail irrespective of how long the NM's {{init()}} process takes. Until now the custom {{NodeStatusUpdater}} set its clock in the constructor, which was called during the NM's {{init()}}, but the test only set its own clock after {{init()}} and before {{start()}}. As a result, if the init took too long, the test would fail saying the RM took too long, when the delay logic was actually working. The fix is for the custom {{NodeStatusUpdater}} to set its {{waitStartTime}} in its {{innerStart()}}, so only when the service is started. This leaves a narrower gap between the test's measured start time and the updater's, though the time taken to start a couple of services before the updater is started could still be troublesome. Even so, retaining the time measurement logic in the test (rather than just probing the updater to verify it triggered) is essential to be confident that {{NodeManager.start()}} doesn't complete until the connection has been established.
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631369#comment-13631369 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578645/YARN-117.5.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/733//console This message is automatically generated.
[jira] [Updated] (YARN-471) NM does not validate the resource capabilities before it registers with RM
[ https://issues.apache.org/jira/browse/YARN-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-471: - Attachment: YARN-471.1.patch NM does not validate the resource capabilities before it registers with RM -- Key: YARN-471 URL: https://issues.apache.org/jira/browse/YARN-471 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Labels: usability Attachments: YARN-471.1.patch Today, an NM could register with -1 memory and -1 cpu with the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-471) NM does not validate the resource capabilities before it registers with RM
[ https://issues.apache.org/jira/browse/YARN-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned YARN-471: Assignee: Hitesh Shah
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631386#comment-13631386 ] Hitesh Shah commented on YARN-541: -- @Krishna, could you provide more information: What scheduler are you using? Could you attach the application logs as well as the RM's logs? getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri I am running an application that was written against, and working well with, hadoop-2.0.0-alpha, but when I run the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse sometimes does not return all the allocated containers. For example, I request 10 containers and this method sometimes gives me only 9, and when I looked at the log of the Resource Manager, the 10th container was also allocated. It happens only sometimes, randomly, and works fine all other times. If I send one more request for the remaining container to the RM after it failed to give them the first time (and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is that even though the RM's log says all 10 requested containers were allocated, getAllocatedContainers() surprisingly returned only 9. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore
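Whether or not this turns out to be a scheduler bug, the usual defensive pattern on the AM side is to treat allocation as asynchronous: a single {{allocate()}} heartbeat may legitimately return only part of a request, with the remainder arriving on later heartbeats. A sketch of the accumulate-until-satisfied loop ({{Allocator}} and {{awaitContainers}} are illustrative names, not YARN classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy stand-in for the AM-RM allocate() heartbeat; not the real YARN API.
interface Allocator {
    List<String> allocate(); // containers granted since the previous heartbeat
}

public class AllocateLoop {
    // Allocation is asynchronous: one allocate() call may return only part of
    // a request, so the AM should keep heartbeating and accumulate grants
    // until the request is satisfied or a retry limit is reached.
    static List<String> awaitContainers(Allocator am, int wanted, int maxHeartbeats) {
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < maxHeartbeats && granted.size() < wanted; i++) {
            granted.addAll(am.allocate());
        }
        return granted;
    }

    public static void main(String[] args) {
        // Simulated RM: 9 containers on the first heartbeat, the 10th two beats later.
        List<List<String>> responses = new ArrayList<>();
        responses.add(Arrays.asList("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"));
        responses.add(Collections.<String>emptyList());
        responses.add(Arrays.asList("c10"));
        Allocator am = () -> responses.isEmpty()
                ? Collections.<String>emptyList() : responses.remove(0);
        System.out.println(awaitContainers(am, 10, 100).size());
    }
}
```

If the count still never converges after repeated heartbeats, that points at a genuine RM-side problem rather than ordinary allocation latency.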
[jira] [Updated] (YARN-523) Container localization failures aren't reported from NM to RM
[ https://issues.apache.org/jira/browse/YARN-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-523: - Assignee: Omkar Vinit Joshi Container localization failures aren't reported from NM to RM - Key: YARN-523 URL: https://issues.apache.org/jira/browse/YARN-523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi This is mainly a pain for crashing AMs, but once we fix this, containers can also benefit: the same fix works for both.
[jira] [Updated] (YARN-115) yarn commands shouldn't add m to the heapsize
[ https://issues.apache.org/jira/browse/YARN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-115: - Labels: usability (was: ) yarn commands shouldn't add m to the heapsize --- Key: YARN-115 URL: https://issues.apache.org/jira/browse/YARN-115 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Thomas Graves Labels: usability The yarn commands add m to the heapsize. This is unlike the HDFS side and what the old JT/TT scripts used to do. JAVA_HEAP_MAX=-Xmx$YARN_RESOURCEMANAGER_HEAPSIZEm JAVA_HEAP_MAX=-Xmx$YARN_NODEMANAGER_HEAPSIZEm We should not be adding in the m, and should allow the user to specify units.
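The proposed behaviour (keep a megabyte default for bare numbers, but pass explicit units through untouched) could be sketched like this; {{heapOpt}} is an illustrative helper, not code from the actual yarn scripts:

```java
public class HeapOpt {
    // Append the historical default unit only when the user supplied a bare
    // number; values that already carry a unit (e.g. "2g") pass through as-is.
    static String heapOpt(String heapsize) {
        return "-Xmx" + (heapsize.matches("\\d+") ? heapsize + "m" : heapsize);
    }

    public static void main(String[] args) {
        System.out.println(heapOpt("1000")); // bare number: default unit appended
        System.out.println(heapOpt("2g"));   // explicit unit: left alone
    }
}
```

This keeps backwards compatibility with existing configs that set a plain number while unblocking users who want `-Xmx2g` or similar.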
[jira] [Commented] (YARN-358) bundle container classpath in temporary jar on all platforms, not just Windows
[ https://issues.apache.org/jira/browse/YARN-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631390#comment-13631390 ] Hitesh Shah commented on YARN-358: -- @Chris, are we just talking about the command line, or does this affect environment variables too? Given that YARN can launch any kind of application (C++/Java/script), what are the areas of concern that need to be addressed for containers to launch correctly on Windows? Should this be a YARN feature, or is it better to hand this off to the application logic to handle correct launching of a container on a particular OS type? bundle container classpath in temporary jar on all platforms, not just Windows -- Key: YARN-358 URL: https://issues.apache.org/jira/browse/YARN-358 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: trunk-win Reporter: Chris Nauroth Currently, a Windows-specific code path bundles the classpath into a temporary jar with a manifest to work around command line length limitations. This code path does not need to be Windows-specific. We can use the same approach on all platforms.
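For reference, the manifest trick the description mentions can be sketched with the standard {{java.util.jar}} API: write a jar containing nothing but a manifest whose {{Class-Path}} attribute holds the (space-separated, relative-URL) classpath entries, then launch with that single jar on the classpath. This is an illustration of the technique, not the NodeManager's actual code:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJar {
    // Write an otherwise-empty jar whose manifest carries the classpath.
    // Launching with "java -cp classpath.jar Main" then sidesteps command
    // line length limits, since the long path list lives in the manifest.
    static File writeClasspathJar(String[] classpathEntries, File out) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version is mandatory; without it the manifest is written empty.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path entries are space-separated URLs relative to the jar's location.
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", classpathEntries));
        try (JarOutputStream jos = new JarOutputStream(new FileOutputStream(out), manifest)) {
            // no entries needed: the manifest is the payload
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        File jar = writeClasspathJar(new String[] {"lib/a.jar", "lib/b.jar"},
                                     File.createTempFile("classpath", ".jar"));
        try (java.util.jar.JarFile jf = new java.util.jar.JarFile(jar)) {
            System.out.println(jf.getManifest().getMainAttributes()
                    .get(Attributes.Name.CLASS_PATH));
        }
    }
}
```

One caveat worth noting for the cross-platform discussion: {{Class-Path}} entries are interpreted relative to the jar's own location, so absolute local paths need care on both Unix and Windows.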
[jira] [Commented] (YARN-471) NM does not validate the resource capabilities before it registers with RM
[ https://issues.apache.org/jira/browse/YARN-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631392#comment-13631392 ] Hadoop QA commented on YARN-471: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578646/YARN-471.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/734//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/734//console This message is automatically generated.
[jira] [Updated] (YARN-386) [Umbrella] YARN API Changes
[ https://issues.apache.org/jira/browse/YARN-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-386: Summary: [Umbrella] YARN API Changes (was: [Umbrella] YARN API cleanup) [Umbrella] YARN API Changes --- Key: YARN-386 URL: https://issues.apache.org/jira/browse/YARN-386 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli This is the umbrella ticket to capture any and every API cleanup that we wish to do before YARN can be deemed beta/stable. Doing this API cleanup now and ASAP will help us escape the pain of supporting bad APIs in beta/stable releases.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631399#comment-13631399 ] Bikas Saha commented on YARN-445: - Sounds like an enhancement in the NM API. Moving under YARN-386. Please unlink if that is not correct. I can see the use case this seeks to solve. I am wondering what the abstraction is in the general case; that would help us avoid changing things for every similar use case. Keeping platform neutrality would be beneficial, so that the use cases continue to work for non-Java AMs/tasks or on Windows. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.0.5-beta Reporter: Jason Lowe It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes.
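For context on the mechanics: on POSIX systems such a feature ultimately reduces to delivering a signal to the container's process, and SIGQUIT in particular makes a JVM print a thread dump without terminating it, which is the jstack case above. A toy sketch of signalling a child process by pid (illustrative only, none of this is YARN code; assumes Java 9+ for {{Process.pid()}} and a POSIX {{kill}} binary):

```java
import java.io.IOException;

public class SignalSketch {
    // Deliver a POSIX signal to a process by pid via /bin/kill.
    // (Illustrative only; a real NM tracks container pids itself and would
    // need a platform-neutral mechanism for the Windows case.)
    static int signal(long pid, String sig) throws IOException, InterruptedException {
        return new ProcessBuilder("kill", "-" + sig, Long.toString(pid))
                .start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        Process child = new ProcessBuilder("sleep", "60").start();
        // SIGQUIT on a JVM triggers a stack dump and leaves it running;
        // here we send SIGTERM so the demo child actually exits.
        int rc = signal(child.pid(), "TERM");
        child.waitFor();
        System.out.println("kill rc=" + rc + " alive=" + child.isAlive());
    }
}
```

The platform-neutrality concern in the comment is real: this sketch only works where {{kill}} exists, which is exactly why an abstract "signal container" API would need a Windows story.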
[jira] [Updated] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-445: Issue Type: Sub-task (was: New Feature) Parent: YARN-386
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631400#comment-13631400 ] Bikas Saha commented on YARN-45: My personal preference would be to not have an API that is not actionable. If the RM does not have any support for such ResourceRequest scenarios, then we can leave that out until such support does arise. Having something out there that does not work may lead to misunderstanding and confusion on the part of YARN app developers. Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf