[jira] [Commented] (YARN-117) Enhance YARN service model

Hadoop QA (JIRA) Wed, 05 Jun 2013 17:31:27 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676538#comment-13676538
 ]


Hadoop QA commented on YARN-117:
--------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12586423/YARN-117-020.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 26 new 
or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

                  org.apache.hadoop.yarn.client.TestNMClientAsync

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1132//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1132//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1132//console

This message is automatically generated.
                
> Enhance YARN service model
> --------------------------
>
>                 Key: YARN-117
>                 URL: https://issues.apache.org/jira/browse/YARN-117
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: YARN-117-007.patch, YARN-117-008.patch, 
> YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
> YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
> YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
> YARN-117-019.patch, YARN-117-020.patch, YARN-117-2.patch, YARN-117-3.patch, 
> YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch
>
>
> Having played the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA issue is an overall one to cover the issues, with solutions pushed 
> out to separate JIRAs.
> h2. state model prevents stopped state being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop safely without requiring all fields to 
> be non null.
> Before anyone points out that the {{stop()}} operations assume that all 
> fields are valid; and if called before a {{start()}} they will NPE; 
> MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix 
> for this. It is independent of the rest of the issues in this doc but it will 
> aid making {{stop()}} execute from all states other than "stopped".
> MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
> review and take up; this can be done with issues linked to this one.
> h2. AbstractService doesn't prevent duplicate state change requests.
> The {{ensureState()}} checks to verify whether or not a state transition is 
> allowed from the current state are performed in the base {{AbstractService}} 
> class -yet subclasses tend to call this *after* their own {{init()}}, 
> {{start()}} & {{stop()}} operations. This means that these operations can be 
> performed out of order, and even if the outcome of the call is an exception, 
> all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
> demonstrates this.
> This is a tricky one to address. In HADOOP-3128 I used a base class instead 
> of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
> {{final}}. These methods would do the checks, and then invoke protected inner 
> methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
> retrofit the same behaviour to everything that extends {{AbstractService}} 
> -something that must be done before the class is considered stable (because 
> once the lifecycle methods are declared final, all subclasses that are out of 
> the source tree will need fixing by the respective developers.
> h2. AbstractService state change doesn't defend against race conditions.
> There's no concurrency locks on the state transitions. Whatever fix for wrong 
> state calls is added should correct this to prevent re-entrancy, such as 
> {{stop()}} being called from two threads.
> h2.  Static methods to choreograph of lifecycle operations
> Helper methods to move things through lifecycles. init->start is common, 
> stop-if-service!=null another. Some static methods can execute these, and 
> even call {{stop()}} if {{init()}} raises an exception. These could go into a 
> class {{ServiceOps}} in the same package. These can be used by those services 
> that wrap other services, and help manage more robust shutdowns.
> h2. state transition failures are something that registered service listeners 
> may wish to be informed of.
> When a state transition fails a {{RuntimeException}} can be thrown -and the 
> service listeners are not informed as the notification point isn't reached. 
> They may wish to know this, especially for management and diagnostics.
> *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
> {{stateChangeFailed(Service service,Service.State targeted-state, 
> RuntimeException e)}} that is invoked from the (final) state change methods 
> in the {{AbstractService}} class (once they delegate to their inner 
> {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing 
> implementations of the interface.
> h2. Service listener failures not handled
> Is this an error an error or not? Log and ignore may not be what is desired.
> *Proposed:* during {{stop()}} any exception by a listener is caught and 
> discarded, to increase the likelihood of a better shutdown, but do not add 
> try-catch clauses to the other state changes.
> h2. Support static listeners for all AbstractServices
> Add support to {{AbstractService}} that allow callers to register listeners 
> for all instances. The existing listener interface could be used. This allows 
> management tools to hook into the events.
> The static listeners would be invoked for all state changes except creation 
> (base class shouldn't be handing out references to itself at this point).
> These static events could all be async, pushed through a shared 
> {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the 
> listeners invoked.
> h2. Add some example listeners for management/diagnostics
> * event to commons log for humans.
> * events for machines hooked up to the JSON logger.
> * for testing: something that be told to fail.
> h2.  Services should support signal interruptibility
> The services would benefit from a way of shutting them down on a kill signal; 
> this can be done via a runtime hook. It should not be automatic though, as 
> composite services will get into a very complex state during shutdown. Better 
> to provide a hook that lets you register/unregister services to terminate, 
> and have the relevant {{main()}} entry points tell their root services to 
> register themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-117) Enhance YARN service model

Reply via email to