[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-117: - Attachment: YARN-117-025.patch > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, > YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, > YARN-117-025.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} m
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-117: - Attachment: YARN-117-024.patch Uber patch suppressing findBugs warnings. All the warnings are about fields accessed in the service* methods, which are not synchronized on the objects but should be fine as they are just read and not modified in any of the cases. > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, > YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, YARN-117-2.patch, > YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, > YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-117: - Attachment: YARN-117-023.patch Updated patch. - Drops common changes. They should be tracked separately. Didn't even review them, are they needed for the service stuff? - Drops spurious java comment changes to LocalCacheDirectoryManager.java and TestLocalCacheDirectoryManager.java - Minor improvement in TestNMWebServer.java - And including latest patch at YARN-530. > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, > YARN-117-022.patch, YARN-117-023.patch, YARN-117-2.patch, YARN-117-3.patch, > YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-022.patch rebased to this trunk of June 10; all tests passing > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, > YARN-117-022.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a n
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-021.patch Patch incorporating the sync changes of the '530-021 patch. > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, YARN-117-2.patch, > YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, > YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-020.patch as with YARN-530-020 There's no change from -19 except an extra line of logging in CompositeService > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-020.patch, YARN-117-2.patch, YARN-117-3.patch, > YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; ma
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-019.patch # make sure addService, removeService and service iteration are all thread safe -eliminates risk of concurrency problems if the service list is changed during init/start/stop of children. As these methods are protected, a subclass would have to do this in another thread -or make the methods public # composite service policy is switched to stop all services that are STARTED or INITED > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such a
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: (was: MAPREDUCE-5298-016.patch) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, > YARN-117-019.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-018.patch patch applying vinod's changes to YARN-530 and migration to the YARN-635 exception rename changes; in sync w/ YARN-530-018.patch > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: MAPREDUCE-5298-016.patch, YARN-117-007.patch, > YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, > YARN-117-011.patch, YARN-117-012.patch, YARN-117-013.patch, > YARN-117-014.patch, YARN-117-015.patch, YARN-117-016.patch, > YARN-117-018.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop(
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: MAPREDUCE-5298-016.patch Patch in sync w/ YARN-530-016.patch > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: MAPREDUCE-5298-016.patch, YARN-117-007.patch, > YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, > YARN-117-011.patch, YARN-117-012.patch, YARN-117-013.patch, > YARN-117-014.patch, YARN-117-015.patch, YARN-117-016.patch, YARN-117-2.patch, > YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, > YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failur
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-016.patch This appears to be Mockito changing things, marking them as final stopped that. Made two fields non-final and tightened service stop to handle them not being null > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-016.patch, YARN-117-2.patch, YARN-117-3.patch, > YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; m
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-015.patch # rebase against trunk # move {{NMClientAsync}} and {{NMClientImpl}} to the new service lifecycle -making their {{serviceStop()}} operation robust in the process. # fix {{TestNMClient}} to not expect {{stop()}} to stop the containers when invoked for a second time. This is done by pulling the cleanup validation out of {{teardown()}} and embedding into two different tests -one for each cleanup on shutdown option > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, > YARN-117-015.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-014.patch Patch in-sync/including YARN-530-014.patch. Pulled the {{ServiceShutdownHook}} code off to YARN-679; this eliminates two of the FindBugs warnings. The remaining two are spurious as they note that an AtomicBool is being used for both a wait/notify barrier as well as for the thread-safe get/set operations. > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, YARN-117-2.patch, > YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, > YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{Ab
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-013.patch in sync w/ YARN-530-013. # move the service listener registration method rename to YARN-746, which will have its patch created after this is checked in. # added a null pointer check in {{TestNodeStatusUpdater.verifyNodeStartFailure()}}. There's an NPE surfacing in test runs where YARN-530 has been applied, but nothing else in this aggregate patch. When its applied the NPE goes away so the extra {{assertNotNull()}} doesn't actually catch it. I stuck it in just in case it does surface again with any future code change. > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-013.patch, YARN-117-2.patch, YARN-117-3.patch, > YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChange
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-012.patch aggregate patch in sync with YARN-530-012.patch > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, > YARN-117-012.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-011.patch patch in sync w/ YARN-530-011; as well as init/start being no-ops on reentrant calls, there are more detailed asserts in the TestNMClient test (which is not failing locally) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-2.patch, > YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, > YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener fail
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-010.patch aggregate patch w/ YARN-530-010.patch; same patch as 009, rebased > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-010.patch, YARN-117-2.patch, YARN-117-3.patch, > YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any e
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-009.patch Patch in sync with the YARN-530-009.patch. change since last submission #{{TestStagingCleanup}} has its services stopped # {{YarnClientImpl}} has gained a constructor that allows subclasses to pass their name up as the service name -so giving service messages more meaning > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-009.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > imp
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-008.patch build diff from root of repository, so patch can apply it > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-008.patch, > YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, > YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any exception by a listener is caught and > discarde
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-007.patch patch in sync with with {{YARN-530-005.patch}} #adapts to the new {{serviceStart()}}, {{serviceStop()}}, serviceInit()}} names. # {{NodeManager}} shutdown is hardened to work from INITED # {{NodeStatusUpdater}} cross-thread stop flag marked as {{volatile}} # Various tests more rigorous about stopping services on failure/exit > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-007.patch, YARN-117-2.patch, YARN-117-3.patch, > YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the ex
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.6.patch same source, patch created with a {{git diff trunk..HEAD}} in the hope this creates a patch that applies > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any exception by a listener is caught and > disca
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.5.patch > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.5.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any exception by a listener is caught and > discarded, to increase the likelihood of a better shutdown, but do not add > try-catch clauses to the other state changes. > h2. Support static listeners for all AbstractServic
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.4.patch The changes here since the last patch related to the test {{TestNodeStatusUpdater}} which was failing on Jenkins but not locally. #adding timeouts in the {{syncBarrier.await()}} clause handle better the situation where the rollback of a failing {[start()}} doesn't block -as the barrier in the test case isn't reached as it would be on the same thread. #lots of extra assertions and debugging to see why {{testNMConnectionToRM()}} fails most of the time on a Linux test VM. It looks like the time-based assertions are brittle there (not fixed) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, > YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{Ab
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-3.patch Updated patch # fix javadoc warnings (enum @links that the IDE felt were valid, but not javadoc) # added more rigorous shutdown in the test teardown, and make sure the overridden {{NodeManager}} overrides {{innerStop()}} and not {{stop()}} > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any ex
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-2.patch This is an aggregate patch for jenkins and testing; YARN-530 is the subset for real review * contains a code-review of all {{innerInit()}}, {{innerStart()}} ops * contain a code review of all {{innerStop()}} operations to harden against null values (MAPREDUCE-3502). * incorporates YARN-535 to fix a race condition in one test > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117-2.patch, YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and igno
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.patch This is the across-all-yarn-projects patch (plus HADOOP-9447) just to show what the combined patch looks and tests like. YARN-530 contains the changes to yarn-common which should be the first step. (This patch contains those) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any exception by a listener is caught and > discarded, t
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-117: - Priority: Major (was: Blocker) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any exception by a listener is caught and > discarded, to increase the likelihood of a better shutdown, but do not add > try-catch clauses to the other state changes. > h2. Support static listeners for all AbstractServices > Add support to {{AbstractService}} that allow callers to register listeners > for all in
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-117: - Priority: Blocker (was: Major) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers. > h2. AbstractService state change doesn't defend against race conditions. > There's no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph of lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. > When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any exception by a listener is caught and > discarded, to increase the likelihood of a better shutdown, but do not add > try-catch clauses to the other state changes. > h2. Support static listeners for all AbstractServices > Add support to {{AbstractService}} that allow callers to register listeners > for