[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658961#comment-13658961 ]
Steve Loughran commented on YARN-530: ------------------------------------- h3. Service bq. {{start()}} doesn't use {{enterState()}} API, so we don't capture the life-cycle change events. -fixed bq. {{init()}}, {{start()}} and {{stop()}} aren't synchronized, so callers of {{getServiceState()}} will get incorrect information if, let's say, {{innerInit}} is still in progress. This is interesting. I've realised they need to be sync or you add an "stateChangeInProgress" marker to ensure you eliminate the race of someone trying to {{stop()}} a service half-way through {{start()}}. Some of the more complex state models make the 'starting', 'stopping' states explicit, which is another option. * I'm going to make the methods {{synchronized}} to stop the risk of any race conditions -while doing a best effort at keeping the notifications outside the {{synchronized}} block. The corner case here is that when an init or start fails and {{stop()}} is called automatically: it will notify its listeners inside the {{stop()}} call. * I'm going to keep the state queries unsynced (reading a volatile), so that doesn't stop things that only want to read and not manipulate service state from blocking. bq. Rename {{inState}} to {{isInState}}? done bq. {{waitForServiceToStop}} seems redundant because we also have listeners, right? Sure one is a blocking call while the other is async. I'd remove it unless there is some other strong reason. May be we can implement an async utility using{{ getServiceState()}} and implement a generic {{waitForServiceState}} instead of just for stop? -let me add something that isn't on the critical path for the next alpha, as it isn't an API change: an entry point to start a service by its name. I've just added YARN-679 to show the use case here: an entry point to start any service by its classname. That isn't ready to be committed (where are the tests!), but it shows the vision. I'll see if I can implement it with the notifications. bq. What is the use of 'blockers'? An attempt to make the reason a service blocks visible, at least when it is consciously blocking (e.g. spin/sleep waiting for manager node). Unconscious blocking, by way of blocking API calls, will still happen. If the service can declare that they are blocked by something, then other tooling can say "what is this service waiting for" bq. May be LifecycleEvent move to top-level? -done bq. Not part of your patch, but we may as well take this opportunity to fix this: Rename register -> registerServiceListener? Similarly unregister. -done h3. AbstractService bq. The inner* method-names don't look good when using the service stuff. Shall we rename it to to say, for e.g, innerInit to initService ? bq. Mark all the super init, start, stop methods as final? Gladly, though it does cause a mockito test to fail: {code} testResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 247 sec <<< FAILURE! java.lang.AssertionError: null state in null class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker$$EnhancerByMockitoWithCGLIB$$221b8e7 at org.apache.hadoop.yarn.service.AbstractService.enterState(AbstractService.java:431) at org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:151) at org.apache.hadoop.yarn.service.CompositeService.serviceInit(CompositeService.java:67) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.serviceInit(ResourceLocalizationService.java:240) at org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:154) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testResourceRelease(TestResourceLocalizationService.java:239) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) {code} I don't see an obvious fix for this -need some help from a Mockito expert. bq. Lets err out on null configuration in init(). -done bq. enterState(): Rename original to either oldState or previousState? -done bq. stopQuietly, stopService wrappers are unnecessary? -purged h3. {{ServiceStateModel}} bq. : init to init, start to start transitions should also be allowed - for reentrancy? -I left out on the basis that if you are doing this then something may be wrong, but calling stop without any state checks is absolutely critical to handle. If you think we should me make the others no-op too, it'll be trivial. bq. {{ServiceStateModel}}: if {{Service.inState()}} is renamed, so should {{inState}} in this class too. -done bq. Rename 'original' to {{oldState}} or {{previousState}} and state to {currentState}? done for {{original}}, the other is the name of the field -do you think I should rename that? h3. {{ServiceOperarations}} bq. Why the extra wrappers for init, start - those APIs already internally check if the transition is valid or not right? Even so, stop doesn't have this extra transition-check while others have. -removed, were originally there to handle the (more brittle) previous model. bq. Little confused here: the tests indicate that on listener throwing exception doesn't disturb any other listener or the service itself, but I can't understand this from the code - notifyListeners doesn't do anything for listeners throwing exceptions. That's left to whatever invokes the notification, here {{AbstractService}}, which downgrades failures to warns. The tests verify that this happens. bq. {{TestServiceOperations.testStopUnstarted}}: You have duplicate checks in there. -pulled the entire test along with the operations tested bq. {{TestCompositeService.testServiceLifecycleNoChildrenl}}: Typo on test-name -fixed bq. Also, can you please fix the findbugs warnings? Tx I'll see what the report says on the next run > Define Service model strictly, implement AbstractService for robust > subclassing, migrate yarn-common services > ------------------------------------------------------------------------------------------------------------- > > Key: YARN-530 > URL: https://issues.apache.org/jira/browse/YARN-530 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Steve Loughran > Assignee: Steve Loughran > Attachments: YARN-117changes.pdf, YARN-530-005.patch, > YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch > > > # Extend the YARN {{Service}} interface as discussed in YARN-117 > # Implement the changes in {{AbstractService}} and {{FilterService}}. > # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira