[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658961#comment-13658961
 ] 

Steve Loughran commented on YARN-530:
-------------------------------------

h3. Service

bq. {{start()}} doesn't use {{enterState()}} API, so we don't capture the 
life-cycle change events.

-fixed

bq. {{init()}}, {{start()}} and {{stop()}} aren't synchronized, so callers of 
{{getServiceState()}} will get incorrect information if, let's say, 
{{innerInit}} is still in progress.

This is interesting. I've realised they need to be sync or you add an 
"stateChangeInProgress" marker to ensure you eliminate the race of someone 
trying to {{stop()}} a service half-way through {{start()}}. Some of the more 
complex state models make the 'starting', 'stopping' states explicit, which is 
another option. 

* I'm going to make the methods {{synchronized}} to stop the risk of any race 
conditions -while doing a best effort at keeping the notifications outside the 
{{synchronized}} block. The corner case here is that when an init or start 
fails and {{stop()}} is called automatically: it will notify its listeners 
inside the {{stop()}} call.

* I'm going to keep the state queries unsynced (reading a volatile), so that 
doesn't stop things that only want to read and not manipulate service state 
from blocking.

bq. Rename {{inState}} to {{isInState}}?

done

bq. {{waitForServiceToStop}} seems redundant because we also have listeners, 
right? Sure one is a blocking call while the other is async. I'd remove it 
unless there is some other strong reason. May be we can implement an async 
utility using{{ getServiceState()}} and implement a generic 
{{waitForServiceState}} instead of just for stop?

-let me add something that isn't on the critical path for the next alpha, as it 
isn't an API change: an entry point to start a service by its name.
I've just added YARN-679 to show the use case here: an entry point to start any 
service by its classname. That isn't ready to be committed (where are the 
tests!), but it shows the vision. I'll see if I can implement it with the 
notifications.

bq. What is the use of 'blockers'?

An attempt to make the reason a service blocks visible, at least when it is 
consciously blocking (e.g. spin/sleep waiting for manager node). Unconscious 
blocking, by way of blocking API calls, will still happen. If the service can 
declare that they are blocked by something, then other tooling can say "what is 
this service waiting for"

bq. May be LifecycleEvent move to top-level?
-done

bq. Not part of your patch, but we may as well take this opportunity to fix 
this: Rename register -> registerServiceListener? Similarly unregister.
-done

h3. AbstractService

bq. The inner* method-names don't look good when using the service stuff. Shall 
we rename it to to say, for e.g, innerInit to initService ?

bq. Mark all the super init, start, stop methods as final?

Gladly, though it does cause a mockito test to fail:

{code}
testResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
  Time elapsed: 247 sec  <<< FAILURE!
java.lang.AssertionError: null state in null class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker$$EnhancerByMockitoWithCGLIB$$221b8e7
        at 
org.apache.hadoop.yarn.service.AbstractService.enterState(AbstractService.java:431)
        at 
org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:151)
        at 
org.apache.hadoop.yarn.service.CompositeService.serviceInit(CompositeService.java:67)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.serviceInit(ResourceLocalizationService.java:240)
        at 
org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:154)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testResourceRelease(TestResourceLocalizationService.java:239)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
        at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
        at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
        at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
        at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
{code}
I don't see an obvious fix for this -need some help from a Mockito expert.

bq. Lets err out on null configuration in init().
-done

bq. enterState(): Rename original to either oldState or previousState?
-done

bq. stopQuietly, stopService wrappers are unnecessary?
-purged

h3. {{ServiceStateModel}}
bq. : init to init, start to start transitions should also be allowed - for 
reentrancy?

-I left out on the basis that if you are doing this then something may be 
wrong, but calling stop without any state checks is absolutely critical to 
handle. 

If you think we should me make the others no-op too, it'll be trivial.

bq. {{ServiceStateModel}}: if {{Service.inState()}} is renamed, so should 
{{inState}} in this class too.
-done

bq. Rename 'original' to {{oldState}} or {{previousState}} and state to 
{currentState}?

done for {{original}}, the other is the name of the field -do you think I 
should rename that?

h3. {{ServiceOperarations}}

bq. Why the extra wrappers for init, start - those APIs already internally 
check if the transition is valid or not right? Even so, stop doesn't have this 
extra transition-check while others have.
-removed, were originally there to handle the (more brittle) previous model.

bq. Little confused here: the tests indicate that on listener throwing 
exception doesn't disturb any other listener or the service itself, but I can't 
understand this from the code - notifyListeners doesn't do anything for 
listeners throwing exceptions.

That's left to whatever invokes the notification, here {{AbstractService}}, 
which downgrades failures to warns. The tests verify that this happens.


bq. {{TestServiceOperations.testStopUnstarted}}: You have duplicate checks in 
there.
-pulled the entire test along with the operations tested

bq. {{TestCompositeService.testServiceLifecycleNoChildrenl}}: Typo on test-name
-fixed

bq. Also, can you please fix the findbugs warnings? Tx

I'll see what the report says on the next run
                
> Define Service model strictly, implement AbstractService for robust 
> subclassing, migrate yarn-common services
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-530
>                 URL: https://issues.apache.org/jira/browse/YARN-530
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: YARN-117changes.pdf, YARN-530-005.patch, 
> YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch
>
>
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to