> On 8 Aug 2016, at 17:02, list+org.o...@io7m.com wrote:
> 'Ello.
> On 2016-08-08T15:11:05 +0200
> Peter Kriens <peter.kri...@aqute.biz> wrote:
>> As you seem to have found out, the problem disappears when you just
>> do the simple thing … block until your resource R is ready.
> On this implementation, for this problem, sure! :D
> 
> It is in fact what I ended up putting into the example:
> 
>  
> https://github.com/io7m/osgi-example-reverser/blob/master/tcp-server/src/main/java/com/io7m/reverser/tcp_server/TCPServerService.java#L47
a) You will get an automatic retry by DS, but then your component is dead
as soon as it runs into a problem.
b) With certain network problems you will flood the log; at the very least
insert some delay.
c) With certain network problems the server dies and will never recover,
because you create the server socket in the constructor and not in a loop.

OSGi is a server model; you should write your servers to run forever. There
is never a reason to bail out before you are stopped. This was the primary
pain point of Blueprint, where the application turned into a zombie state
if you were not quick enough.

Things fail, so if a server is not able to survive such failures you create
very brittle and fragile systems. A server should always keep trying to run
and handle any failures without overloading the system. This is worthwhile
even during development, where you often need to ensure other things are in
place. Talk to Blueprint users and they start to weep :-)
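
A minimal sketch of what I mean (assuming the Declarative Services
annotations from org.osgi.service.component.annotations; the class name,
port, and back-off numbers are made up, and the deactivate side is what I
describe further below):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Deactivate;

@Component
public final class ResilientTCPServer {
  private volatile boolean running;
  private volatile ServerSocket server;
  private Thread thread;

  @Activate
  void activate() {
    this.running = true;
    this.thread = new Thread(this::serve, "tcp-server");
    this.thread.start();                  // never block the SCR thread
  }

  @Deactivate
  void deactivate() throws IOException, InterruptedException {
    this.running = false;
    final ServerSocket s = this.server;
    if (s != null) {
      s.close();                          // unblocks accept() so the loop exits
    }
    this.thread.join();
  }

  private void serve() {
    long delay = 100L;
    while (this.running) {
      try (ServerSocket s = new ServerSocket(6666)) {
        this.server = s;
        delay = 100L;                     // reset back-off after a successful bind
        while (this.running) {
          handle(s.accept());
        }
      } catch (final Exception e) {
        if (!this.running) {
          return;                         // closed by deactivate, not a failure
        }
        // log the exception, then wait with exponential back-off and retry
        try {
          Thread.sleep(delay);
        } catch (final InterruptedException ie) {
          return;
        }
        delay = Math.min(delay * 2L, 30_000L);
      }
    }
  }

  private void handle(final Socket client) throws IOException {
    client.close();                       // placeholder: hand off to a worker instead
  }
}

A bind() failure, a network hiccup, and the T1/T2 overlap all look the same
to this component: one more failed iteration of the loop, followed by a
retry.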

> 
> Blocking initialization happens in the TCPServer constructor. There's
> no exponential backoff or retrying yet, so it's still fragile to other
> network problems relating to bind().
Yup

> 
>> So now we have the case that you raise: T1 is deactivated while T2 is
>> activated before T1 has finished deactivating. Looking at the SCR/DS
>> implementations and activities involved, as well as experience, it is
>> not something I generally worry about. I always write my servers
>> assuming they can fail. I just write the activate method to start a
>> loop that initializes and then does work. If an exception is thrown,
>> I wait with exponential back-off and try again.
> 
> Right.
> 
> I think it's fairly common outside of OSGi (and more common outside of
> Java due to the [not very accurately] perceived problem of VM startup
> times) to let entire processes crash early and then use process
> supervision systems (daemontools, runit, etc.) to restart the processes.
> Obviously this approach doesn't apply to OSGi as the whole point is to
> not have to keep restarting whole systems.
Experience shows it is a non-problem if your deactivate method closes the
socket. If you find it happens at an annoying rate, you can always add a
short delay. (I recall that on Windows you had to delay a bit after a
socket close because the port was not immediately available again.)
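
In terms of the sketch above, that delay is one extra line at the end of
deactivate() (the 250 ms value is an arbitrary guess):

@Deactivate
void deactivate() throws IOException, InterruptedException {
  this.running = false;
  final ServerSocket s = this.server;
  if (s != null) {
    s.close();                            // unblocks accept() so the loop exits
  }
  this.thread.join();
  Thread.sleep(250L);                     // optional: let the OS release the port
}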

>> My general philosophy in life is that it is better to accept failures
>> and handle them instead of trying to prevent them in all possible
>> theoretical cases. It makes for MUCH simpler systems that are more
>> reliable.
> 
> I agree in general. This had the feel of a bug in my code (or worse, a
> problem with the way OSGi is specified) rather than an acceptable
> failure, though.
> 
>> That said, if I were given this answer I would probably think it
>> is a cop-out. So how could we solve this race condition without
>> statics in OSGi?
>> 
>> It would require a new service that handles the atomicity. The
>> component should have no dependencies so it only depends on its
>> bundle life cycle which will avoid the T1/T2 overlap since a bundle
>> cannot start before it has been fully stopped. It should provide a
>> lock method that is taken in activate and released in deactivate. So
>> that would ensure atomicity.
>> 
> 
> It feels like in order for this LockImpl solution to work, you'd have
> to be relying on the same assumption that you'd be relying on if you
> simply blocked in activate()/deactivate()... The assumption that if
> the bundle is restarted/refreshed, then the activate() method of the
> new instances won't be called concurrently with the deactivate()
> methods of the old instances. If that assumption doesn't hold, then
> it seems that the LockImpl would not have the right effect. There'd
> effectively be two lock instances.
As I stated in the text, the bundle life cycle guarantees that the lock can
never be instantiated twice. My intuition is that the T1/T2 overlap is also
not possible for singleton components that have references, but I just
don't have the time to prove it, because for me it is a non-problem:
despite extensive experience it has never happened to me. However, in the
extremely unlikely case that it did happen, I am covered by the double
server loop. What more do I need?
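
Something along these lines (the names StartupLock and LockImpl are mine; I
use a binary Semaphore rather than a ReentrantLock because activate and
deactivate may run on different threads, and a Semaphore has no thread
ownership):

import java.util.concurrent.Semaphore;
import org.osgi.service.component.annotations.Component;

interface StartupLock {
  void lock();
  void unlock();
}

@Component(service = StartupLock.class)
public final class LockImpl implements StartupLock {
  // One permit: at most one component can hold the lock at a time.
  private final Semaphore permit = new Semaphore(1);

  @Override
  public void lock() {
    this.permit.acquireUninterruptibly();
  }

  @Override
  public void unlock() {
    this.permit.release();
  }
}

The server component would add a @Reference to StartupLock, call lock() at
the start of activate() and unlock() at the end of deactivate(). Because
LockImpl itself has no references, only the bundle life cycle controls it,
and a bundle cannot start before it has fully stopped, so there can never
be two lock instances.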

>> However, I would not bother, making your server loop resilient by
>> reinitializing after a failure is much more robust against many more
>> error cases. The chance that you get overlapping
>> T1.deactivate/T2.activate is in my experience magnitudes smaller than
>> getting a network problem, which in the end can be handled with the
>> same code. The chance might even be zero but I’ve no time to crawl
>> through the spec to prove that right now.
> I agree, I wouldn't bother.
> 
>> If you really feel strong about this race condition then you should
>> file a bug on the public OSGi Bugzilla, the expert group will then
>> take a look if the DS specification should be amended.
> I don't know enough to feel that strongly. :D
> 
> I think it would be nice if the spec had something unambiguous to say
> about this case. Given that the point of OSGi is to be standardized,
> it'd be better if we weren't relying on unspecified assumptions for
> something like this. The fact that I had to ask the question(s) at all
> might indicate a problem (or maybe I’m being overly pedantic).
The question is good, but in my experience it is a bit like fear of flying:
the first time you do it you see a lot of dragons in the air, and over time
they disappear. I once lost a customer because I explained to him that
there was a finite chance he would lose an article on our newspaper
editorial system. The OSGi service model cannot always prevent stale
service references. Today we make software that consists of so many parts
that it is impossible to rely on their perfection. We have to write
software defensively, to handle errors and failures. Look at the average
log of a complex system and be impressed that the system can actually
provide its function.

When I was working for Ericsson Research in Stockholm, Bill Joy from Sun
visited us to promote Jini. He told a story I will never forget. In the
early days of the Internet they started out building a network of perfect
components, so that components could rely on each other completely and
would therefore be simpler. After a lot of time, sweat, money, and pain
they realized that it was not possible: failure was unavoidable and had to
be handled. They then decided to assume that each network component was
unreliable, and that reliability had to be constructed out of these
unreliable components. I thought that was one of the best lessons I ever
got. Since then my motto has been ‘Embrace failure’.

So yes, you are pedantic :-), but that is the only way to make progress,
because we should always challenge the way we work in order to improve it.
I therefore do appreciate these questions. I hope you found my answers just
as useful.

Kind regards,

        Peter Kriens




> 
> M