> On 8 aug. 2016, at 17:02, list+org.o...@io7m.com wrote:
> 'Ello.
> On 2016-08-08T15:11:05 +0200
> Peter Kriens <peter.kri...@aqute.biz> wrote:
>> As you seem to have found out, the problem disappears when you just
>> do the simple thing … block until your resource R is ready.
> On this implementation, for this problem, sure! :D
>
> It is in fact what I ended up putting into the example:
>
>   https://github.com/io7m/osgi-example-reverser/blob/master/tcp-server/src/main/java/com/io7m/reverser/tcp_server/TCPServerService.java#L47

a) You will get an automatic retry by DS, but then your component is dead when it runs into a problem.

b) Under certain network problems you will overload the log; at least insert 'some' delay.

c) The server dies on certain network problems and will never recover, because you create the server socket in the constructor and not in a loop.
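Concretely, the pattern I mean looks roughly like this. A sketch only, not the code of the example above: the component name, thread handling, port, and back-off constants are all mine for illustration. The activate method only starts a thread, the socket is created inside the loop so the server can recover (point c), the component never dies on a failure (point a), and every failure path waits with a capped exponential back-off so the log is not flooded (point b):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Deactivate;

@Component
public final class ResilientTCPServer
{
  private volatile boolean running;
  private volatile ServerSocket socket;
  private volatile Thread thread;

  @Activate
  void activate()
  {
    this.running = true;
    this.thread = new Thread(this::loop, "tcp-server");
    this.thread.start(); // do not block the SCR thread
  }

  @Deactivate
  void deactivate()
    throws InterruptedException
  {
    this.running = false;
    final ServerSocket s = this.socket;
    if (s != null) {
      try {
        s.close(); // unblocks accept() in the loop
      } catch (final IOException e) {
        // ignored: we are shutting down anyway
      }
    }
    this.thread.interrupt(); // also unblocks the back-off sleep
    this.thread.join();
  }

  private void loop()
  {
    long delay = 100L;
    while (this.running) {
      // The socket is created *inside* the loop (point c), so the
      // server can recover when bind() fails or the network breaks.
      try (ServerSocket s = new ServerSocket(9999)) {
        this.socket = s;
        delay = 100L; // bind succeeded: reset the back-off
        while (this.running) {
          try (Socket client = s.accept()) {
            // ... handle the connection ...
          }
        }
      } catch (final IOException e) {
        // Wait with capped exponential back-off so a persistent
        // failure cannot flood the log (point b).
        try {
          Thread.sleep(delay);
        } catch (final InterruptedException ie) {
          return; // deactivate() interrupted us
        }
        delay = Math.min(delay * 2L, 30_000L);
      }
    }
  }
}

The details will differ per server; the point is only that initialization happens inside the loop and every failure path delays before retrying.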
OSGi is a server model; you should write your servers to run forever. There is no reason to ever bail out before you're stopped. This was the primary pain point of Blueprint, where the application turned into a zombie state if you were not quick enough. Things fail, so if a server is not able to survive such failures you create very brittle and fragile systems. A server should always try to run and handle any failures without overloading the system. This is worth it even if just during development, where you often need to ensure other things are in place. Talk to Blueprint users and they start to weep :-)

> Blocking initialization happens in the TCPServer constructor. There's
> no exponential backoff or retrying yet, so it's still fragile to other
> network problems relating to bind().

Yup

>> So now we have the case that you raise: T1 is deactivated while T2 is
>> activated before T1 has finished deactivating. Looking at the SCR/DS
>> implementations and activities involved, as well as experience, it is
>> not something I generally worry about. I always write my servers
>> assuming they can fail. I just write the activate method to start a
>> loop that initializes and then does work. If an exception is thrown,
>> I wait with exponential back-off and try again.
>
> Right.
>
> I think it's fairly common outside of OSGi (and more common outside of
> Java due to the [not very accurately] perceived problem of VM startup
> times) to let entire processes crash early and then use process
> supervision systems (daemontools, runit, etc) to restart the processes.
> Obviously this approach doesn't apply to OSGi, as the whole point is to
> not have to keep restarting whole systems.

Experience shows it is a non-problem if your deactivate method closes the socket. If you find it happens at an annoying rate you can always add a short delay. (I recall that on Windows you had to delay a bit after a socket close because the port was not immediately available again.)

>> My general philosophy in life is that it is better to accept failures
>> and handle them instead of trying to prevent them in all possible
>> theoretical cases. It makes for MUCH simpler systems that are more
>> reliable.
>
> I agree in general. This had the feel of a bug in my code (or worse, a
> problem with the way OSGi is specified) rather than an acceptable
> failure, though.
>
>> That said, if I were given this answer I would probably think it
>> is a cop-out. So how could we solve this race condition without
>> statics in OSGi?
>>
>> It would require a new service that handles the atomicity. The
>> component should have no dependencies, so it only depends on its
>> bundle life cycle, which will avoid the T1/T2 overlap since a bundle
>> cannot start before it has been fully stopped. It should provide a
>> lock method that is taken in activate and released in deactivate. So
>> that would ensure atomicity.
>
> It feels like in order for this LockImpl solution to work, you'd have
> to be relying on the same assumption that you'd be relying on if you
> simply blocked in activate()/deactivate(): the assumption that if
> the bundle is restarted/refreshed, then the activate() method of the
> new instances won't be called concurrently with the deactivate()
> methods of the old instances. If that assumption doesn't hold, then
> it seems that the LockImpl would not have the right effect. There'd
> effectively be two lock instances.
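Before answering, a rough sketch of what that proposal looks like, to make sure we are talking about the same thing. All names (AtomicityLock, LockImpl, GuardedServer) are mine for illustration; this is not a specified OSGi API, and in a real bundle each type would be public and in its own file. A semaphore is used rather than a ReentrantLock because activate and deactivate may run on different threads and a semaphore is not thread-owned:

import java.util.concurrent.Semaphore;

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Deactivate;
import org.osgi.service.component.annotations.Reference;

// The service that handles the atomicity.
interface AtomicityLock
{
  void lock();
  void unlock();
}

// The lock component has no dependencies, so it depends only on its
// bundle life cycle; since a bundle cannot start before it has fully
// stopped, there can never be two live instances of this lock.
@Component
final class LockImpl implements AtomicityLock
{
  // A Semaphore instead of a ReentrantLock: lock() and unlock() may
  // be called from different threads, and a semaphore is not owned
  // by the thread that acquired it.
  private final Semaphore permit = new Semaphore(1);

  @Override public void lock()   { this.permit.acquireUninterruptibly(); }
  @Override public void unlock() { this.permit.release(); }
}

// A component that takes the lock in activate and releases it in
// deactivate: T2.activate blocks until T1.deactivate has released.
@Component
final class GuardedServer
{
  @Reference private AtomicityLock atomicity;

  @Activate
  void activate()
  {
    this.atomicity.lock();
    // ... initialize the server ...
  }

  @Deactivate
  void deactivate()
  {
    // ... shut the server down ...
    this.atomicity.unlock();
  }
}

Whether this actually closes the race depends, as you say, on the guarantee that LockImpl itself is never instantiated twice, which is exactly the point addressed next.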
As I stated in the text, the bundle life cycle guarantees that the lock can never be instantiated twice. I intuit that the T1/T2 overlap is also not possible for singleton components that have references, but I just don't have the time to prove it, because for me it is a non-problem. Despite extensive experience, it has never happened to me. However, in the extremely unlikely case that it would happen, I am covered by the double server loop. What more do I need?

>> However, I would not bother; making your server loop resilient by
>> reinitializing after a failure is much more robust against many more
>> error cases. The chance that you get overlapping
>> T1.deactivate/T2.activate is in my experience magnitudes smaller than
>> getting a network problem, which in the end can be handled with the
>> same code. The chance might even be zero, but I've no time to crawl
>> through the spec to prove that right now.
>
> I agree, I wouldn't bother.
>
>> If you really feel strongly about this race condition then you should
>> file a bug on the public OSGi Bugzilla; the expert group will then
>> take a look at whether the DS specification should be amended.
>
> I don't know enough to feel that strongly. :D
>
> I think it would be nice if the spec had something unambiguous to say
> about this case. Given that the point of OSGi is to be standardized,
> it'd be better if we weren't relying on unspecified assumptions for
> something like this. The fact that I had to ask the question(s) at all
> might indicate a problem (or maybe I'm being overly pedantic).

The question is good, but in my experience it is a bit like fear of flying: the first time you do it you see a lot of dragons in the air that over time disappear. I once lost a customer because I explained to him that there was a finite chance he would lose an article on our newspaper editorial system. The OSGi service model cannot always prevent stale service references. Today we make software that consists of so many parts that it is impossible to rely on their perfection. We have to write software defensively to handle errors and failures. Look at the average log of a complex system and be impressed that the system can actually provide its function.

When I was working for Ericsson Research in Stockholm, Bill Joy from Sun visited us to promote Jini. He told a story I will never forget. In the early days of the Internet they started making a network of perfect components, so that components could perfectly rely on each other and would therefore be simpler. After a lot of time, sweat, money, and pain they realized that it was not possible: failure was unavoidable and had to be handled. They then decided to assume each network component was unreliable, and that reliability had to be constructed from these unreliable components. I thought that was one of the best lessons I ever got. Since then my motto has been 'Embrace failure'.

So yes, you're pedantic :-), but that is the only way to make progress, because we should always challenge the way we work in order to improve it. I therefore do appreciate these questions. I hope you found my answers just as useful.

Kind regards,

   Peter Kriens

> M
_______________________________________________
OSGi Developer Mail List
osgi-dev@mail.osgi.org
https://mail.osgi.org/mailman/listinfo/osgi-dev