If you do not implement anything special for a clean shutdown of in-flight exchanges, then the normal error handling should take effect, as you mentioned. For example, a DB transaction should roll back. One issue may be that some operations, e.g. a remote service call, cannot be rolled back.
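A minimal sketch of what "normal error handling takes effect" can look like, assuming a hypothetical `Tx` interface standing in for a real JDBC or JPA transaction. A failure caused by a resource being closed underneath the call during a bundle stop goes down the same rollback path as any other runtime failure, so no shutdown-specific branch is needed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DB transaction (not a real JDBC/JPA type).
interface Tx {
    void commit();
    void rollback();
}

// Records commit/rollback calls so the behaviour can be inspected.
class RecordingTx implements Tx {
    final List<String> events = new ArrayList<>();
    public void commit()   { events.add("commit"); }
    public void rollback() { events.add("rollback"); }
}

final class InFlightWork {
    private InFlightWork() {}

    // Runs the work and commits. Any unchecked failure, including one thrown
    // because a resource was closed during shutdown, triggers a rollback and
    // is rethrown so the caller still sees it.
    static void runInTx(Tx tx, Runnable work) {
        try {
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }
}
```

As noted above, this only helps for resources that are transactional; a remote call made inside `work` is not undone by the rollback.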

On the other hand, I think implementing clean shutdown will add a lot of complexity, and the special code will only be executed in quite rare cases. These two effects increase the chance of programming errors in the code. So I agree with you that in most cases you can just implement normal error handling and live with the fact that in-flight calls might run into errors.

What I have seen on production systems is that the machine to be updated is marked as inactive on the front-end load balancer. No new requests come in, and after some time, once the in-flight requests have finished, you can update the bundles quite safely. This is quite a low-tech solution, but I think that is exactly why it works so well.

So while I wanted to understand clean shutdown better for the discussion on the Aries dev list, I do not think it should always be done.

By the way, for my current JPA redesign I have one problem on which I would like to get some feedback or ideas. I am providing a so-called EmSupplier: https://github.com/cschneider/jpa-experiments/blob/master/jpa-support/src/main/java/net/lr/jpa/impl/EMSupplierImpl.java

This class will be offered as a service per persistence unit and should make working with JPA easier. There is a precall method that creates an EM for the current thread. Then there is a get() method to retrieve the thread-local EM and a postcall method that closes the EM again. As discussed, a bundle should have stopped all work when its stop method returns. Here this applies when the PU bundle is stopped: the EntityManagerFactory will then also be unregistered and closed. As the EMSupplier depends on the EMF, it also has to be closed.
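The per-thread lifecycle described above can be sketched with a `ThreadLocal`. This is a minimal illustration, not the EMSupplierImpl code: `T` stands in for `EntityManager`, the `Supplier<T>` factory stands in for `EntityManagerFactory.createEntityManager()`, and the class and method names (`PerThreadSupplier`, `preCall`, `postCall`) are hypothetical. It also assumes no nested calls on the same thread:

```java
import java.util.function.Supplier;

// Hypothetical per-thread resource holder modelled on the precall/get/postcall
// description. T plays the role of EntityManager.
class PerThreadSupplier<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private final ThreadLocal<T> current = new ThreadLocal<>();

    PerThreadSupplier(Supplier<T> factory) {
        this.factory = factory;
    }

    // Creates the resource for the calling thread. This sketch does not
    // support nesting, so a second preCall on the same thread is an error.
    void preCall() {
        if (current.get() != null) {
            throw new IllegalStateException("preCall already done on this thread");
        }
        current.set(factory.get());
    }

    // Returns the calling thread's resource.
    T get() {
        T resource = current.get();
        if (resource == null) {
            throw new IllegalStateException("No preCall on this thread");
        }
        return resource;
    }

    // Closes the calling thread's resource and clears the slot, so pooled
    // threads do not carry it over into their next task.
    void postCall() {
        T resource = current.get();
        current.remove();
        if (resource != null) {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("Failed to close per-thread resource", e);
            }
        }
    }
}
```

Callers would typically wrap their work in `preCall(); try { ... get() ... } finally { postCall(); }` so the resource is closed even when the work fails.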

Now the problem is that there might still be threads working with their per-thread EMs. The really safe way is to wait until all these threads have closed their EMs, which is what I am doing now. To make it a little more predictable, I added a timeout and close any remaining EMs once it expires.
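The wait-then-force-close approach described above could look roughly like this. It is a sketch under assumptions: `ResourceTracker`, `opened`, `released`, and `close` are hypothetical names, the tracked objects stand in for EntityManagers, and `opened`/`released` would be called from the precall and postcall steps:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical tracker for resources handed out to worker threads.
class ResourceTracker<T extends AutoCloseable> {
    private final Set<T> open = new HashSet<>();
    private boolean closing;

    // Called when a thread obtains a resource (the precall step).
    synchronized void opened(T resource) {
        if (closing) {
            throw new IllegalStateException("Supplier is shutting down");
        }
        open.add(resource);
    }

    // Called after the owning thread has closed its resource (the postcall step).
    synchronized void released(T resource) {
        if (open.remove(resource) && open.isEmpty()) {
            notifyAll(); // wake a close() waiting for the last resource
        }
    }

    // Rejects new resources, waits up to timeoutMillis for in-flight ones to be
    // released, then force-closes the rest. Returns how many had to be forced.
    int close(long timeoutMillis) {
        List<T> leftovers;
        synchronized (this) {
            closing = true;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            try {
                while (!open.isEmpty()) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) {
                        break;
                    }
                    TimeUnit.NANOSECONDS.timedWait(this, remaining);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop waiting, force-close below
            }
            leftovers = new ArrayList<>(open);
            open.clear();
        }
        // Close outside the monitor. A thread still using one of these resources
        // will now get an exception and go through its normal error handling.
        for (T resource : leftovers) {
            try {
                resource.close();
            } catch (Exception e) {
                // real code would log this
            }
        }
        return leftovers.size();
    }
}
```

The timeout bounds how long a bundle stop can block; the trade-off is that any thread still running after it expires fails instead of completing.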

So the question is: is this best practice? The clear disadvantage is that stopping a PU bundle could take quite long (depending on the timeout). Would it be better to just let the threads close their EMs asynchronously and ignore the fact that this might go wrong if the bundle is uninstalled in the meantime?

Christian

On 15.02.2015 at 18:38, Peter Kriens wrote:
As always with design, it is about trade offs. As indicated in my mail, the recovery time can be shortened if you can do a controlled shutdown. I know this was a big issue with mainframes, however, I doubt that with today’s highly distributed systems this is still very relevant. In general, when I have the choice in these circumstances I would rather focus on reducing startup time instead of trying to manage shutdown more nicely.

I think the complexity of the additional recovery part is also dangerous, especially since you will have a common path and one that only gets executed when the shit really hits the fan. I think that is worth some additional startup time in one of the many machines in the cluster.

That said, every case is special. Just sharing my long experience in seeing overly complicated solutions that looked good close up but provided no real gain when you looked at the overall picture.

Kind regards,

Peter Kriens

On 15 Feb. 2015, at 13:18, Graham Charters <chart...@uk.ibm.com> wrote:

Hi Peter,

I think you and I see different customer use cases. As I mentioned at the last OSGi face-to-face meeting, we have customers whose applications take a significant amount of time to start, and they have many instances. Rolling updates can therefore take a long time if a full application restart is necessary, so these customers want to minimise application update time and disruption. These are transactional deployments with failover, so they can be recovered if someone trips over the power cord, but that doesn't mean they want to use this during normal maintenance.

Regards, Graham.

Graham Charters PhD CEng MBCS PhD
STSM, WebSphere OSGi Applications & Liberty Repository Lead Architect, Master Inventor
IBM United Kingdom Limited, MP 146, Hursley Park, Winchester, SO21 2JN, UK
Tel: +44 1962 816527
Email: chart...@uk.ibm.com

Peter Kriens --- Re: [osgi-dev] How to cleanly update/uninstall bundles ---

From: "Peter Kriens" <peter.kri...@aqute.biz>
To: "OSGi Developer Mail List" <osgi-dev@mail.osgi.org>
Date: Sun, 15 Feb 2015 11:48
Subject: Re: [osgi-dev] How to cleanly update/uninstall bundles

------------------------------------------------------------------------

I am not sure I agree with your conclusion. :-)

Since it is theoretically impossible to protect against hard failures (power loss, kernel panic, kill -9, a distributed call when the cable is unplugged, etc.), any valuable application must be protected against an unexpected exit at any moment. Idempotency, consensus, and transactionality are your friends in these cases. So if you are protected against these bad failures, how bad can an in-flight shutdown be? At best you can shorten the recovery time at restart, but this often requires additional complexity that can then also fail. Since the chance that things go wrong in-flight is quite small, I would rather accept the recovery cost in the unlikely event that you get caught.
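As a small illustration of the idempotency point, here is a sketch of an operation that is safe to retry after an unexpected exit: each request carries a caller-chosen ID, and a request that was already applied is answered from the stored result instead of being applied twice. `IdempotentAccount` and its methods are hypothetical, and the in-memory map is for illustration only; in a real system the processed IDs would have to be stored in the same transaction as the data itself:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical idempotent operation keyed by a request ID.
class IdempotentAccount {
    private long balance;
    // Request ID -> balance after that request was applied.
    private final Map<String, Long> processed = new HashMap<>();

    synchronized long deposit(String requestId, long amount) {
        Long previous = processed.get(requestId);
        if (previous != null) {
            return previous; // duplicate delivery or client retry: do not apply again
        }
        balance += amount;
        processed.put(requestId, balance);
        return balance;
    }

    synchronized long balance() {
        return balance;
    }
}
```

With this in place, a client that never saw a response (because the server died mid-call) can simply resend the same request.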

Related is my very old opposition to an update or uninstall callback to the bundle. Though it is an awfully attractive idea with lots of good stuff, the party is spoiled because you cannot guarantee such a call under all circumstances.

Bill Joy (Sun co-founder) once told us a story about the development of the Internet, in which he took part. Initially they tried to make every router perfect, but this made the routers incredibly expensive, and there were still failure scenarios that even a perfect router could not handle (power loss, cable cuts). Then someone proposed to assume the routers were very imperfect and that the end points should correct the problems in the network. This turned a very large number of very hard-to-handle failure scenarios into one problem: how to handle a missing packet. If a router panicked, lost power, had its cable cut, was too busy, was out of memory, or had no clue: discard the packet.

It is a pervasive problem in the enterprise software world that we want to ignore failure because it is so hard. For example, Blueprint has this awful service damping that looks so attractive to the developer (Look Ma, no dynamics!), but by hiding the reality you get caught in lots of unexpected places.

Bad software expects an unchanging perfect world, good software is more realistic. Embrace failure! :-)

Kind regards,

Peter Kriens


On 15 Feb. 2015, at 11:09, Christian Schneider <ch...@die-schneider.net> wrote:

Thanks to all of you for the insights.

From the responses I take it that clean shutdown is not in the scope of OSGi itself. I agree that it is best solved at the application level. On the other hand, I see that the Quiesce API can at least cover some cases, so it has its value.

Christian

On 13.02.2015 at 17:55, Raymond Auge wrote:
To my knowledge, what you are talking about is not intentionally supported by the dynamics of OSGi. This topic comes up all the time; it's funny.

If you must support "in flight" changes, then you have to implement this support in your code using concurrency constructs.

Note that unregistering a service is a synchronous operation during the "shutdown" of a bundle, so with proper concurrency measures in place, a bundle could both be shutting down (meaning it is no longer reachable by other bundles) and still be finishing any ongoing work.
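One possible concurrency construct for this is a read/write-lock "gate": every call holds the read lock, and shutdown takes the write lock, which is granted only after all in-flight calls have left. This is a sketch under assumptions, not an OSGi API; the class `InFlightGate` and the idea of calling `drain()` from a deactivate/stop method after the service has been unregistered are hypothetical:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical gate letting in-flight calls finish while new calls are rejected.
class InFlightGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean closed;

    // Runs one call while holding the read lock, so many calls can run at once.
    <R> R call(Supplier<R> work) {
        // Reject once drain() has started; the flag also stops new readers from
        // barging in ahead of the waiting write lock and starving it.
        if (closed || !lock.readLock().tryLock()) {
            throw new IllegalStateException("Component is shutting down");
        }
        try {
            return work.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Returns true once every in-flight call has finished, or false if the
    // timeout expired first. On success the write lock is deliberately never
    // released, so the gate stays shut.
    boolean drain(long timeout, TimeUnit unit) {
        closed = true;
        try {
            return lock.writeLock().tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note that `drain()` must not be called from a thread that is itself inside `call()`, since a read lock cannot be upgraded to the write lock; it would just wait until the timeout.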

Anyone feel free to correct me but this is what I've learned in my short experience.

- Ray

_______________________________________________
OSGi Developer Mail List
osgi-dev@mail.osgi.org
https://mail.osgi.org/mailman/listinfo/osgi-dev


--
Christian Schneider
http://www.liquid-reality.de

Open Source Architect
Talend Application Integration Division http://www.talend.com
