As you seem to have found out, the problem disappears when you just do the 
simple thing … block until your resource R is ready.

So now we have the case that you raise: T1 is being deactivated while T2 is 
activated before T1 has finished deactivating. Looking at the SCR/DS 
implementations and the activities involved, and going by experience, it is 
not something I generally worry about. I always write my servers assuming they 
can fail. I just write the activate method to start a loop that initializes 
and then does work. If an exception is thrown, I wait with exponential 
back-off and try again. This race condition is just one of the myriad failure 
scenarios I try to handle. Imho a server should be resilient and not fail 
forever because the network was shut down or the network interface changed. 
The following shows a skeleton; in practice this code tends to become quite 
convoluted in order to react appropriately to all possible disturbances.

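        import java.io.IOException;
        import java.net.ServerSocket;
        import java.net.Socket;
        import org.osgi.service.component.annotations.Activate;
        import org.osgi.service.component.annotations.Component;
        import org.osgi.service.component.annotations.Deactivate;

        @Component
        public class MyServer implements Runnable {

                final Thread thread = new Thread(this, "MyServer");
                volatile ServerSocket server;

                @Activate void activate() {
                   thread.start();
                }

                @Deactivate void deactivate() throws Exception {
                   thread.interrupt();
                   close();                     // unblocks a pending accept()
                   thread.join(1 * 60 * 1000);  // wait at most a minute
                }

                public void run() {
                     while( !thread.isInterrupted() ) try {
                         server = new ServerSocket(8080);
                         while( !thread.isInterrupted() ) {
                            // if accept() fails we re-initialize via the outer catch
                            Socket socket = server.accept();
                            if (thread.isInterrupted())
                               return;

                            try {
                               process(socket);
                            } catch( Exception e) {
                               log(e);
                            }
                         }
                    } catch( Exception e) {
                       // fixed delay for brevity, real code backs off exponentially
                       try {
                          Thread.sleep(1000);
                       } catch(InterruptedException ie) {
                         return;
                       }
                    } finally {
                       close();
                    }
                }

                void close() {
                   ServerSocket s = server;
                   if ( s != null) try {
                      s.close();
                   } catch( IOException e) {
                      // ignore, we are shutting down or re-initializing
                   }
                }

                // process(Socket) and log(Exception) are application specific
        }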
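
The skeleton sleeps for a fixed second. The exponential back-off mentioned 
earlier could look roughly like the sketch below. The Backoff class, its 
method names and the doubling policy are assumptions made for illustration; 
they are not part of OSGi or of the code above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: doubles the delay after every failure, up to a cap. */
final class Backoff {
    private final long initialMillis;
    private final long maxMillis;
    private long current;

    Backoff(long initialMillis, long maxMillis) {
        if (initialMillis <= 0 || maxMillis < initialMillis)
            throw new IllegalArgumentException("need 0 < initial <= max");
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.current = initialMillis;
    }

    /** Returns the delay to use now and doubles it for the next failure. */
    synchronized long nextDelayMillis() {
        long delay = current;
        // compare against max/2 so the doubling can never overflow
        current = current > maxMillis / 2 ? maxMillis : current * 2;
        return delay;
    }

    /** Call after a successful (re)initialization to start small again. */
    synchronized void reset() {
        current = initialMillis;
    }

    /** Sleeps for the next delay; an interrupt ends the sleep early. */
    void pause() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(nextDelayMillis());
    }
}
```

In the server loop, pause() would take the place of Thread.sleep(1000) in the 
outer catch, and reset() would be called once the ServerSocket has been 
created successfully.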

My general philosophy in life is that it is better to accept failures and 
handle them instead of trying to prevent them in all possible theoretical 
cases. It makes for MUCH simpler systems that are more reliable. 

That said, if I were given this answer I would probably think it is a 
cop-out. So how could we solve this race condition without statics in OSGi?

It would require a new service that handles the atomicity. The component that 
provides it should have no dependencies, so that it depends only on its bundle 
life cycle. That avoids the T1/T2 overlap since a bundle cannot start before 
it has been fully stopped. The service should provide a lock that is taken in 
activate and released in deactivate. That would ensure atomicity.


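        @Component(service=Lock.class)
        public class LockImpl extends ReentrantLock {}

        @Component
        public class MyServer {
          @Reference Lock lock;

          @Activate void activate() throws Exception {
                // fail the activation if the old instance still holds the lock
                if ( !lock.tryLock(1, TimeUnit.MINUTES))
                    throw new IllegalStateException("lock not released in time");
                … init
          }
          @Deactivate void deactivate() {
                … deinit
                lock.unlock();
          }
        }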
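
One caveat with a ReentrantLock here: it is owned by the thread that acquired 
it, and unlock() from any other thread throws IllegalMonitorStateException. 
DS does not appear to promise that activate and deactivate are called on the 
same thread, so a primitive without an owner thread, such as a single-permit 
Semaphore, may be the safer choice. A minimal sketch under that assumption; 
the Gate class and its method names are hypothetical, and in OSGi it would be 
registered as a service in the same way as LockImpl:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical single-permit gate; unlike ReentrantLock it has no owner thread. */
final class Gate {
    private final Semaphore permit = new Semaphore(1);

    /** Tries to take the permit, waiting up to the given time. */
    boolean enter(long timeout, TimeUnit unit) throws InterruptedException {
        return permit.tryAcquire(timeout, unit);
    }

    /**
     * Gives the permit back. May be called from any thread, but only
     * after a successful enter(), otherwise a second permit appears.
     */
    void leave() {
        permit.release();
    }
}
```

The activate method would call enter(1, TimeUnit.MINUTES) and fail the 
activation when it returns false; deactivate would call leave() after the 
deinit.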

However, I would not bother. Making your server loop resilient by 
reinitializing after a failure is much more robust against many more error 
cases. The chance that you get overlapping T1.deactivate/T2.activate is in my 
experience orders of magnitude smaller than the chance of a network problem, 
and in the end both can be handled with the same code. The chance might even 
be zero but I've no time to crawl through the spec to prove that right now.

If you really feel strongly about this race condition then you should file a 
bug on the public OSGi Bugzilla; the expert group will then look at whether 
the DS specification should be amended.

Kind regards,

        Peter Kriens






> On 8 aug. 2016, at 12:16, list+org.o...@io7m.com wrote:
> 
> On 2016-08-08T09:48:49 +0200
> Peter Kriens <peter.kri...@aqute.biz> wrote:
>> 
>> Correctness before performance.
> 
> That's a sentiment I can get behind. For the purposes of the discussion:
> I'm not interested in the performance implications of the subject we're
> discussing, only the correctness issues.
> 
>> The trick in my experience is to write your code as simple as possible given 
>> the previous model. Disregard bundle boundaries. Bundle updates are rare and 
>> imho irrelevant for general performance. Once you got your system working 
>> correctly measure the activation performance. If you then have a problem the 
>> solution then usually stands out rather clearly.
>> 
>> One solution then is to register the service manually. If you have a very 
>> lengthy initialization then you can create an immediate component without a 
>> service. In the activate method you just start a thread that does the 
>> initialization and when it is done you register the service via the bundle 
>> context you got in the activate method. This same pattern can also be used 
>> when you have to watch a condition (network online for example). You watch 
>> the condition and then you register/unregister the given service.
>> 
> 
> So the problem here is that a service S wants to acquire a global
> resource R and that takes time to acquire. It also needs to give
> back R when it's done, and that takes time to release.
> 
> Let's examine the two possible cases with regards to blocking or
> non-blocking activation/deactivation:
> 
> If S _isn't_ allowed to block in activate()/deactivate(),
> then the resource R may not have been acquired by the time activate()
> has returned, and may not have been released by the time deactivate()
> has returned. The problem then is that when the bundle containing the
> service is restarted, the old instance of S is almost
> certainly still in the process of releasing R, and so the new instance
> T crashes when it tries to acquire R on activation. This leaves S
> not running because it's just been deactivated, and T not running
> because it couldn't start.
> 
> This seems to be impossible to solve in general without using something
> static that can survive bundle restarts and that can serialize the
> operations. It doesn't matter how many indirections of services are
> added that can wait for initialization and then register services, the
> same problem remains for those services too [0]. All adding extra
> services does is punt the problem around to elsewhere in the system.
> Using static is obviously a big NO in a system like OSGi, so that's not
> a good solution.
> 
> However, if S _is_ allowed to block in activate()/deactivate(), then
> the problem appears to be solved with one major caveat that the
> specification seems to specifically not address: I think the
> specification allows OSGi implementations to call activate() on a
> new instance T _concurrently_ with calling deactivate() on the old
> instance S. At least, I don't see anywhere in the spec where this is
> disallowed. If so, it seems that we're depending on unspecified
> implementation-specific behaviour to prevent bad things from happening.
> 
> This view may actually be supported by something you said (emphasis
> mine):
> 
>> In my experience, OSGi significantly changes the landscape concerning
>> initialization. You get lots of local small initializations that are
>> _automatically parallelized_.
> 
> For example, it seems as though it's perfectly permissible for the
> runtime to call deactivate() on an old instance of a service and to
> also concurrently call activate() on a new instance of the same
> service. We're back to square one: The new instance attempts to
> acquire R and can't, while the old instance is still in the process of
> returning R. It doesn't matter if activate()/deactivate() are blocking
> or not.
> 
> M
> 
> [0]: You may notice that back in my original email, I stated that my
> TCPServer example actually doesn't export any services at all, it simply
> forwards data from the network to an existing service, so it's not a
> question of delaying the registration of a service in that specific
> example.
> 
> _______________________________________________
> OSGi Developer Mail List
> osgi-dev@mail.osgi.org
> https://mail.osgi.org/mailman/listinfo/osgi-dev
