On Dec 16, 2009, at 12:42 PM, Felix Meschberger wrote:

Hi,

Kris Pruden wrote:
Hi,

I'm puzzling through some strange behaviour in my application related to what appears to be a race and/or some weird interaction between services registered via SCR and bundles trying to monitor these services via the ServiceTracker API.

I don't have any good theories at the moment, so I was hoping that if I
described what I'm seeing it might jog somebody's memory...

Here's my situation. I have an application made up of about a
half-dozen bundles, most (but not all) of which register their
services using DS/SCR. I'm using pax:exam to run automated functional tests for this application. Unfortunately, my tests are not reliable: they fail intermittently. When there's a failure, it's always because
of an NPE on one of the SCR-injected @Reference attributes of a
service/component. It's hard to tell for sure, but what it looks like is
happening is this:
1. The SCR starts and registers a component/service
2. The functional test, which was waiting for that service to become
available, gets a reference to the service
3. The SCR stops the service, then immediately starts and registers a
new instance of the service

Unfortunately, at this point the test bundle now has a reference to the "stopped" instance of the service, which has had all of its @Reference
fields set to null (hence the NPE).
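The three steps above can be modeled in plain Java to show why a cached reference goes bad. This is a minimal sketch, not SCR code: `ServiceAImpl`, `ServiceB`, the `registered` slot and `callThrough()` are hypothetical stand-ins, and the assumption (taken from the observation above) is that deactivation nulls the injected field. Re-fetching the currently registered instance works, while the reference grabbed in step 2 throws the NPE.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the real dependency interface.
interface ServiceB { String ping(); }

// Hypothetical model of a component whose dependency is set on activate
// and cleared on deactivate, as observed with the @Reference fields.
class ServiceAImpl {
    ServiceB serviceB;

    void activate(ServiceB b) { this.serviceB = b; }
    void deactivate() { this.serviceB = null; }

    // Throws NullPointerException if called after deactivate().
    String callThrough() { return serviceB.ping(); }
}

public class StaleReferenceDemo {
    // Hypothetical "registry" slot holding the currently registered instance.
    static final AtomicReference<ServiceAImpl> registered = new AtomicReference<>();

    static void startNewInstance(ServiceB b) {
        ServiceAImpl a = new ServiceAImpl();
        a.activate(b);
        registered.set(a);
    }

    static String run() {
        ServiceB b = () -> "pong";
        startNewInstance(b);                    // 1. component registered
        ServiceAImpl cached = registered.get(); // 2. test caches the reference
        registered.get().deactivate();          // 3a. component stopped...
        startNewInstance(b);                    // 3b. ...new instance registered

        String fresh = registered.get().callThrough(); // fresh lookup succeeds
        try {
            cached.callThrough();               // stale, deactivated instance
            return "no NPE";
        } catch (NullPointerException e) {
            return "fresh=" + fresh + ", cached=NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In a real test the equivalent of the "fresh lookup" would be asking the service registry (or a tracker) for the service at the point of use rather than holding on to the first instance seen.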

What I can't figure out is why this is happening.

Anyone have any suggestions about where I should look next, or any known
SCR gotchas that I might be running into?

Well, from far outside, it must be said that a service may come and go
at any time, for any reason...

Now, this doesn't help you, of course. But without a more in-depth look into the situation I cannot tell much, other than that it should actually work.

Let me quickly recapitulate:

Your component under test is an SCR component which is also registered as
a service: Is this a delayed service component or a service factory
component?

It's not a factory, and it is configured to start immediately. These are the annotations on the service in question:

@Component(immediate=true)
@Service

SCR stops and immediately restarts the service: Are you updating the
Configuration Admin configuration supplied to the component during the
test?

Not as far as I know. I should point out that I'm not 100% sure my claim about the start/stop/start behaviour is actually true. I do see evidence that it happens: I've seen log output from SCR announcing the activation of a component, then another message indicating that it's been deactivated, then another one saying it's been activated again. I've instrumented the activate/deactivate methods on the service implementation and confirmed that this is in fact happening, and that the second activation is on a new instance of the service class. However, even in cases where I don't see this happening, I still see the NPE from time to time (more on this below).
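Instrumentation of the kind described above can be as simple as logging an instance identifier and the calling thread in each lifecycle method. The sketch below is hypothetical: in a real SCR component the methods would be the component's activate/deactivate methods (typically taking a `ComponentContext`), but here they take no arguments so the example runs without an OSGi framework. A per-instance serial number is used instead of `System.identityHashCode`, since identity hash codes are not guaranteed to be unique.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical instrumented component (no OSGi dependencies).
class InstrumentedComponent {
    private static final AtomicInteger SERIAL = new AtomicInteger();
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    // Distinguishes instances deterministically.
    private final int serial = SERIAL.incrementAndGet();

    void activate() {
        LOG.add("activate #" + serial + " on " + Thread.currentThread().getName());
    }

    void deactivate() {
        LOG.add("deactivate #" + serial + " on " + Thread.currentThread().getName());
    }
}

public class LifecycleLogDemo {
    public static void main(String[] args) {
        // Replays the observed sequence: activate, deactivate, then a NEW instance activates.
        InstrumentedComponent first = new InstrumentedComponent();
        first.activate();
        first.deactivate();
        InstrumentedComponent second = new InstrumentedComponent();
        second.activate();
        InstrumentedComponent.LOG.forEach(System.out::println);
    }
}
```

Logging the thread name as well helps to tell whether the deactivation happens on a framework event thread or on the test's own thread.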

You talk about a @Reference annotation: Does this mean the SCR component
under test has a reference to another service? Is this reference
mandatory or optional? Is it static or dynamic? Is there a change in
the referenced service during the test?
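The static/dynamic question matters because the two policies react differently when a bound service is replaced. The following is a simplified model, not SCR code, assuming a mandatory, unary reference and a replacement service being available: with a static reference the component configuration is deactivated and a new instance is activated, which would look exactly like a stop/start; with a dynamic reference the same instance stays active and only its bind/unbind methods are called. The assumption that a plain `@Reference` defaults to a static, mandatory reference should be checked against the SCR annotations documentation.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferencePolicyModel {
    enum Policy { STATIC, DYNAMIC }

    /**
     * Simplified model of the lifecycle events a component with a mandatory,
     * unary reference sees when its bound service is replaced.
     */
    static List<String> onBoundServiceReplaced(Policy policy, int currentInstance) {
        List<String> events = new ArrayList<>();
        if (policy == Policy.STATIC) {
            // Static: the dependency cannot be swapped in place, so the
            // current instance is deactivated and a NEW instance activated.
            events.add("deactivate #" + currentInstance);
            events.add("activate #" + (currentInstance + 1));
        } else {
            // Dynamic: the same instance stays active and is rebound.
            events.add("rebind on #" + currentInstance);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(onBoundServiceReplaced(Policy.STATIC, 1));
        System.out.println(onBoundServiceReplaced(Policy.DYNAMIC, 1));
    }
}
```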

Yes. To be a bit more concrete: I have the service my functional test is exercising; call it ServiceA. This service is annotated as above and has a dependency on a second service, ServiceB. The class definition looks something like this:

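import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Reference;
import org.apache.felix.scr.annotations.Service;

@Component(immediate=true)
@Service
public class ServiceAImpl implements ServiceA {
    // Injected by SCR when the component is activated; reset to null on unbind.
    @Reference
    ServiceB serviceB;

    ...
}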

ServiceB is similarly defined, and (maybe this is relevant) has its own dependencies on other services, also injected via the @Reference annotation as above, but at least one of these is not itself managed via SCR (that is, it's a regular old service registered via a bundle activator). Further (again, not sure if it's relevant), this non-DS service is actually provided via a ServiceFactory...
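With a chain like ServiceA -> ServiceB -> non-DS service, a disappearance at the bottom can ripple upward: if ServiceB loses a mandatory reference, it is deactivated and its own service unregistered, which in turn affects ServiceA. The sketch below is a hypothetical breadth-first walk over a "service -> components that reference it" map; the name `NonDsService` is made up to stand for the bundle-activator service, and it assumes every reference is mandatory and no replacement is available at the moment the service goes away.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RestartCascade {
    /**
     * Returns, in breadth-first order, the components that would be deactivated
     * when the given service is unregistered, assuming mandatory references and
     * that each deactivation unregisters that component's own service in turn.
     */
    static List<String> affectedBy(String service, Map<String, List<String>> dependents) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(service);
        seen.add(service);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String dep : dependents.getOrDefault(current, List.of())) {
                if (seen.add(dep)) { // visit each component only once
                    order.add(dep);
                    queue.add(dep);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = Map.of(
                "NonDsService", List.of("ServiceB"),
                "ServiceB", List.of("ServiceA"));
        System.out.println(affectedBy("NonDsService", dependents));
    }
}
```

When the bottom service comes back, the same chain is reactivated bottom-up, so a brief unregister/re-register of a service provided through a ServiceFactory could plausibly show up as a stop/start of ServiceA.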

Last but not least: What DS implementation are you using? Or: what
version of Felix SCR are you using?

We're using Felix SCR 1.2.0.

One other data point: I'm pretty sure the issue is at least somewhat timing-related. The reason is that I have tried the build on two separate computers: one a (relatively) slow laptop (a MacBook Pro), the other a quite powerful workstation (quad-core, 8GB RAM, running Linux). The tests reliably *pass* on the slow laptop, but fairly reliably *fail* on the fast machine. My theory is that the fast machine is winning the race against SCR in this start/stop/start loop (and therefore losing), while the slow machine is losing that race...

I'm still running experiments; I'll update with any new data I find that I think might be helpful. I realize issues like this are hard enough to figure out when you're watching them happen, so I don't really expect you to know what's going on here (although if you do that would be awesome). Mostly I'm just wondering if this setup is tickling a known timing issue or something along those lines...

Thanks again,

Kris

Regards
Felix


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


