On Dec 16, 2009, at 12:42 PM, Felix Meschberger wrote:
Hi,
Kris Pruden wrote:
Hi,
I'm puzzling through some strange behaviour in my application related
to what appears to be a race and/or some weird interaction between
services registered via SCR and bundles trying to monitor these
services via the ServiceTracker API.
I don't have any good theories at the moment, so I was hoping that if
I described what I'm seeing it might jog somebody's memory...
Here's my situation. I have an application made up of about a
half-dozen bundles, most (but not all) of which register their
services using DS/SCR. I'm using pax:exam to run automated functional
tests for this application. Unfortunately, my tests are not reliable:
they fail intermittently. When there's a failure, it's always because
of an NPE on one of the SCR-injected @Reference attributes of a
service/component. It's hard to tell for sure, but what appears to be
happening is this:
1. The SCR starts and registers a component/service
2. The functional test, which was waiting for that service to become
available, gets a reference to the service
3. The SCR stops the service, then immediately starts and registers a
new instance of the service
Unfortunately, at this point the test bundle now has a reference to
the "stopped" instance of the service, which has had all of its
@Reference fields set to null (hence the NPE).
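The three steps above can be modelled without any OSGi classes. The sketch below is a toy model only: `StaleReferenceDemo`, its nested `ServiceAImpl` and `ServiceB` stand-ins, and the `current` holder are hypothetical names, and the `deactivate` method merely imitates what unbinding a reference would do. It shows that a reference cached before the restart fails with an NPE, while a fresh lookup of the current instance works.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the start/stop/start sequence; no OSGi classes are involved. */
class StaleReferenceDemo {

    /** Stand-in for the dependency that SCR would inject. */
    interface ServiceB {
        String greet();
    }

    /** Stand-in for a component instance with one injected reference. */
    static final class ServiceAImpl {
        private volatile ServiceB serviceB;          // models the @Reference field

        void bind(ServiceB b) { serviceB = b; }

        void deactivate() { serviceB = null; }       // models the unbind on deactivation

        String call() { return serviceB.greet(); }   // throws NPE once deactivated
    }

    /** Models the registry: always points at the most recently activated instance. */
    static final AtomicReference<ServiceAImpl> current = new AtomicReference<>();

    static ServiceAImpl activateNewInstance() {
        ServiceAImpl a = new ServiceAImpl();
        a.bind(() -> "hello");
        current.set(a);
        return a;
    }

    /** Returns true if a reference cached before the restart fails with an NPE. */
    static boolean cachedReferenceFails() {
        activateNewInstance();                 // step 1: component is activated
        ServiceAImpl cached = current.get();   // step 2: the test grabs the instance
        cached.deactivate();                   // step 3a: the instance is stopped
        activateNewInstance();                 // step 3b: a new instance replaces it
        try {
            cached.call();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Looks the instance up again after the restart and calls it. */
    static String currentInstanceResult() {
        return current.get().call();
    }

    public static void main(String[] args) {
        System.out.println("cached reference fails: " + cachedReferenceFails());
        System.out.println("fresh lookup returns: " + currentInstanceResult());
    }
}
```

In a real test the equivalent of `current.get()` would be a fresh lookup through the tracker at call time rather than a field assigned once during setup.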
What I can't figure out is why this is happening.
Anyone have any suggestions about where I should look next, or any
known SCR gotchas that I might be running into?
Well, from the outside, it must be said that a service may come and go
at any time for any reason...
Now, this doesn't help you, of course. But without a more in-depth
look into the situation I cannot tell much, other than that it should
actually work.
Let me quickly recapitulate:
Your component under test is an SCR component which is also registered
as a service. Is this a delayed service component or a service factory
component?
It's not a factory, and it is configured to start immediately. These
are the annotations on the service in question:
@Component(immediate=true)
@Service
SCR stops and immediately restarts the service: Are you updating the
Configuration Admin configuration supplied to the component during the
test?
Not as far as I know. I should point out that I'm not 100% sure that
my claim of the start/stop/start behaviour is in fact true. I see
evidence that this happens - I've seen log output from the SCR
announcing the activation of a component, then another message
indicating that it's deactivated, then another one saying it's
activated again. I've instrumented the activate/deactivate methods on
the service implementation and confirmed that this is in fact
happening, and that the second activation is a new instance of the
service class. However, even in cases where I don't see this
happening, I still am seeing the NPE from time to time (more on this).
You talk about a @Reference annotation: Does this mean the SCR
component under test has a reference to another service? Is this
reference mandatory or optional? Is it static or dynamic? Is there a
change in the referenced service during the test?
Yes, so to be a bit more concrete I have the service my functional
test is exercising, call it ServiceA. This service is annotated as
above and contains a dependency on a second service, ServiceB. The
class definition looks something like this:
@Component(immediate=true)
@Service
public class ServiceAImpl implements ServiceA {
    @Reference
    ServiceB serviceB;
    ...
}
ServiceB is similarly defined, and (maybe this is relevant) has its
own dependencies on other services, also injected via the @Reference
annotation as above, but at least one of these is not itself managed
via SCR (that is, it's a regular old service registered via a bundle
activator). Further (again, not sure if it's relevant), this non-DS
service is actually provided via a ServiceFactory...
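One mechanism that may be worth ruling out for this dependency chain: under the Declarative Services static policy, a component whose bound mandatory service is unregistered is deactivated, and a new component instance is created once the reference can be satisfied again. Whether that is what happens here is not established in this thread, and the assumption that a bare @Reference results in a mandatory, static reference should be checked against the generated component descriptor. The sketch below is a toy model with hypothetical names (`StaticReferenceModel`, `Instance`), not Felix SCR code; it only replays how one re-registration of a dependency would produce an activate/deactivate/activate sequence.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a mandatory, static reference; this is not Felix SCR code. */
class StaticReferenceModel {

    /** Hypothetical component instance bound to exactly one dependency object. */
    static final class Instance {
        final Object boundDependency;

        Instance(Object boundDependency) {
            this.boundDependency = boundDependency;
        }
    }

    private final List<String> events = new ArrayList<>();
    private Instance instance;   // null while the component is unsatisfied

    /** A matching dependency appeared: activate if not already active. */
    void dependencyRegistered(Object dependency) {
        if (instance == null) {
            instance = new Instance(dependency);
            events.add("activate");
        }
    }

    /** The bound dependency went away: with a static reference the instance is discarded. */
    void dependencyUnregistered(Object dependency) {
        if (instance != null && instance.boundDependency == dependency) {
            instance = null;
            events.add("deactivate");
        }
    }

    Instance currentInstance() {
        return instance;
    }

    List<String> events() {
        return events;
    }

    /** Replays a dependency being registered, unregistered, and registered again. */
    static List<String> replayReRegistration() {
        StaticReferenceModel model = new StaticReferenceModel();
        Object first = new Object();
        Object second = new Object();
        model.dependencyRegistered(first);
        model.dependencyUnregistered(first);
        model.dependencyRegistered(second);
        return model.events();
    }

    public static void main(String[] args) {
        System.out.println(replayReRegistration());
    }
}
```

If the real logs show the non-DS service being unregistered and registered again around the time ServiceA restarts, that would support this explanation; if not, it can be discarded.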
Last but not least: What DS implementation are you using ? Or: what
version of Felix SCR are you using ?
We're using Felix SCR 1.2.0.
One other data point: I'm pretty sure the issue is at least somewhat
timing related. The reason is that I have tried the build on two
separate computers, one a (relatively) slow laptop (a MacBook Pro),
the other a quite powerful workstation (quad-core, 8GB RAM, running
Linux). The tests reliably *pass* on the slow laptop, but fairly
reliably *fail* on the fast machine. My theory is that the fast
machine is winning the race against SCR in this start/stop/start loop
(and therefore losing) while the slow machine is losing that race...
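If the race is real, one test-side workaround is to look the service up at call time and retry briefly, instead of caching whichever instance was seen first. The helper below is a minimal sketch under that assumption; `CurrentInstanceCaller` and `callCurrent` are hypothetical names, and catching the NPE is a test-only crutch that hides the symptom rather than fixing the cause. With a ServiceTracker, the lookup argument could be something like `tracker::getService`.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Test helper sketch: resolve the service on every attempt instead of caching it. */
class CurrentInstanceCaller {

    /**
     * Calls the function on whatever the lookup currently returns, retrying when the
     * lookup yields null or the call hits an NPE (for example from a stale instance).
     */
    static <S, R> R callCurrent(Supplier<S> lookup, Function<S, R> call,
                                int maxAttempts, long pauseMillis) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            S service = lookup.get();          // fresh lookup on every attempt
            if (service != null) {
                try {
                    return call.apply(service);
                } catch (NullPointerException e) {
                    last = e;                  // possibly a deactivated instance; retry
                }
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for service", e);
            }
        }
        throw new IllegalStateException(
                "service not usable after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        // The lookup here is a stand-in that always returns a usable object.
        Integer length = CurrentInstanceCaller.<String, Integer>callCurrent(
                () -> "service", String::length, 3, 10L);
        System.out.println(length);
    }
}
```

A helper like this only narrows the window; a call can still land on an instance that is deactivated mid-call, so it is better suited to confirming the diagnosis than to serving as a permanent fix.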
I'm still running experiments; I'll update with any new data I find
that I think might be helpful. I realize issues like this are hard
enough to figure out when you're watching them happen, so I don't
really expect you to know what's going on here (although if you do
that would be awesome). Mostly I'm just wondering if this setup is
tickling a known timing issue or something along those lines...
Thanks again,
Kris
Regards
Felix
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
---------------------------------------------------------------------