Could someone on the ServiceMix team verify this reasoning for me?  I think
I've figured out my problem: a race condition in ServiceMix (which of course
shows up more often on a faster machine).  I'm not sure what the best
workaround is, and I'm on a short timeframe, since this problem really only
plagues our production environment and not our dev or test environments.

Anyhow, I have a servicemix-bean component that takes an input, performs some
analysis on it, and then sends it back out to a JMS topic.  To send it back
out, I'm using the DeliveryChannel that is injected into my bean.  I think
sharing this delivery channel is what causes the race condition, and here is
why.  The exceptions I've been mentioning refer to the following line:

requests.put(messageExchange.getExchangeId(), currentRequest.get());

This line depends on the request I'm building to send out to the JMS queue
(at least I believe so).  The thing is that currentRequest is global to the
current thread, and it gets set when a message comes into my servicemix-bean
component, which I believe happens in this method snippet:

protected void onProviderExchange(MessageExchange exchange) throws Exception
{
        ...
        currentRequest.set(req);
        synchronized (req) {
        ...
        }
        checkEndOfRequest(req, corId);
        currentRequest.set(null);

Doesn't this mean that any time a message is delivered (inside the
synchronized block), there is a good chance that the "currentRequest" member
variable gets clobbered while my bean is trying to use the delivery channel
to send a message?  I get an NPE, and it seems most likely to be a side
effect of the currentRequest.set(null).  In other words, my send inside the
bean clobbers currentRequest, and then the BeanEndpoint in turn clobbers my
request by setting it to null, which causes my send to blow up?
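
To illustrate the failure mode described above, here is a minimal standalone
sketch. It is not ServiceMix code: the class name and the trySend helper are
hypothetical, and only the field names currentRequest and requests mirror the
snippet above. It relies on two plain Java facts: ThreadLocal.get() returns
null on a thread that never set a value (or after set(null)), and
ConcurrentHashMap.put throws a NullPointerException for a null value.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the pattern in question; NOT the BeanEndpoint source.
class ThreadLocalNullPutDemo {
    // Per-thread "current request", like the currentRequest field above.
    static final ThreadLocal<Object> currentRequest = new ThreadLocal<>();
    // Map keyed by exchange id, like the requests map above.
    static final ConcurrentHashMap<String, Object> requests = new ConcurrentHashMap<>();

    // Mimics the failing line; returns false if put() throws the NPE.
    static boolean trySend(String exchangeId) {
        try {
            requests.put(exchangeId, currentRequest.get());
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: the sending thread is the one that set the request.
        currentRequest.set(new Object());
        boolean sameThread = trySend("ex-1");

        // Case 2: a different thread sends; its ThreadLocal slot was never set.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Boolean> task = () -> trySend("ex-2");
        boolean otherThread = pool.submit(task).get();
        pool.shutdown();

        // Case 3: same thread, but after the request has been cleared.
        currentRequest.set(null);
        boolean afterClear = trySend("ex-3");

        System.out.println("same thread:  " + sameThread);   // true
        System.out.println("other thread: " + otherThread);  // false
        System.out.println("after clear:  " + afterClear);   // false
    }
}
```

If this model matches what the endpoint does, then any send that runs on a
thread other than the one handling the exchange, or after the request has
been cleared, would fail at the put() call in the same way as the trace.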

It looks as though I should maybe be creating my own DeliveryChannel or
using another method to send out from my bean?  Or is this something that
could be fixed, and if so, how soon?  Or is it something I could fix myself
with some guidance?  Or is there something else I can do?  I'm going to go
look through the docs for another option, but I really have to work around
this, since my app is pretty much dead right now, at the worst possible time.

Thanks for any help!!

Ryan

On Sun, Jun 22, 2008 at 7:17 PM, Ryan Moquin <[EMAIL PROTECTED]> wrote:

> Just a few details on this.  ServiceMix deadlocks every time I start up
> in a state where it needs to redeploy all SEs and all SUs.  Here is the
> deadlock trace; all my threads are sitting like this:
>
> "pool-flow.seda.servicemix-bean-thread-1" prio=6 tid=0x2879c400 nid=0x1728
> waiting for monitor entry [0x3512f000..0x3512fb94]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.servicemix.jbi.nmr.flow.seda.SedaFlow.createQueue(SedaFlow.java:189)
>         - waiting to lock <0x07db2050> (a org.apache.servicemix.jbi.nmr.flow.seda.SedaFlow)
>         at org.apache.servicemix.jbi.nmr.flow.seda.SedaFlow.enqueuePacket(SedaFlow.java:179)
>         at org.apache.servicemix.jbi.nmr.flow.seda.SedaFlow.doSend(SedaFlow.java:162)
>         at org.apache.servicemix.jbi.nmr.flow.AbstractFlow.send(AbstractFlow.java:123)
>         at org.apache.servicemix.jbi.nmr.DefaultBroker.sendExchangePacket(DefaultBroker.java:283)
>         at org.apache.servicemix.jbi.security.SecuredBroker.sendExchangePacket(SecuredBroker.java:88)
>         at org.apache.servicemix.jbi.container.JBIContainer.sendExchange(JBIContainer.java:830)
>         at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.doSend(DeliveryChannelImpl.java:395)
>         at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.send(DeliveryChannelImpl.java:431)
>         at org.apache.servicemix.common.EndpointDeliveryChannel.send(EndpointDeliveryChannel.java:79)
>         at org.apache.servicemix.bean.BeanEndpoint$PojoChannel.send(BeanEndpoint.java:571)
>
>
> Then if I force it closed, since it's hanging and unresponsive, I do get
> a ton of NPEs on startup, and then it APPEARS to start behaving normally.
> So I'm not sure whether the latest "13" build has mostly solved the problem
> or whether some code changes I made did.  Here is the NPE again:
>
> Caused by: java.lang.NullPointerException
>         at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>         at org.apache.servicemix.bean.BeanEndpoint$PojoChannel.send(BeanEndpoint.java:569)
>         at
>
> It also results in some illegal sendSync errors.  Is it possible that my
> MessageExchangeFactory is giving back a null message, or something like
> that, which is causing the NPE?
>
>
> On Sun, Jun 22, 2008 at 5:38 PM, Ryan Moquin <[EMAIL PROTECTED]>
> wrote:
>
>> Hey Bruce,
>>
>> Out of curiosity, did you guys do anything with this issue in the last
>> day or two?  I've been working on some stuff on my end that I was hoping
>> might have had some effect.  When I run it on the build tagged "12" in the
>> nightly builds, I have the same issue.  When I try the one tagged "13", all
>> my outgoing JMS requests on the DeliveryChannel deadlock the first time I
>> start up ServiceMix.  The second time I start it up, I get a ton of errors
>> about illegally calling sendSync, but my threads don't deadlock that second
>> time.  The deadlock is way down in the bowels of ServiceMix.  I don't see
>> that NPE anymore on the latest one, so I'm just curious so I know exactly
>> what I'm trying to solve at this point! :)
>>
>> Thanks!!
>>
>> Ryan
>>
>>
>> On Sat, Jun 21, 2008 at 12:48 AM, Ryan Moquin <[EMAIL PROTECTED]>
>> wrote:
>>
>>> I hate to say this, Bruce, but unfortunately 3.2.2 is working pretty badly
>>> for me and has the same problem :(  It only seems to affect our production
>>> server, and within a few minutes of startup I start getting the
>>> NullPointerException.  Slowly all my services start doing it and they all
>>> stop working.  If I stop and restart ServiceMix, the servicemix-jms
>>> components are no longer routable.  I'm guessing this error causes
>>> ServiceMix to shut them down and not deploy them anymore.  (ServiceMix does
>>> the same thing if a service unit starts up and tries to make a Joram
>>> connection to a server that isn't up: it shuts down that SU and will always
>>> shut it down immediately after it starts on any subsequent run.)
>>>
>>> It seems like this problem must be related to a race condition.  When
>>> doing development testing, I never see this problem on my laptop, even
>>> under high load.  On our fast test server, I see this error pop up once on
>>> startup and then it doesn't seem to happen again.  On our even faster
>>> production server, the whole thing loses its wheels and falls apart after
>>> a few minutes.
>>>
>>> I'm supposed to be deploying this system in a few days, and of course
>>> production is the one spot where I can't temporarily limp by.  Is it
>>> possible you could give me some hints on what the problem is?  I'll debug
>>> it this weekend to see if I can fix it, or at least patch it temporarily.
>>> I really need to figure out a way to get around this problem.  Other than
>>> that, 3.2.2 seems to work perfectly fine.
>>>
>>> Here is the caused by error again in case it's any bit different than the
>>> 3.2.1 one was:
>>>
>>> Caused by: java.lang.NullPointerException
>>>         at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>>>         at org.apache.servicemix.bean.BeanEndpoint$PojoChannel.send(BeanEndpoint.java:569)
>>>         at com.notification.impl.JbiNotificationHandlerImpl.sendNotification(JbiNotificationHandlerImpl.java:80)
>>>
>>> Also, is the delivery channel component threadsafe?  I'm curious if
>>> multiple threads accessing it is a problem or if I should keep access to it
>>> synchronized?  I'm currently synchronizing, but don't want to if I don't
>>> need to.
>>>
>>> Thanks!
>>> Ryan
>>>
>>>
>>> On Thu, Jun 19, 2008 at 2:19 AM, Bruce Snyder <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> On Wed, Jun 18, 2008 at 7:56 PM, Ryan Moquin <[EMAIL PROTECTED]>
>>>> wrote:
>>>> > I'm using Servicemix 3.2.1, so I'll give 3.2.2 a try.  I was kind of
>>>> > waiting until it was released, but this problem is now cropping up on a
>>>> > regular basis on a server so I'll definitely give it a shot.  Hopefully
>>>> > this will allow me to get this project done so I can then get that Joram
>>>> > write-up done, since I should have run across hopefully most of the
>>>> > gotchas for it at that point.
>>>>
>>>> 3.2.2 will be released very soon and I know it's pretty stable. Maybe
>>>> we can release it this weekend.
>>>>
>>>> Bruce
>>>> --
>>>> perl -e 'print
>>>> unpack("u30","D0G)[EMAIL PROTECTED]&5R\"F)R=6-E+G-N>61E<D\!G;6%I;\"YC;VT*"
>>>> );'
>>>>
>>>> Apache ActiveMQ - http://activemq.org/
>>>> Apache Camel - http://activemq.org/camel/
>>>> Apache ServiceMix - http://servicemix.org/
>>>>
>>>> Blog: http://bruceblog.org/
>>>>
>>>
>>>
>>
>
