On Mon, Nov 24, 2008 at 7:14 AM, Chris Taylor <[EMAIL PROTECTED]> wrote:

> Some more information regarding this error:
>
> We are still seeing this even with the ODE Trunk 1.2.1 deployment. It
> occurs quite rarely, but the catalyst seems to be an OutOfMemoryError
> raised by ODE when a new request comes in:

Reviewing the code again, I couldn't spot anything that would produce this
behavior. Neither the process nor the process data is stored in structures
that would be sensitive to OOM. One thing that could help would be a debug
log of BpelEngineImpl when the problem occurs, since routing from the
message to a given process happens in BpelEngineImpl.route(). So you could
just set that logger to debug and see the next time it happens.

Thanks,
Matthieu

> java.lang.OutOfMemoryError
>     at org.apache.ode.bpel.engine.MyRoleMessageExchangeImpl$ResponseFuture.get(MyRoleMessageExchangeImpl.java:201)
>     at org.apache.ode.axis2.ODEService.onAxisMessageExchange(ODEService.java:149)
>     at org.apache.ode.axis2.hooks.ODEMessageReceiver.invokeBusinessLogic(ODEMessageReceiver.java:67)
>     at org.apache.ode.axis2.hooks.ODEMessageReceiver.invokeBusinessLogic(ODEMessageReceiver.java:50)
>     at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:96)
>     at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:145)
>     at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:275)
>     at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:120)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
>     at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1075)
>     at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:550)
>     at com.ibm.ws.wswebcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:478)
>     at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:90)
>     at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:744)
>     at com.ibm.ws.wswebcontainer.WebContainer.handleRequest(WebContainer.java:1455)
>     at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:115)
>     at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:458)
>     at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewInformation(HttpInboundLink.java:387)
>     at com.ibm.ws.http.channel.inbound.impl.HttpICLReadCallback.complete(HttpICLReadCallback.java:102)
>     at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
>     at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
>     at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
>     at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:136)
>     at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:195)
>     at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:743)
>     at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:873)
>     at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1473)
>
> After WebSphere recovers, from this point on until we redeploy the process
> in question to a new version, ODE attempts to route subsequent requests to
> a retired version.
>
> [11/20/08 14:29:26:968 CST] 00000046 SystemOut O 14:29:26,967 ERROR
> [BpelEngineImpl] Scheduled job failed; jobDetail={type=INVOKE_INTERNAL,
> pid={http://eclipse.org/bpel/sample}AdminYNProcess-195, inmem=true,
> mexid=4611686018427387977}
> org.apache.ode.bpel.runtime.InvalidProcessException: Process is retired.
>     at org.apache.ode.bpel.engine.PartnerLinkMyRoleImpl.invokeNewInstance(PartnerLinkMyRoleImpl.java:173)
>     at org.apache.ode.bpel.engine.BpelProcess.invokeProcess(BpelProcess.java:204)
>     at org.apache.ode.bpel.engine.BpelProcess.handleWorkEvent(BpelProcess.java:372)
>     at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:326)
>     at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:373)
>     at org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:337)
>     at org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:336)
>     at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:174)
>     at org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:335)
>     at org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:332)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
>     at java.lang.Thread.run(Thread.java:810)
>
> Attached is the Java core dump file from the time of the original
> OutOfMemoryError, showing that it was caused by excessive garbage
> collection. The VM this runs under allocates 1 GB of memory on the heap.
>
> - Chris Taylor
>
> ------------------------------
> *From:* Matthieu Riou <[EMAIL PROTECTED]>
> *To:* [email protected]
> *Cc:* Dave Cecchi <[EMAIL PROTECTED]>
> *Sent:* Thursday, October 16, 2008 10:40:57 AM
> *Subject:* Re: Client calling retired process?
>
> On Wed, Oct 15, 2008 at 9:27 AM, Chris Taylor <[EMAIL PROTECTED]> wrote:
>
> > Matthieu, yes, we would appreciate it if you could put that latest built
> > WAR somewhere. We have attempted to build with buildr without success.
>
> Here it is:
>
> http://people.apache.org/~mriou/ode-axis2-war-1.2.1-SNAPSHOT.war
>
> Let me know how it goes.
>
> Cheers,
> Matthieu
>
> > ----- Original Message ----
> > From: Matthieu Riou <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Monday, October 13, 2008 1:30:56 PM
> > Subject: Re: Client calling retired process?
> >
> > On Mon, Oct 13, 2008 at 10:55 AM, Chris Taylor <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks, Matthieu. Some background:
> > >
> > > We're running ODE 1.2 on WebSphere 6.1, with Oracle 10g as the process
> > > store.
> > >
> > > This scenario consistently fails in the manner I described, but it
> > > seems only for certain processes.
> > >
> > > So, for example, if I have the following:
> > >
> > > ProcessA-20
> > > ProcessB-21
> > > ProcessC-22
> > >
> > > deployed in my environment, the scenario would be that something causes
> > > ProcessA-20 to hang, at which point it goes into recovery mode and
> > > spawns an ODE job to retry. From this point on, not only do new
> > > requests to ProcessA get routed to the now-retired ProcessA-19, but new
> > > requests to ProcessB also get routed to the now-retired ProcessB-20!
> > > The weird thing is, ProcessC-22 is apparently unaffected. It still gets
> > > calls legitimately routed to its latest versioned deployment,
> > > ProcessC-22.
> > >
> > > I do not know if this happens under other scenarios unrelated to
> > > recovery. I think I just do not have enough data points yet to say.
> >
> > If you have a reproducible test scenario, it would be great if you could
> > try it with the current stable branch. I fixed something related to what
> > you're describing a couple of months ago. If doing a build is an issue
> > for you, I can upload the WAR to a public place.
> >
> > Thanks,
> > Matthieu
> >
> > > ----- Original Message ----
> > > From: Matthieu Riou <[EMAIL PROTECTED]>
> > > To: [email protected]
> > > Sent: Monday, October 13, 2008 12:33:18 PM
> > > Subject: Re: Client calling retired process?
> > >
> > > On Mon, Oct 13, 2008 at 8:17 AM, Chris Taylor <[EMAIL PROTECTED]> wrote:
> > >
> > > > Thanks, Alexis, but I'm no closer to fully understanding why this
> > > > occurs. It now happens periodically, almost every day, with different
> > > > deployed processes. Although I don't understand it, I have done some
> > > > research into the behaviour. Here's a scenario:
> > > >
> > > > We'll deploy ProcessA-19, then retire it with the ProcessA-20
> > > > deployment. At some point it, or another process, will fail and
> > > > attempt to go into recovery mode (excuse me if I state this
> > > > incorrectly). At this point ODE will create a scheduled job in an
> > > > attempt to retry the service later.
> > > >
> > > > Here's where it gets screwy. From then on, all new calls to ProcessA
> > > > will not route to ProcessA-20; instead, ODE will attempt to route
> > > > them to ProcessA-19, which is of course retired. ODE does not recover
> > > > from this. It seems the only way to compensate is to redeploy
> > > > ProcessA as ProcessA-21. New requests will then route correctly.
> > > >
> > > > Any idea here?
> > >
> > > I'll have to ask a few more questions to narrow it down and make sure I
> > > understand correctly:
> > >
> > > * Does the exact same scenario sometimes work and sometimes not?
> > > * Does it always happen in relation to recovery and retry, or did you
> > >   see it happen in other situations as well?
> > > * Which version of ODE are you using? Have you tried with a recent 1.X
> > >   branch?
> > >
> > > Thanks,
> > > Matthieu
> > >
> > > > ----- Original Message ----
> > > > From: Alexis Midon <[EMAIL PROTECTED]>
> > > > To: [email protected]
> > > > Sent: Wednesday, October 8, 2008 7:26:54 PM
> > > > Subject: Re: Client calling retired process?
> > > >
> > > > Hi Chris,
> > > >
> > > > No new executions can be started on a retired process, but running
> > > > instances can still finish their job. [1]
> > > >
> > > > I'm not really familiar with this part of the code, but after looking
> > > > at it, it seems to me that the deployment of a new version is not
> > > > atomic, meaning that a process could be flagged as retired while the
> > > > creation of a new instance is in progress, hence your exception.
> > > >
> > > > Does that make sense for your scenario? Is it possible that the
> > > > process gets retired while messages are coming in?
> > > >
> > > > [1] Further details here:
> > > > http://ode.apache.org/user-guide.html#UserGuide-Versioning
> > > >
> > > > On Wed, Oct 8, 2008 at 11:37 AM, Chris Taylor <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Okay, I have a deployment bundle (called GetCodes) that includes 5
> > > > > processes. 4 of the processes make calls to the fifth (it's an
> > > > > abstraction layer of process business logic). When I deploy this
> > > > > "GetCodes" bundle using the DeploymentService utility, I can see an
> > > > > incremented deployment (say, GetCodes-40) alongside previous
> > > > > iterations.
> > > > >
> > > > > Occasionally, I'll have a client making soap calls to one of the
> > > > > processes under this logical bundle that will fail with the
> > > > > following error:
> > > > >
> > > > > InvalidProcessException: Process is retired.
> > > > >
> > > > > In the logs, it's clear that ODE is directing this client call to
> > > > > GetCodes-39 - though the client isn't explicitly attempting to call
> > > > > a specific version (is that even possible?). Any clue why some
> > > > > clients periodically - erroneously - are directed by ODE to a
> > > > > retired process version?
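
As a rough illustration of the versioning rule described in this thread (a
retired version accepts no new instances, and new requests should go to the
newest deployment), here is a minimal Java sketch. All class and method names
in it are hypothetical and are not ODE's actual classes; it is a toy model,
not a diagnosis of the bug. It only shows how a caller that keeps a stale
reference to an older version would run into a "Process is retired" error,
while a fresh lookup reaches the latest version.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; none of these types are ODE classes.
class RetiredRoutingSketch {

    /** One deployed version of a process, e.g. ProcessA-20. */
    static final class ProcessVersion {
        final String name;
        final int version;
        boolean retired;

        ProcessVersion(String name, int version) {
            this.name = name;
            this.version = version;
        }

        /** A retired version refuses to start new instances. */
        String invokeNewInstance() {
            if (retired) {
                throw new IllegalStateException(
                        "Process is retired: " + name + "-" + version);
            }
            return name + "-" + version;
        }
    }

    /** Toy registry for a single process name: deploying retires older versions. */
    static final class Registry {
        private final NavigableMap<Integer, ProcessVersion> versions = new TreeMap<>();

        void deploy(String name, int version) {
            for (ProcessVersion old : versions.values()) {
                old.retired = true;
            }
            versions.put(version, new ProcessVersion(name, version));
        }

        /** Routes to the highest version number, which is the only active one. */
        ProcessVersion route() {
            return versions.lastEntry().getValue();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.deploy("ProcessA", 19);
        ProcessVersion stale = registry.route(); // a caller holds on to this reference
        registry.deploy("ProcessA", 20);         // ProcessA-19 is now retired

        // A fresh lookup reaches the active version.
        System.out.println(registry.route().invokeNewInstance());

        // The stale reference fails the way the thread describes.
        try {
            stale.invokeNewInstance();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy model the failure comes purely from reusing an old reference
after a redeploy; whether anything comparable happens inside ODE after an
OutOfMemoryError is exactly what the debug log of BpelEngineImpl would help
establish.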
