Hello Everyone,

I am trying to run quite a resource-demanding process (workflow) in BPEL. The main part of it consists of 4 threads (within a flow construct). Each thread behaves as a consumer and a producer at the same time: it retrieves data from one buffer and sends it to another. Each buffer is a Web service, and the data retrieved from it ranges from a few hundred KB up to a few MB. Between the buffers there is some XPath filtering on the data. Here are the threads:
1) gets data from the process request and appends it to S1
2) retrieves data from S1, filters it and appends it to S2 (or possibly directly to the process response)
3) retrieves data from S2, filters it and appends it to S3 (or possibly directly to the process response)
4) retrieves data from S3 and appends it to the process response
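In pseudo-BPEL, the flow looks roughly like this (a simplified sketch; the partner link, operation, and variable names below are placeholders, not my real ones, and I have left out the loops each stage runs until its buffer is drained):

```xml
<flow>
  <sequence name="thread1">
    <!-- copy data from the process request and push it to buffer S1 -->
    <invoke partnerLink="s1PL" operation="append" inputVariable="requestData"/>
  </sequence>
  <sequence name="thread2">
    <!-- pull from S1, XPath-filter in an <assign>, push to S2 -->
    <invoke partnerLink="s1PL" operation="retrieve" outputVariable="batch1"/>
    <assign><!-- XPath filtering --></assign>
    <invoke partnerLink="s2PL" operation="append" inputVariable="filtered1"/>
  </sequence>
  <sequence name="thread3">
    <!-- pull from S2, filter, push to S3 (this is the attached code) -->
    <invoke partnerLink="s2PL" operation="retrieve" outputVariable="batch2"/>
    <assign><!-- XPath filtering --></assign>
    <invoke partnerLink="s3PL" operation="append" inputVariable="filtered2"/>
  </sequence>
  <sequence name="thread4">
    <!-- pull from S3 and append to the process response -->
    <invoke partnerLink="s3PL" operation="retrieve" outputVariable="batch3"/>
    <assign><!-- append to process response --></assign>
  </sequence>
</flow>
```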

Attached you'll find the code for thread no. 3.

I have deployed the process in ODE 1.3.5 under Tomcat 6.0.32. ODE is configured to use a Postgres DB, the process is not in-memory, and I have disabled event generation. When I run it, the memory consumption of Tomcat hits the limit (currently 3 GB), and CPU usage averages around 100% (the machine has 2 CPUs). Monitoring the progress of the pipeline shows that most of the time ODE is processing the output from the service. And here, after this lengthy introduction, come my questions:

Every response from a service contains N complex elements of the same type (quite complex XML, up to a few MB per response). The thread iterates over the list and retrieves one string from every element. Then it either appends the string to the process response (ode:insert-as-last) or sends a part of the whole element to the next service. Could retrieving, say, 50 strings from a few-MB XML document take so long and be so resource-demanding? Can assign activities be that expensive? Maybe I am using them in the wrong way? I am new to BPEL, so I expect there may be things happening behind the scenes that I am not aware of.
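Roughly what one iteration of my loop does is something like this (a sketch; the element names, variable names, and XPath are made up for illustration, and I am assuming ODE's ode:insert-as-last-into extension function is the right way to spell the append):

```xml
<assign>
  <copy>
    <!-- take the string from the i-th element of the service response
         and append it as the last child of the process response payload;
         ns:item/ns:name and the variable names are placeholders -->
    <from>ode:insert-as-last-into($processResponse.payload,
                                  $serviceResponse.payload/ns:item[$i]/ns:name)</from>
    <to variable="processResponse" part="payload"/>
  </copy>
</assign>
```

My suspicion is that every such copy touches the whole multi-MB variable, but I don't know ODE's internals well enough to say.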

What about using global variables shared between the threads (the process response)? Could concurrent writes to a shared variable result in locks that slow things down? I keep getting this exception:
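The writes currently happen from plain scopes; I wondered whether marking them isolated (if ODE supports it) would at least make the serialization explicit. A sketch with placeholder names:

```xml
<scope isolated="yes">
  <!-- intended to serialize concurrent writes to the shared
       processResponse variable; $filteredItem is a placeholder -->
  <assign>
    <copy>
      <from>$filteredItem</from>
      <to variable="processResponse" part="payload"/>
    </copy>
  </assign>
</scope>
```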

12:20:28,949 ERROR [SimpleScheduler] Error while processing a persisted job: Job hqejbhcnphr69h4je5me2h time: 2011-05-09 12:20:25 CEST transacted: true persisted: true details: JobDetails( instanceId: 16767000 mexId: null processId: null type: TIMER channel: 6814 correlatorId: null correlationKeySet: null retryCount: null inMem: false detailsExt: {})
org.apache.ode.bpel.iapi.Scheduler$JobProcessorException
       at org.apache.ode.bpel.engine.BpelEngineImpl.acquireInstanceLock(BpelEngineImpl.java:394)
       at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:403)
       at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:450)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:518)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:512)
       at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:284)
       at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:239)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:512)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:496)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:662)

I followed this thread and found out that while one thread waits for a reply from the service, the others try to acquire the lock. If they fail to get it within 1 microsecond, they sleep for 5^failed_attempts seconds and try again. As failed_attempts grew, so did the timeouts (5 s, 25 s, 125 s, 625 s, ...), which resulted in basically only one thread running. I changed the ODE source, replacing 1 microsecond with 100 ms and 5^retries with a flat 5 s, and this helped a lot. But maybe the reason for this exception is in my BPEL code, and not in the timeouts?

In general, I am looking for ways to optimize the process. Maybe saving data to the database could be limited even further (not just events)? Or the queries/assignments could be optimized. Any feedback is welcome.

Thanks in advance,
Pawel
