Hello Everyone,

I am trying to run quite a resource-demanding process (workflow) in BPEL. The main part of it consists of 4 threads (within a flow construct). Each thread behaves as a consumer and a producer at the same time: it retrieves data from one buffer and sends it to another. Each buffer is a Web service, and the data retrieved from it ranges from a few hundred KB up to a few MB. Between the buffers there is some XPath filtering on the data. Here are the threads:
1) gets data from the process request and appends it to S1
2) retrieves data from S1, filters it and appends it to S2 (or possibly directly to the process response)
3) retrieves data from S2, filters it and appends it to S3 (or possibly directly to the process response)
4) retrieves data from S3 and appends it to the process response
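In pseudo-BPEL, the flow looks roughly like this (a simplified sketch; the partner link, operation, and variable names below are placeholders, not my real ones, and I have left out the loops each stage runs until its buffer is drained):

```xml
<flow>
  <sequence name="thread1">
    <!-- copy data from the process request and push it to buffer S1 -->
    <invoke partnerLink="s1PL" operation="append" inputVariable="requestData"/>
  </sequence>
  <sequence name="thread2">
    <!-- pull from S1, XPath-filter in an <assign>, push to S2 -->
    <invoke partnerLink="s1PL" operation="retrieve" outputVariable="batch1"/>
    <assign><!-- XPath filtering --></assign>
    <invoke partnerLink="s2PL" operation="append" inputVariable="filtered1"/>
  </sequence>
  <sequence name="thread3">
    <!-- pull from S2, filter, push to S3 (this is the attached code) -->
    <invoke partnerLink="s2PL" operation="retrieve" outputVariable="batch2"/>
    <assign><!-- XPath filtering --></assign>
    <invoke partnerLink="s3PL" operation="append" inputVariable="filtered2"/>
  </sequence>
  <sequence name="thread4">
    <!-- pull from S3 and append to the process response -->
    <invoke partnerLink="s3PL" operation="retrieve" outputVariable="batch3"/>
    <assign><!-- append to process response --></assign>
  </sequence>
</flow>
```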

Attached you'll find the code for thread no. 3.

I have deployed the process in ODE 1.3.5 under Tomcat 6.0.32. ODE is configured to use a Postgres DB, the process is not in-memory, and I have disabled event generation. When I run it, the memory consumption of Tomcat hits the limit (currently 3 GB), and CPU usage averages around 100% (the machine has 2 CPUs). Monitoring the progress of the pipeline shows that most of the time ODE is processing the output from the service. And here, after this lengthy introduction, come my questions:

Every response from a service contains N complex elements of the same type (quite complex XML, up to a few MB per response). The thread iterates over the list and retrieves one string from every element. Then it either appends the string to the process response (ode:insert-as-last) or sends a part of the whole element to the next service. Could retrieving, say, 50 strings from a few-MB XML document take so long and be so resource-demanding? Can assign activities be that expensive? Maybe I am using them in the wrong way? I am new to BPEL, so I expect there may be things happening behind the scenes that I am not aware of.
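Roughly what one iteration of my loop does is something like this (a sketch; the element names, variable names, and XPath are made up for illustration, and I am assuming ODE's ode:insert-as-last-into extension function is the right way to spell the append):

```xml
<assign>
  <copy>
    <!-- take the string from the i-th element of the service response
         and append it as the last child of the process response payload;
         ns:item/ns:name and the variable names are placeholders -->
    <from>ode:insert-as-last-into($processResponse.payload,
                                  $serviceResponse.payload/ns:item[$i]/ns:name)</from>
    <to variable="processResponse" part="payload"/>
  </copy>
</assign>
```

My suspicion is that every such copy touches the whole multi-MB variable, but I don't know ODE's internals well enough to say.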

What about using global variables shared between the threads (the process response)? Could concurrent writes to a shared variable result in locks that slow things down? I keep getting this exception:
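The writes currently happen from plain scopes; I wondered whether marking them isolated (if ODE supports it) would at least make the serialization explicit. A sketch with placeholder names:

```xml
<scope isolated="yes">
  <!-- intended to serialize concurrent writes to the shared
       processResponse variable; $filteredItem is a placeholder -->
  <assign>
    <copy>
      <from>$filteredItem</from>
      <to variable="processResponse" part="payload"/>
    </copy>
  </assign>
</scope>
```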

12:20:28,949 ERROR [SimpleScheduler] Error while processing a persisted job: Job hqejbhcnphr69h4je5me2h time: 2011-05-09 12:20:25 CEST transacted: true persisted: true details: JobDetails( instanceId: 16767000 mexId: null processId: null type: TIMER channel: 6814 correlatorId: null correlationKeySet: null retryCount: null inMem: false detailsExt: {})
org.apache.ode.bpel.iapi.Scheduler$JobProcessorException
       at org.apache.ode.bpel.engine.BpelEngineImpl.acquireInstanceLock(BpelEngineImpl.java:394)
       at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:403)
       at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:450)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:518)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:512)
       at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:284)
       at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:239)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:512)
       at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:496)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:662)

I followed this thread and found out that while one thread waits for a reply from the service, the others try to acquire the lock. If they fail to get it within 1 microsecond, they sleep for 5^failed_attempts seconds and try again. As failed_attempts grew, so did the timeouts (5 s, 25 s, 125 s, 625 s, ...), which resulted in basically only one thread running. I changed the ODE source, replacing 1 microsecond with 100 ms and 5^retries with a flat 5 s, and this helped a lot. But maybe the reason for this exception is in my BPEL code, and not in the timeouts?

In general, I am looking for ways to optimize the process. Maybe saving data to the database could be limited even further (not just events)? Or the queries/assignments could be optimized. Any feedback is welcome.

Thanks in advance,
Pawel
