Hello Everyone,
I am trying to run a quite resource-demanding process (workflow) in
BPEL. The main part of it consists of 4 threads (within a flow
construct). Each thread behaves as a consumer and a producer at the same
time: it retrieves data from one buffer and sends it to another. The buffer
is a Web service, and the data retrieved from it ranges from a few
hundred KB up to a few MB. Between the buffers there is some XPath
filtering on the data. Here are the threads:
1) gets data from process request and appends data to S1
2) retrieves data from S1, filters and appends to S2 (or possibly
directly to process response)
3) retrieves data from S2, filters and appends to S3 (or possibly
directly to process response)
4) retrieves data from S3 and appends to process response
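For readers less familiar with BPEL flows: the four threads above form an ordinary producer/consumer pipeline. A toy sketch of the same structure in plain Java (this is NOT ODE code; the class, queue sizes, and filter functions are all hypothetical, and the "filters" just tag each item):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the four-stage pipeline as plain Java
// producer/consumer threads; the queues S1..S3 mirror the buffer
// services in the BPEL flow.
public class Pipeline {
    static final String DONE = "__DONE__"; // poison pill that ends each stage

    // One stage: take items from 'in', apply a filter, put results on 'out'.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        java.util.function.UnaryOperator<String> filter) {
        Thread t = new Thread(() -> {
            try {
                for (String item = in.take(); !item.equals(DONE); item = in.take()) {
                    out.put(filter.apply(item));
                }
                out.put(DONE); // forward the poison pill to the next stage
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Runs request -> S1 -> S2 -> S3 -> response and returns the response.
    public static List<String> run(List<String> request) {
        try {
            BlockingQueue<String> s1 = new ArrayBlockingQueue<>(8);
            BlockingQueue<String> s2 = new ArrayBlockingQueue<>(8);
            BlockingQueue<String> s3 = new ArrayBlockingQueue<>(8);
            BlockingQueue<String> resp = new ArrayBlockingQueue<>(64);
            Thread t2 = stage(s1, s2, x -> x + ">f1"); // thread 2: filter S1 -> S2
            Thread t3 = stage(s2, s3, x -> x + ">f2"); // thread 3: filter S2 -> S3
            Thread t4 = stage(s3, resp, x -> x);       // thread 4: S3 -> response
            for (String item : request) s1.put(item);  // thread 1: request -> S1
            s1.put(DONE);
            t2.join(); t3.join(); t4.join();
            List<String> out = new ArrayList<>();
            for (String item = resp.take(); !item.equals(DONE); item = resp.take()) {
                out.add(item);
            }
            return out;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In plain Java this structure is cheap; the question below is why the same shape is so expensive inside the engine.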
Attached you'll find the code for thread no. 3.
I have deployed the process in ODE 1.3.5 under Tomcat 6.0.32. ODE is
configured to use a Postgres DB, the process is not in-memory, and I have
disabled event generation. When I run it, the memory consumption of
Tomcat goes to the limit (currently 3 GB), and CPU usage averages
around 100% (the machine has 2 CPUs). Monitoring the progress of the
pipeline shows that most of the time ODE is processing the output from
the service. And here, after this lengthy introduction, come my questions:
Every response from a service contains N complex elements of the same
type (quite complex XML, up to a few MB per response). The thread iterates
over the list and retrieves one string from every element. Then it
either appends the string to the process response (ode:insert-as-last)
or sends a part of the whole element to the next service. Could
retrieving, say, 50 strings from a few-MB XML document really take so long
and be so resource-demanding? Can assign activities be that expensive?
Maybe I am using them the wrong way? I am new to BPEL, so I expect there
may be things happening behind the scenes that I am not aware of.
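For comparison, extracting a handful of strings with a plain XPath query outside ODE is a very cheap operation. A standalone sketch (the class name and the sample expression are mine, not from the attached process) that parses a response once and pulls one string out of each element:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Baseline check: parse an XML response once, then extract one string
// per element with a single XPath NODESET query.
public class ExtractStrings {
    public static List<String> extract(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If this kind of extraction is fast standalone, I would suspect the cost in ODE lies around the assigns (copying and persisting large variables) rather than in the XPath evaluation itself; that is an assumption I would like confirmed.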
What about using global variables shared between the threads (process
response)? Could concurrent writes to a shared variable result in locks
that slow things down? I keep getting this exception:
12:20:28,949 ERROR [SimpleScheduler] Error while processing a persisted job: Job hqejbhcnphr69h4je5me2h time: 2011-05-09 12:20:25 CEST transacted: true persisted: true details: JobDetails( instanceId: 16767000 mexId: null processId: null type: TIMER channel: 6814 correlatorId: null correlationKeySet: null retryCount: null inMem: false detailsExt: {})
org.apache.ode.bpel.iapi.Scheduler$JobProcessorException
    at org.apache.ode.bpel.engine.BpelEngineImpl.acquireInstanceLock(BpelEngineImpl.java:394)
    at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:403)
    at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:450)
    at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:518)
    at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:512)
    at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:284)
    at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:239)
    at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:512)
    at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:496)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Following this stack trace through the code, I found out that while one
thread waits for a reply from the service, the others try to acquire the
instance lock. If they fail to get it within 1 microsecond, they sleep
for 5^failed_attempts seconds and try again. failed_attempts was getting
large, and so were the timeouts, which meant that essentially only one
thread was running. I changed the ODE source, replacing the 1 microsecond
with 100 ms and 5^retries with a flat 5 seconds, and this helped a lot.
But maybe the cause of this exception lies in my BPEL code rather than
in the timeouts?
In general, I am looking for ways of optimizing the process. Maybe
writing data to the database could be limited even further (beyond just
disabling events)? Or the queries/assignments could be optimized? Any
feedback is welcome.
Thanks in advance,
Pawel