Re: JobManager failing to schedule jobs

Brett Palmer Wed, 13 Jul 2011 19:41:29 -0700

Josh,

I'm attaching the patch I used to work around this issue.  It is based on
an older version of ofbiz, so compare it carefully against your current
files before applying it.


The following files were patched:

service-config.xsd
serviceengine.xml
JobManager.java
JobPoller.java


The patch adds a new configuration option:

 poll-transaction-timeout="300"

I'm pretty sure that I was using 300 seconds for the
poll-transaction-timeout.  I believe the default is 60 or 120 seconds.
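
For illustration only, here is a rough sketch of how an attribute value like
the one above could be turned into a timeout in seconds, falling back to a
default when the attribute is missing or unusable.  This is not the code from
the attached patch: the class name, the method name and the fallback of 60
are assumptions made for the example.

```java
// Hypothetical helper, not taken from the attached patch or from OFBiz.
final class PollTimeoutConfig {

    // Returns the attribute value as a positive number of seconds, or
    // fallbackSeconds when the attribute is absent, blank, non-numeric
    // or not positive.
    static int timeoutSeconds(String attributeValue, int fallbackSeconds) {
        if (attributeValue == null || attributeValue.trim().isEmpty()) {
            return fallbackSeconds;
        }
        try {
            int parsed = Integer.parseInt(attributeValue.trim());
            return parsed > 0 ? parsed : fallbackSeconds;
        } catch (NumberFormatException e) {
            return fallbackSeconds;
        }
    }

    public static void main(String[] args) {
        int assumedDefault = 60; // assumption: the real default may differ
        System.out.println(timeoutSeconds("300", assumedDefault));
        System.out.println(timeoutSeconds(null, assumedDefault));
    }
}
```

The resulting value is what would be handed to TransactionUtil.begin(int),
as Scott describes further down in the thread.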

I originally created JIRA issue OFBIZ-3855 for this problem:

https://issues.apache.org/jira/browse/OFBIZ-3855


Be careful not to set the transaction timeout too high.  If you do, then
when the next poller wakes up to process new requests it will time out,
because the first poller still holds a lock on the table (or the ofbiz
semaphore method).
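
As a rough way to sanity-check a setting against that rule, something like
the sketch below compares a timeout in seconds with a polling interval in
milliseconds.  The helper is hypothetical and not part of ofbiz or the patch.
The two values in main() are simply the ones mentioned in this thread (300
seconds, and the poll-db-millis="20000" value Josh quotes below); whether an
overlap matters in practice depends on your setup.

```java
// Hypothetical helper, not part of OFBiz: compares a poller transaction
// timeout (seconds) with the polling interval (milliseconds).
final class PollTimeoutCheck {

    // True when the transaction timeout is no longer than the poll interval.
    static boolean fitsWithinPollInterval(int txTimeoutSeconds, long pollIntervalMillis) {
        return txTimeoutSeconds * 1000L <= pollIntervalMillis;
    }

    // Largest whole number of seconds that still fits inside the interval.
    static int maxTimeoutSeconds(long pollIntervalMillis) {
        return (int) (pollIntervalMillis / 1000L);
    }

    public static void main(String[] args) {
        long pollDbMillis = 20000L; // value quoted elsewhere in this thread
        int txTimeoutSeconds = 300; // value mentioned for the patch option
        System.out.println(fitsWithinPollInterval(txTimeoutSeconds, pollDbMillis));
        System.out.println(maxTimeoutSeconds(pollDbMillis));
    }
}
```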


Here are a couple of other options you could try since the number of pending
jobs is so high.

1. Create a temporary status for the JobSandbox statusId and assign a large
set of the pending jobs to this status, so that only a few thousand remain
pending at a time.  Then incrementally change the parked jobs back to
pending so the service engine can process them in reasonable batches.  I
haven't tried this option but it would allow you to work with the service
engine without modifying any code.
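
To make the idea in option 1 concrete, here is a small in-memory sketch.  It
does not touch the database or any ofbiz API: the Job class stands in for
JobSandbox rows, SERVICE_PENDING is assumed to be the pending statusId, and
SERVICE_PARKED is a made-up name for the temporary status.  In practice the
same two steps would be updates against the JobSandbox table.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: an in-memory stand-in for JobSandbox rows, not OFBiz code.
final class JobBatchRelease {
    static final String PENDING = "SERVICE_PENDING"; // assumed pending statusId
    static final String PARKED = "SERVICE_PARKED";   // hypothetical temporary statusId

    static final class Job {
        final int jobId;
        String statusId;

        Job(int jobId, String statusId) {
            this.jobId = jobId;
            this.statusId = statusId;
        }
    }

    // Builds a list of jobs that all start out pending.
    static List<Job> newPendingJobs(int count) {
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            jobs.add(new Job(i, PENDING));
        }
        return jobs;
    }

    // Step 1: move every pending job to the temporary status.
    static int parkAll(List<Job> jobs) {
        int moved = 0;
        for (Job job : jobs) {
            if (PENDING.equals(job.statusId)) {
                job.statusId = PARKED;
                moved++;
            }
        }
        return moved;
    }

    // Step 2: flip at most batchSize parked jobs back to pending.
    static int releaseBatch(List<Job> jobs, int batchSize) {
        int released = 0;
        for (Job job : jobs) {
            if (released >= batchSize) {
                break;
            }
            if (PARKED.equals(job.statusId)) {
                job.statusId = PENDING;
                released++;
            }
        }
        return released;
    }

    static int countWithStatus(List<Job> jobs, String statusId) {
        int count = 0;
        for (Job job : jobs) {
            if (statusId.equals(job.statusId)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Job> jobs = newPendingJobs(2500);
        System.out.println("parked: " + parkAll(jobs));
        int released;
        while ((released = releaseBatch(jobs, 1000)) > 0) {
            // In real use, wait here until the service engine has worked
            // through the released batch before releasing the next one.
            System.out.println("released: " + released);
        }
    }
}
```

Nothing in this sketch drains the released jobs, so it only shows the
park-and-release bookkeeping, not the processing itself.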


2. Start up several more instances of ofbiz, all pointing to the same
database.  Each instance will start its own service process to handle more
requests in parallel.  This probably won't work without the patch I've
attached, as each service process would still time out and not allow the
other processes to start.



Good luck,



Brett




On Wed, Jul 13, 2011 at 8:10 PM, Josh Jacobson <[email protected]> wrote:

> Thanks again. I actually meant a suggestion for the transaction
> timeout. In any case I am grateful for your explanation.
>
>
> On Wednesday, July 13, 2011, Scott Gray <[email protected]> wrote:
> > As best I can tell there shouldn't be any need to increase the interval
> > between polls since the interval timer doesn't actually start until the
> > previous poll has completed (see JobPoller.run()) so I can't see how a
> > small interval would cause any backlog problems.
> >
> > I'm guessing if there is any lock contention then it's probably caused by
> > the executing jobs trying to update their respective rows while the poller
> > is holding a table lock.  So from that point of view I guess increasing
> > the interval could reduce the amount of contention between the executing
> > jobs and the next poll.
> >
> > Regards
> > Scott
> >
> > On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
> >
> >> Scott,
> >>
> >> Thanks! That is very precise advice. Do you have a suggestion on
> >> interval time? 60 seconds? 120?
> >>
> >> Thanks,
> >>
> >> On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray <[email protected]> wrote:
> >>> That configuration is for the frequency of job polls.  There isn't any
> >>> ability to specify the transaction timeout via configuration so you'll
> >>> need to modify the code directly:
> >>> JobManager.java (line 148):
> >>> beganTransaction = TransactionUtil.begin();
> >>> needs to be changed to use TransactionUtil.begin(int)
> >>>
> >>> Regards
> >>> Scott
> >>>
> >>> HotWax Media
> >>> http://www.hotwaxmedia.com
> >>>
> >>> On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
> >>>
> >>>> Brett,
> >>>>
> >>>> Before I start trying to run the jobs manually, I want to give your
> >>>> suggestion a try. I think I know where to configure the job polling
> >>>> transaction time (I believe it's the poll-db-millis="20000" value in
> >>>> framework/service/config/serviceengine.xml).
> >>>>
> >>>> However, I still don't know what to increase it to. I understand that
> >>>> we wouldn't want to make it bigger than the default polling interval.
> >>>> Do you know what the default interval between polling is?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer <[email protected]> wrote:
> >>>>> I meant removing finished jobs.  If you have thousands of pending jobs
> >>>>> then you will have the same problem I mentioned in my first email.  One
> >>>>> resolution will be to increase the job poller transaction time.  In the
> >>>>> ofbiz version I was using there was not a way to configure the poller
> >>>>> transaction time.  It just used the default time.  I had to create a
> >>>>> patch to allow this to happen.
> >>>>>
> >>>>> In the patch you had to be careful to not increase the transaction time
> >>>>> greater than the frequency of the job poller.  Otherwise you get into a
> >>>>> lock situation where one job poller is still running within a
> >>>>> transaction and another poller starts.  This didn't create a huge
> >>>>> problem but the second job poller would usually lock and then time out.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Brett
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson <[email protected]> wrote:
> >>>>>
> >>>>>> Brett,
> >>>>>>
> >>>>>> Can you please explain what you mean by archiving the current
> >>>>>> JobSandbox first?
> >>>>>> Do you mean somehow removing the current pending jobs, applying your
> >>>>>> patch and then copying them back again?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jul 13, 2011 at 12:08 PM, Brett Palmer <[email protected]> wrote:
> >>>>>>> Josh,
> >>>>>>>
> >>>>>>> I've also seen this problem if the JobSandbox table has too many rows
> >>>>>>> to process.  I ran into a similar problem when I tried to run 10,000
> >>>>>>> Async batch processes.  The time it took for the JobPoller to process
> >>>>>>> all the records was too long and the transaction would time out.
> >>>>>>>
> >>>>>>> I had a patch to change the transaction timeout for the JobPoller
> >>>>>>> specifically as it wasn't available in ofbiz at the time, but I don't
> >>>>>>> think I ever submitted it.  I could look for this patch if anyone is
> >>>>>>> interested but it may already be implemented in the framework.
> >>>>>>>
> >>>>>>> I
>

ofbiz_jobpoll_tx_deploy_patch.jar
Description: application/java-archive
