Re: JobManager failing to schedule jobs

BJ Freeman Thu, 14 Jul 2011 08:09:57 -0700

I find that anything not time-based does not work when, like you said,
the numbers get large.
I added the createtime to the conditions; it is currently set in milliseconds.


Brett Palmer sent the following on 7/14/2011 5:35 AM:
> One feature that would help to prevent this problem in the future is a
> configuration parameter in the service engine that would set the maximum
> number of jobs the poller would process at a time.  Right now the poller
> reads the JobSandbox and gets every job that has a status of Pending.  Then
> it tries to change the status for each of these to running (or something
> like that).  If the number of pending jobs is too large the poller will time
> out before it can change the state of all the pending jobs.  Changing the
> transaction timeout can help this problem but having another configuration
> like "max-poll-jobs" could limit the number of pending jobs that are
> processed in one transaction.  There is a configuration called "jobs" but I
> don't think that is used by the polling process.
> 
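The "max-poll-jobs" idea above can be sketched roughly as follows. This is a
minimal, self-contained illustration and not OFBiz code: PendingJob,
BoundedPoller, claimBatch and the "PENDING"/"QUEUED" status strings are all
hypothetical stand-ins, and maxPollJobs is simply the proposed setting passed
in as a parameter. The point is that each poll (one transaction) touches a
bounded number of rows no matter how large the backlog is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JobSandbox row; not the OFBiz entity API.
class PendingJob {
    final String jobId;
    String status = "PENDING"; // placeholder status name

    PendingJob(String jobId) {
        this.jobId = jobId;
    }
}

class BoundedPoller {
    // Claims at most maxPollJobs pending jobs, i.e. the work a single poll
    // would do inside one transaction. maxPollJobs mirrors the proposed
    // "max-poll-jobs" setting, which is an assumption, not an existing option.
    static List<PendingJob> claimBatch(List<PendingJob> sandbox, int maxPollJobs) {
        List<PendingJob> claimed = new ArrayList<>();
        for (PendingJob job : sandbox) {
            if (claimed.size() >= maxPollJobs) {
                break; // leave the rest for the next poll
            }
            if ("PENDING".equals(job.status)) {
                job.status = "QUEUED"; // placeholder for the status change
                claimed.add(job);
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        List<PendingJob> sandbox = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sandbox.add(new PendingJob("job-" + i));
        }
        int polls = 0;
        // Keep polling until a poll finds nothing left to claim.
        while (!claimBatch(sandbox, 4).isEmpty()) {
            polls++;
        }
        System.out.println("polls needed: " + polls);
    }
}
```

With a cap like this the backlog drains over several short polls instead of one
long transaction, so the transaction timeout no longer has to scale with the
number of pending jobs.
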
> I've tried to use the service engine as an asynchronous batch server but run
> into problems when the number of pending jobs gets around 10,000.
> 
> 
> Brett
> 
> On Wed, Jul 13, 2011 at 10:34 PM, BJ Freeman <[email protected]> wrote:
> 
>> You are going to run into this from time to time, for one reason or another.
>> The approach I took was to spread the jobs out so they are not lumped
>> together.
>> Take a look at how the jobs are marshalled to be run.
>>
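One way to picture "spreading the jobs out" is to give each job its own run
time instead of scheduling them all for the same instant. The sketch below is
only an illustration under that assumption: JobStagger, staggeredRunTimes and
the fixed gap are hypothetical names and choices, not something taken from
OFBiz or from the approach described above.

```java
import java.util.ArrayList;
import java.util.List;

class JobStagger {
    // Returns one run time (epoch milliseconds) per job, starting at
    // startMillis and spaced gapMillis apart, so no two jobs share a run time.
    static List<Long> staggeredRunTimes(long startMillis, int jobCount, long gapMillis) {
        List<Long> runTimes = new ArrayList<>();
        for (int i = 0; i < jobCount; i++) {
            runTimes.add(startMillis + i * gapMillis);
        }
        return runTimes;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Five jobs, two seconds apart.
        for (long runTime : staggeredRunTimes(start, 5, 2000L)) {
            System.out.println("run at +" + (runTime - start) + " ms");
        }
    }
}
```
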
>> Josh Jacobson sent the following on 7/13/2011 8:35 PM:
>>> Vacuum has been run (took quite a while). Yeah, I see now that the
>>> JobManager actually tries to update all the JobSandbox rows in the
>>> transaction, so 60 seconds was pretty low.
>>>
>>> I am trying 10 minutes now and will see how that goes.
>>>
>>> I am using PostgreSQL, by the way.
>>>
>>> Thanks for the help, I really appreciate it.
>>>
>>> --
>>> Josh.
>>>
>>> On Wed, Jul 13, 2011 at 8:29 PM, Scott Gray <[email protected]> wrote:
>>>> Not sure what db you're using but it probably wouldn't hurt to run a
>>>> vacuum on the table to speed up processing.
>>>>
>>>> By the way, I'm pretty sure the default timeout is 60 seconds so you
>>>> might want to try something a little larger :-)
>>>>
>>>> Regards
>>>> Scott
>>>>
>>>> On 14/07/2011, at 2:58 PM, Josh Jacobson wrote:
>>>>
>>>>> I tried 60 seconds for timeout but that didn't work. I guess I'll
>>>>> double it now and keep trying.
>>>>>
>>>>> I have about 260,000 pending jobs, and nothing is getting done.
>>>>>
>>>>> I know what you mean about purgeOldJobs. That service has crashed now
>>>>> and I deleted old jobs from the database by hand. I was up to 2.6
>>>>> million rows. OFBiz was pretty much unusable.
>>>>>
>>>>> If you have any other suggestions I'd love to hear them.
>>>>>
>>>>> On Wednesday, July 13, 2011, Scott Gray <[email protected]> wrote:
>>>>>> Ah okay, that is entirely dependent on the number of jobs and the
>>>>>> speed the server can process them.  As a side note I would keep a close eye
>>>>>> on the purgeOldJobs service: when it starts falling over (transaction
>>>>>> timeout again), the number of rows in the table will increase quickly,
>>>>>> which in turn will slow down polling.
>>>>>>
>>>>>> In general the whole persisted jobs implementation is a bit fragile,
>>>>>> especially when dealing with a large number of jobs.  I've wanted to replace
>>>>>> it with something like Quartz for a while but haven't had the time.
>>>>>>
>>>>>> Regards
>>>>>> Scott
>>>>>>
>>>>>> On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:
>>>>>>
>>>>>>> Thanks again. I actually meant a suggestion for the transaction
>>>>>>> timeout. In any case I am grateful for your explanation.
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, July 13, 2011, Scott Gray <[email protected]> wrote:
>>>>>>>> As best I can tell there shouldn't be any need to increase the
>>>>>>>> interval between polls since the interval timer doesn't actually start until
>>>>>>>> the previous poll has completed (see JobPoller.run()) so I can't see how a
>>>>>>>> small interval would cause any backlog problems.
>>>>>>>>
>>>>>>>> I'm guessing if there is any lock contention then it's probably
>>>>>>>> caused by the executing jobs trying to update their respective rows while
>>>>>>>> the poller is holding a table lock.  So from that point of view I guess
>>>>>>>> increasing the interval could reduce the amount of contention between the
>>>>>>>> executing jobs and the next poll.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Scott
>>>>>>>>
>>>>>>>> On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
>>>>>>>>
>>>>>>>>> Scott,
>>>>>>>>>
>>>>>>>>> Thanks! That is very precise advice. Do you have a suggestion on
>>>>>>>>> interval time? 60 seconds? 120?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray <[email protected]> wrote:
>>>>>>>>>> That configuration is for the frequency of job polls.  There isn't
>>>>>>>>>> any ability to specify the transaction timeout via configuration so you'll
>>>>>>>>>> need to modify the code directly:
>>>>>>>>>> JobManager.java (line 148):
>>>>>>>>>> beganTransaction = TransactionUtil.begin();
>>>>>>>>>> needs to be changed to use TransactionUtil.begin(int)
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Scott
>>>>>>>>>>
>>>>>>>>>> HotWax Media
>>>>>>>>>> http://www.hotwaxmedia.com
>>>>>>>>>>
>>>>>>>>>> On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
>>>>>>>>>>
>>>>>>>>>>> Brett,
>>>>>>>>>>>
>>>>>>>>>>> Before I start trying to run the jobs manually, I want to give your
>>>>>>>>>>> suggestion a try. I think I know where to configure the job polling
>>>>>>>>>>> transaction time (I believe it's the poll-db-millis="20000" value in
>>>>>>>>>>> framework/service/config/serviceengine.xml).
>>>>>>>>>>>
>>>>>>>>>>> However, I still don't know what to increase it to. I understand that
>>>>>>>>>>> we wouldn't want to make it bigger than the default polling interval.
>>>>>>>>>>> Do you know what the default interval between polling is?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer <[email protected]> wrote:
>>>>>>>>>>>> I meant removing finished jobs.  If you have thousands of pending jobs then
>>>>>>>>>>>> you will have the same problem I mentioned in my first email.  One
>>>>>>>>>>>> resolution will be to increase the job poller transaction time.  In the
>>>>>>>>>>>> OFBiz version I was using there was not a way to configure the poller
>>>>>>>>>>>> transaction time.  It just used the default time.  I had to create a patch
>>>>>>>>>>>> to allow this to happen.
>>>>>>>>>>>>
>>>>>>>>>>>> In the patch you had to be careful not to increase the transaction time
>>>>>>>>>>>> beyond the frequency of the job poller.  Otherwise you get into a lock
>>>>>>>>>>>> situation where one job poller is still running within a transaction and
>>>>>>>>>>>> another poller starts.  This didn't create a huge problem but the second job
>>>>>>>>>>>> poller would usually lock and then time out.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Brett
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Brett,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>
>>>>
>>>
>>
>
