Re: JobManager failing to schedule jobs

BJ Freeman Thu, 14 Jul 2011 08:36:24 -0700

I should add that the environment also has a lot to do with this.
In this area I have changed to solid-state drives for storage and 32 GB
SDHC cards for swap files.



BJ Freeman sent the following on 7/14/2011 8:09 AM:
> I find that anything not time-based does not work when, as you said,
> the numbers get large.
> I added the createtime to the conditions, currently set in milliseconds.
> 
> Brett Palmer sent the following on 7/14/2011 5:35 AM:
>> One feature that would help to prevent this problem in the future is a
>> configuration parameter in the service engine that would set the maximum
>> number of jobs the poller would process at a time.  Right now the poller
>> reads the JobSandbox and gets every job that has a status of Pending.  Then
>> it tries to change the status for each of these to running (or something
>> like that).  If the number of pending jobs is too large the poller will time
>> out before it can change the state of all the pending jobs.  Changing the
>> transaction timeout can help this problem but having another configuration
>> like "max-poll-jobs" could limit the number of pending jobs that are
>> processed in one transaction.  There is a configuration called "jobs" but I
>> don't think that is used by the polling process.
>>
>> I've tried to use the service engine as an asynchronous batch server but run
>> into problems when the number of pending jobs gets around 10,000.
>>
>>
>> Brett
>>
>> On Wed, Jul 13, 2011 at 10:34 PM, BJ Freeman <bjf...@free-man.net> wrote:
>>
>>> You're going to run into this from time to time, for one reason or another.
>>> The approach I took was to spread the jobs out so they are not lumped
>>> together.
>>> Take a look at how the jobs are marshalled to be run.
>>>
>>> Josh Jacobson sent the following on 7/13/2011 8:35 PM:
>>>> Vacuum has been run, (took quite a while). Yeah, I see now that the
>>>> JobManager actually tries to update all the JobSandbox rows in the
>>>> transaction, so 60 seconds was pretty low.
>>>>
>>>> I am trying 10 minutes now and will see how that goes.
>>>>
>>>> I am using PostgreSQL, by the way.
>>>>
>>>> Thanks for the help, I really appreciate it.
>>>>
>>>> --
>>>> Josh.
>>>>
>>>> On Wed, Jul 13, 2011 at 8:29 PM, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>> Not sure what db you're using but it probably wouldn't hurt to run a
>>>>> vacuum on the table to speed up processing.
>>>>>
>>>>> By the way, I'm pretty sure the default timeout is 60 seconds so you
>>>>> might want to try something a little larger :-)
>>>>>
>>>>> Regards
>>>>> Scott
>>>>>
>>>>> On 14/07/2011, at 2:58 PM, Josh Jacobson wrote:
>>>>>
>>>>>> I tried 60 seconds for the timeout but that didn't work. I guess I'll
>>>>>> double it now and keep trying.
>>>>>>
>>>>>> I have about 260,000 pending jobs, and nothing is getting done.
>>>>>>
>>>>>> I know what you mean about purgeOldJobs. That service has crashed now
>>>>>> and I deleted old jobs from the database by hand. I was up to 2.6
>>>>>> million rows. OFBiz was pretty much unusable.
>>>>>>
>>>>>> If you have any other suggestions I'd love to hear them.
>>>>>>
>>>>>> On Wednesday, July 13, 2011, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>>>> Ah okay, that is entirely dependent on the number of jobs and the
>>>>>>> speed the server can process them.  As a side note I would keep a close
>>>>>>> eye on the purgeOldJobs service, when it starts falling over (transaction
>>>>>>> timeout again) then the number of rows in the table will increase quickly
>>>>>>> which in turn will slow down polling.
>>>>>>>
>>>>>>> In general the whole persisted jobs implementation is a bit fragile,
>>>>>>> especially when dealing with a large number of jobs.  I've wanted to
>>>>>>> replace it with something like Quartz for a while but haven't had the time.
>>>>>>>
>>>>>>> Regards
>>>>>>> Scott
>>>>>>>
>>>>>>> On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:
>>>>>>>
>>>>>>>> Thanks again. I actually meant a suggestion for the transaction
>>>>>>>> timeout. In any case I am grateful for your explanation.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wednesday, July 13, 2011, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>>>>>> As best I can tell there shouldn't be any need to increase the
>>>>>>>>> interval between polls since the interval timer doesn't actually start
>>>>>>>>> until the previous poll has completed (see JobPoller.run()) so I can't
>>>>>>>>> see how a small interval would cause any backlog problems.
>>>>>>>>>
>>>>>>>>> I'm guessing if there is any lock contention then it's probably
>>>>>>>>> caused by the executing jobs trying to update their respective rows
>>>>>>>>> while the poller is holding a table lock.  So from that point of view I
>>>>>>>>> guess increasing the interval could reduce the amount of contention
>>>>>>>>> between the executing jobs and the next poll.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Scott
>>>>>>>>>
>>>>>>>>> On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
>>>>>>>>>
>>>>>>>>>> Scott,
>>>>>>>>>>
>>>>>>>>>> Thanks! That is very precise advice. Do you have a suggestion on
>>>>>>>>>> interval time? 60 seconds? 120?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>>>>>>>> That configuration is for the frequency of job polls.  There isn't
>>>>>>>>>>> any ability to specify the transaction timeout via configuration so
>>>>>>>>>>> you'll need to modify the code directly:
>>>>>>>>>>> JobManager.java (line 148):
>>>>>>>>>>> beganTransaction = TransactionUtil.begin();
>>>>>>>>>>> needs to be changed to use TransactionUtil.begin(int)
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Scott
>>>>>>>>>>>
>>>>>>>>>>> HotWax Media
>>>>>>>>>>> http://www.hotwaxmedia.com
>>>>>>>>>>>
>>>>>>>>>>> On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Brett,
>>>>>>>>>>>>
>>>>>>>>>>>> Before I start trying to run the jobs manually, I want to give your
>>>>>>>>>>>> suggestion a try. I think I know where to configure the job polling
>>>>>>>>>>>> transaction time (I believe it's the poll-db-millis="20000" value in
>>>>>>>>>>>> framework/service/config/serviceengine.xml).
>>>>>>>>>>>>
>>>>>>>>>>>> However, I still don't know what to increase it to. I understand that
>>>>>>>>>>>> we wouldn't want to make it bigger than the default polling interval.
>>>>>>>>>>>> Do you know what the default interval between polling is?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer <brettgpal...@gmail.com> wrote:
>>>>>>>>>>>>> I meant removing finished jobs.  If you have thousands of pending jobs
>>>>>>>>>>>>> then you will have the same problem I mentioned in my first email.  One
>>>>>>>>>>>>> resolution will be to increase the job poller transaction time.  In the
>>>>>>>>>>>>> OFBiz version I was using there was not a way to configure the poller
>>>>>>>>>>>>> transaction time.  It just used the default time.  I had to create a
>>>>>>>>>>>>> patch to allow this to happen.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the patch you had to be careful not to increase the transaction time
>>>>>>>>>>>>> beyond the frequency of the job poller.  Otherwise you get into a lock
>>>>>>>>>>>>> situation where one job poller is still running within a transaction and
>>>>>>>>>>>>> another poller starts.  This didn't create a huge problem but the second
>>>>>>>>>>>>> job poller would usually lock and then time out.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Brett
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson <josh.s.jacob...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Brett,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
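
The "max-poll-jobs" idea Brett describes above could be sketched roughly as
follows. This is a minimal, self-contained Java illustration and not OFBiz
code: the Job class, the status strings, the pollOnce method and the batch
size of 500 are all hypothetical stand-ins. A real version would presumably
claim the rows with a database update inside a transaction, so that each
transaction only touches a bounded number of JobSandbox rows.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedPollSketch {

    /** Hypothetical stand-in for a JobSandbox row; not the OFBiz entity. */
    static final class Job {
        final int id;
        String status = "PENDING"; // illustrative status label only

        Job(int id) {
            this.id = id;
        }
    }

    /**
     * Claims at most maxPollJobs pending jobs in one pass and marks them
     * as queued. Jobs beyond the limit stay pending for a later pass, so
     * the work done per pass (and per transaction, in a real system)
     * stays bounded no matter how large the backlog is.
     */
    static List<Job> pollOnce(List<Job> sandbox, int maxPollJobs) {
        List<Job> claimed = new ArrayList<>();
        for (Job job : sandbox) {
            if (claimed.size() >= maxPollJobs) {
                break; // limit reached; leave the rest for the next poll
            }
            if ("PENDING".equals(job.status)) {
                job.status = "QUEUED";
                claimed.add(job);
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        // Simulate a backlog of 10,000 pending jobs.
        List<Job> sandbox = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            sandbox.add(new Job(i));
        }

        // Drain the backlog in bounded passes instead of one huge pass.
        int passes = 0;
        while (!pollOnce(sandbox, 500).isEmpty()) {
            passes++;
        }
        System.out.println("passes needed: " + passes);
    }
}
```

The trade-off is that a large backlog takes more polls to drain, but no
single poll has to update every pending row before its transaction times out.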
