I should add that the environment also has a lot to do with this. In this area I have changed to solid-state drives for storage and 32 GB SDHC cards for swap files.
BJ Freeman sent the following on 7/14/2011 8:09 AM:
> I find that anything not time based does not work when, like you said,
> the numbers get large.
> I added the createtime to the conditions, currently set in milliseconds.
>
> Brett Palmer sent the following on 7/14/2011 5:35 AM:
>> One feature that would help to prevent this problem in the future is a
>> configuration parameter in the service engine that would set the maximum
>> number of jobs the poller would process at a time. Right now the poller
>> reads the JobSandbox and gets every job that has a status of Pending. Then
>> it tries to change the status for each of these to running (or something
>> like that). If the number of pending jobs is too large the poller will time
>> out before it can change the state of all the pending jobs. Changing the
>> transaction timeout can help this problem but having another configuration
>> like "max-poll-jobs" could limit the number of pending jobs that are
>> processed in one transaction. There is a configuration called "jobs" but I
>> don't think that is used by the polling process.
>>
>> I've tried to use the service engine as an asynchronous batch server but run
>> into problems when the number of pending jobs gets around 10,000.
>>
>> Brett
>>
>> On Wed, Jul 13, 2011 at 10:34 PM, BJ Freeman <bjf...@free-man.net> wrote:
>>
>>> You're going to run into this from time to time for one reason or another.
>>> The approach I took was to spread the jobs out so they are not lumped
>>> together.
>>> Take a look at how the jobs are marshalled to be run.
>>>
>>> Josh Jacobson sent the following on 7/13/2011 8:35 PM:
>>>> Vacuum has been run (took quite a while). Yeah, I see now that the
>>>> JobManager actually tries to update all the JobSandbox rows in the
>>>> transaction, so 60 seconds was pretty low.
>>>>
>>>> I am trying 10 minutes now and will see how that goes.
>>>>
>>>> I am using Postgres, by the way.
>>>>
>>>> Thanks for the help, I really appreciate it.
>>>>
>>>> --
>>>> Josh.
>>>>
>>>> On Wed, Jul 13, 2011 at 8:29 PM, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>> Not sure what db you're using but it probably wouldn't hurt to run a
>>>>> vacuum on the table to speed up processing.
>>>>>
>>>>> By the way, I'm pretty sure the default timeout is 60 seconds so you
>>>>> might want to try something a little larger :-)
>>>>>
>>>>> Regards
>>>>> Scott
>>>>>
>>>>> On 14/07/2011, at 2:58 PM, Josh Jacobson wrote:
>>>>>
>>>>>> I tried 60 seconds for timeout but that didn't work. I guess I'll
>>>>>> double it now and keep trying.
>>>>>>
>>>>>> I have about 260,000 pending jobs, and nothing is getting done.
>>>>>>
>>>>>> I know what you mean about purgeOldJobs. That service has crashed now
>>>>>> and I deleted old jobs from the database by hand. I was up to 2.6
>>>>>> million rows. OFBiz was pretty much unusable.
>>>>>>
>>>>>> If you have any other suggestions I'd love to hear them.
>>>>>>
>>>>>> On Wednesday, July 13, 2011, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>>>> Ah okay, that is entirely dependent on the number of jobs and the
>>>>>>> speed the server can process them. As a side note I would keep a
>>>>>>> close eye on the purgeOldJobs service: when it starts falling over
>>>>>>> (transaction timeout again) the number of rows in the table will
>>>>>>> increase quickly, which in turn will slow down polling.
>>>>>>>
>>>>>>> In general the whole persisted jobs implementation is a bit fragile,
>>>>>>> especially when dealing with a large number of jobs. I've wanted to
>>>>>>> replace it with something like Quartz for a while but haven't had
>>>>>>> the time.
>>>>>>>
>>>>>>> Regards
>>>>>>> Scott
>>>>>>>
>>>>>>> On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:
>>>>>>>
>>>>>>>> Thanks again. I actually meant a suggestion for the transaction
>>>>>>>> timeout. In any case I am grateful for your explanation.
>>>>>>>>
>>>>>>>> On Wednesday, July 13, 2011, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>>>>>> As best I can tell there shouldn't be any need to increase the
>>>>>>>>> interval between polls since the interval timer doesn't actually
>>>>>>>>> start until the previous poll has completed (see JobPoller.run())
>>>>>>>>> so I can't see how a small interval would cause any backlog
>>>>>>>>> problems.
>>>>>>>>>
>>>>>>>>> I'm guessing if there is any lock contention then it's probably
>>>>>>>>> caused by the executing jobs trying to update their respective
>>>>>>>>> rows while the poller is holding a table lock. So from that point
>>>>>>>>> of view I guess increasing the interval could reduce the amount of
>>>>>>>>> contention between the executing jobs and the next poll.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Scott
>>>>>>>>>
>>>>>>>>> On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
>>>>>>>>>
>>>>>>>>>> Scott,
>>>>>>>>>>
>>>>>>>>>> Thanks! That is very precise advice. Do you have a suggestion on
>>>>>>>>>> interval time? 60 seconds? 120?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray <scott.g...@hotwaxmedia.com> wrote:
>>>>>>>>>>> That configuration is for the frequency of job polls. There
>>>>>>>>>>> isn't any ability to specify the transaction timeout via
>>>>>>>>>>> configuration so you'll need to modify the code directly:
>>>>>>>>>>> JobManager.java (line 148):
>>>>>>>>>>> beganTransaction = TransactionUtil.begin();
>>>>>>>>>>> needs to be changed to use TransactionUtil.begin(int)
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Scott
>>>>>>>>>>>
>>>>>>>>>>> HotWax Media
>>>>>>>>>>> http://www.hotwaxmedia.com
>>>>>>>>>>>
>>>>>>>>>>> On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Brett,
>>>>>>>>>>>>
>>>>>>>>>>>> Before I start trying to run the jobs manually, I want to give
>>>>>>>>>>>> your suggestion a try.
>>>>>>>>>>>> I think I know where to configure the job polling transaction
>>>>>>>>>>>> time (I believe it's the poll-db-millis="20000" value in
>>>>>>>>>>>> framework/service/config/serviceengine.xml).
>>>>>>>>>>>>
>>>>>>>>>>>> However, I still don't know what to increase it to. I understand
>>>>>>>>>>>> that we wouldn't want to make it bigger than the default polling
>>>>>>>>>>>> interval. Do you know what the default interval between polls is?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer <brettgpal...@gmail.com> wrote:
>>>>>>>>>>>>> I meant removing finished jobs. If you have thousands of
>>>>>>>>>>>>> pending jobs then you will have the same problem I mentioned
>>>>>>>>>>>>> in my first email. One resolution will be to increase the job
>>>>>>>>>>>>> poller transaction time. In the OFBiz version I was using
>>>>>>>>>>>>> there was not a way to configure the poller transaction time.
>>>>>>>>>>>>> It just used the default time. I had to create a patch to
>>>>>>>>>>>>> allow this to happen.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the patch you had to be careful not to increase the
>>>>>>>>>>>>> transaction time beyond the frequency of the job poller.
>>>>>>>>>>>>> Otherwise you get into a lock situation where one job poller
>>>>>>>>>>>>> is still running within a transaction and another poller
>>>>>>>>>>>>> starts. This didn't create a huge problem but the second job
>>>>>>>>>>>>> poller would usually lock and then time out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Brett
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson <josh.s.jacob...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Brett,
>>>>>>>>>>>>>>
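
For reference, Brett's "max-poll-jobs" idea from the thread above could look roughly like the sketch below. It is plain Java, not OFBiz code: the Job class, the poll method and the maxPollJobs parameter are hypothetical names made up for illustration, and claiming the oldest run times first is an assumption rather than something the thread specifies. The point is only that a cap bounds how many JobSandbox rows a single poller transaction has to update.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: none of these names exist in OFBiz.
class PollBatchSketch {

    enum Status { PENDING, QUEUED }

    // Stand-in for a JobSandbox row, reduced to the fields the sketch needs.
    static final class Job {
        final String jobId;
        final long runTime; // time (in millis) at which the job becomes due
        Status status = Status.PENDING;

        Job(String jobId, long runTime) {
            this.jobId = jobId;
            this.runTime = runTime;
        }
    }

    // Claims at most maxPollJobs jobs that are pending and due, oldest run
    // time first (assumed ordering). In a real poller each call would map to
    // one transaction, so the cap limits the rows updated per transaction.
    static List<Job> poll(List<Job> sandbox, long now, int maxPollJobs) {
        if (maxPollJobs < 0) {
            throw new IllegalArgumentException("maxPollJobs must not be negative");
        }
        List<Job> due = new ArrayList<>();
        for (Job job : sandbox) {
            if (job.status == Status.PENDING && job.runTime <= now) {
                due.add(job);
            }
        }
        due.sort(Comparator.comparingLong((Job job) -> job.runTime));
        List<Job> claimed =
                new ArrayList<>(due.subList(0, Math.min(maxPollJobs, due.size())));
        for (Job job : claimed) {
            job.status = Status.QUEUED; // mark as claimed so the next poll skips it
        }
        return claimed;
    }

    public static void main(String[] args) {
        List<Job> sandbox = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            sandbox.add(new Job("job-" + i, i));
        }
        // Keep polling in capped batches until nothing is left to claim.
        List<Job> batch;
        while (!(batch = poll(sandbox, 100L, 10)).isEmpty()) {
            System.out.println("claimed " + batch.size()
                    + " jobs, first " + batch.get(0).jobId);
        }
    }
}
```

With a cap like this, a backlog is drained over several short polls instead of one long transaction, which would sit alongside (not replace) the TransactionUtil.begin(int) timeout change Scott describes.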