Thanks, Mohammad. "oozie.service.coord.input.check.requeue.interval" is set to the default value, which I believe is 1 minute.
For instrumentation log, here is what I got:

2013-09-11 17:25:58,495 INFO oozieinstrumentation:539 - USER[-] GROUP[-] samplers: callablequeue: queue.size: 816.85 threads.active: 1.6666666666666667

Is this the right value to look at?

Thanks,
Shanzhong

On Fri, Sep 13, 2013 at 10:36 AM, Mohammad Islam <[email protected]> wrote:

> The number doesn't look bad.
> At most, 160 x 7 = 1120 coordinator actions can compete for data
> dependency checks.
> Is it possible to reduce the throttle number (for example 3)? It should
> not create much problem.
>
> Also did you change other relevant configuration? something like
> "oozie.service.coord.input.check.requeue.interval"?
>
> Also, if possible, can you pls check the queue.size in
> oozie-instrumentation.log around the time when this problem happens?
> This value is updated every minutes.
>
> Regards,
> Mohammad
>
> ________________________________
> From: Shangzhong zhu <[email protected]>
> To: [email protected]; Mohammad Islam <[email protected]>
> Sent: Friday, September 13, 2013 9:19 AM
> Subject: Re: oozie coordinator action start time delay
>
> Hi Mohammad,
>
> We have about 160 active coordinator jobs in the system, Active workflow
> jobs is around 15 at a given time.
>
> Throttle: 7
> conCur: 1
>
> Thanks.
>
> On Fri, Sep 13, 2013 at 12:11 AM, Mohammad Islam <[email protected]> wrote:
>
> > Hi Shangzhong,
> > How many jobs are active in the system?
> >
> > What are the values of "throttle" and "concurrency" in your coordinator
> > files?
> > More details :
> > http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html#a6.1.6._Coordinator_Action_Execution_Policies
> >
> > Regards,
> > Mohammad
> >
> > ________________________________
> > From: Shangzhong zhu <[email protected]>
> > To: [email protected]
> > Sent: Thursday, September 12, 2013 6:41 PM
> > Subject: oozie coordinator action start time delay
> >
> > Hi, we are running Oozie Incubation 3.2.0.
> >
> > The coordinator job is data trigger based. What we found was, sometimes,
> > there seems to be a long delay between the time the trigger file is
> > ready, until the time the coordinator action was started.
> >
> > For example,
> >
> > The trigger file was created at 2013-09-11 17:07,
> > The coordinator action nominal time is 2013-09-11 15:00
> > The coordinator start time is actually 2013-09-11 17:25
> >
> > In another word, the action started 18 minutes after the trigger file is
> > ready.
> >
> > Any idea what is going on here? and how to reduce delay.
> >
> > BTW, this seems to be a random behavior. Is it related to the loads on
> > Oozie server?
> >
> > Thanks.
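
For anyone repeating the checks discussed in this thread, here is a minimal Python sketch, not taken from the original messages. It pulls queue.size and threads.active out of an instrumentation entry, and reproduces the two pieces of arithmetic mentioned above (160 x 7 = 1120 competing actions, and the 18-minute gap between the trigger file and the action start). It assumes the entry sits on a single line exactly as quoted in the thread, which may not match how oozie-instrumentation.log actually lays it out. The function names are hypothetical.

```python
import re
from datetime import datetime

# Sample entry copied from the thread. Assumption: single-line layout.
SAMPLE_LINE = (
    "2013-09-11 17:25:58,495 INFO oozieinstrumentation:539 - USER[-] GROUP[-] "
    "samplers: callablequeue: queue.size: 816.85 "
    "threads.active: 1.6666666666666667"
)

# Timestamp at the start of the line, then the two sampler values.
_SAMPLER_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*?"
    r"queue\.size:\s*([0-9.]+)\s+threads\.active:\s*([0-9.]+)"
)


def parse_sampler_line(line):
    """Return (timestamp, queue_size, threads_active), or None if no match."""
    match = _SAMPLER_RE.search(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return timestamp, float(match.group(2)), float(match.group(3))


def max_competing_actions(active_coordinators, throttle):
    """Upper bound on coordinator actions waiting on data dependency checks."""
    return active_coordinators * throttle


def delay_minutes(ready, started, fmt="%Y-%m-%d %H:%M"):
    """Minutes between the trigger file being ready and the action starting."""
    delta = datetime.strptime(started, fmt) - datetime.strptime(ready, fmt)
    return delta.total_seconds() / 60


if __name__ == "__main__":
    print(parse_sampler_line(SAMPLE_LINE))
    print(max_competing_actions(160, 7))
    print(delay_minutes("2013-09-11 17:07", "2013-09-11 17:25"))
```

Running the parser over entries logged around the time of a delayed action would show whether queue.size stays high across several samples or only spikes briefly.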
