1. Besides what Robert suggested, another possible cause is
StatusTransitService, which marks a bundle as suspended if all of its coord
jobs are suspended. StatusTransitService had a bug where it did not resume the
bundle when the coords resumed; we recently fixed that (OOZIE-2228). A bundle
suspend will not cause a coord suspend unless a suspend command is issued.
You should look at the DB column as Robert suggested, and also at the Oozie
logs, which might tell you why the coord/bundle is suspended.
From: Robert Kanter <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, August 18, 2015 11:04 AM
Subject: Re: Unpaused coordinator catchup
Hi Oren,
1. The only time I've seen this sort of behavior is when the default value
for a column in the database was wrong. Have you changed anything in the
database? IIRC the default value for most columns should be NULL.
2. It sounds like you might want either the LAST_ONLY or NONE execution
policy for your Coordinator(s). This section of the docs explains what
each of them does:
http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a6.3._Synchronous_Coordinator_Application_Definition
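As a rough sketch of where the execution policy goes, a coordinator
definition along these lines sets it in the controls section. The app name,
frequency, dates, and the workflowAppPath property are placeholders made up
for illustration, and the schema version and available policies may differ
in your Oozie release, so check them against the docs linked above:

```xml
<coordinator-app name="example-coord" frequency="${coord:hours(1)}"
                 start="2015-08-18T00:00Z" end="2016-08-18T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <controls>
    <!-- LAST_ONLY: while catching up, only the most recent action is run
         and the older ones are skipped (per the linked spec section) -->
    <execution>LAST_ONLY</execution>
  </controls>
  <action>
    <workflow>
      <!-- placeholder property; point this at your workflow application -->
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Note that this is a per-coordinator setting in the application definition,
not a service-wide flag.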
- Robert
On Tue, Aug 18, 2015 at 7:19 AM, Oren Mazor <[email protected]> wrote:
> I'm working a sizeable oozie (CDH oozie 4.1.0) deployment, where workflows
> and coordinators are constantly being updated. I have two problems though:
>
> 1. I am not sure why, but my bundles are constantly in the SUSPENDED state.
> sometimes this results in all of the coordinators within them being marked
> as suspended, and sometimes the coordinators remain running. Is there any
> way a bundle could become suspended on its own somehow?
>
> 2. When I resume a suspended coordinator, it immediately starts submitting
> all of the "missed" actions, but I actually just want it to do this once,
> or wait until the next materialization time. I realize this is explicit
> behaviour in the docs. Is there a flag to disable this on the service
> itself?
>
> thanks!
> Oren
>