Hi Dongying,

this seems like a bug in ZKJobsConcurrencyService: when numOozies is zero, isJobIdForThisServer() should emit a WARN log stating that the other Oozie instance might be missing and return true, rather than throwing a RuntimeException (here, the ArithmeticException in the stack trace).
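
A minimal sketch of the guard described above, not the actual Oozie source. The class name, the method signature with an explicit `myIndex` parameter, and the assumption that a job ID starts with a numeric sequence followed by `-` are all hypothetical and only illustrate the idea:

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the partitioning check in ZKJobsConcurrencyService.
// Names and signature are illustrative assumptions, not the real Oozie API.
final class JobPartitionSketch {
    private static final Logger LOG =
            Logger.getLogger(JobPartitionSketch.class.getName());

    // Returns true if this server should process the given job.
    static boolean isJobIdForThisServer(String jobId, int numOozies, int myIndex) {
        if (numOozies <= 0) {
            // No servers visible in ZooKeeper: warn and take the job
            // instead of failing with "/ by zero".
            LOG.warning("No Oozie servers found in ZooKeeper; another Oozie "
                    + "instance might be missing. Processing job " + jobId + " locally.");
            return true;
        }
        // Assumption: the job ID begins with a numeric sequence, e.g. "0000012-...".
        long sequence = Long.parseLong(jobId.substring(0, jobId.indexOf('-')));
        // Jobs are spread across servers by sequence number modulo server count.
        return sequence % numOozies == myIndex;
    }
}
```

With this guard, a server count of zero leads to a warning and local processing rather than an exception in the recovery thread.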

Can you please file a bug under Apache JIRA.

Thanks, and regards,

Andras

--
Andras PIROS
Software Engineer
<http://www.cloudera.com/>

On Tue, Dec 6, 2016 at 4:33 AM, Dongying Jiao <pineapple...@gmail.com> wrote:

> Hi:
> Do you have the detail steps on setting up oozie HA using virtual IP?
> I setup oozie HA using virtual IP, server-1 and server-2(active-active),
> when we take down server-1 any oozie job submitted fails with below
> stacktrace. If both are up , there is no issue.
>
> ERROR RecoveryService$RecoveryRunnable:517 - SERVER[XXXX] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception, / by zero
> java.lang.ArithmeticException: / by zero
>     at org.apache.oozie.service.ZKJobsConcurrencyService.checkJobIdForServer(ZKJobsConcurrencyService.java:167)
>     at org.apache.oozie.service.ZKJobsConcurrencyService.isJobIdForThisServer(ZKJobsConcurrencyService.java:129)
>     at org.apache.oozie.service.RecoveryService$RecoveryRunnable.runWFRecovery(RecoveryService.java:362)
>     at org.apache.oozie.service.RecoveryService$RecoveryRunnable.run(RecoveryService.java:146)
>     at org.apache.oozie.service.SchedulerService$2.run(SchedulerService.java:175)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
> It seems server-2 can't get oozie server list from zookeeper. Zookeeper
> connection string is already added to oozie site.
>
> Thanks
>
> Best Regards,
> Dongying Jiao