BTW, the coordinator job runs every 15 minutes, and the workflow action is a Java action.
It looks like the Java action is trying to find the action data sequence file but somehow cannot read it. Any insight into this issue?

Thanks,
Shanzhong

On Thu, Jun 30, 2016 at 6:47 PM, Shangzhong zhu <[email protected]> wrote:

> We noticed that once in a while, some Oozie workflow actions (part of
> coordinator jobs) took a long time to transition.
>
> We noticed the following exceptions in the Oozie logs (Oozie version: 4.1.0):
>
> 2016-06-30 07:51:12,830 WARN ActionCheckXCommand:544 - SERVER[node75-144.prod-aws.eadpdata.ea.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0042217-160627222917756-oozie-oozi-W] ACTION[0042217-160627222917756-oozie-oozi-W@pin-translation_wf] Exception while executing check(). Error Code [JA009], Message[JA009: null]
> org.apache.oozie.action.ActionExecutorException: JA009: null
>     at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:418)
>     at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:396)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1296)
>     at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:181)
>     at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:55)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:281)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: *java.io.EOFException*
>     at java.io.DataInputStream.readFully(DataInputStream.java:197)
>     at java.io.DataInputStream.readFully(DataInputStream.java:169)
>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1848)
>     at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1813)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1762)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
>     at org.apache.oozie.action.hadoop.LauncherMapperHelper$1.run(LauncherMapperHelper.java:270)
>     at org.apache.oozie.action.hadoop.LauncherMapperHelper$1.run(LauncherMapperHelper.java:264)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.oozie.action.hadoop.LauncherMapperHelper.getActionData(LauncherMapperHelper.java:264)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1207)
>     ... 7 more
>
> This caused a retry:
>
> 2016-06-30 07:51:12,836 INFO ActionCheckXCommand:541 - SERVER[node75-144.prod-aws.eadpdata.ea.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[*0042217-160627222917756-oozie-oozi-W*] ACTION[*0042217-160627222917756-oozie-oozi-W*@pin-translation_wf] Next Retry, Attempt Number [1] in [60,000] milliseconds
>
> In the end, the retries maxed out and the workflow action was suspended:
>
> 2016-06-30 07:54:13,209 WARN ActionCheckXCommand:544 - SERVER[node75-144.prod-aws.eadpdata.ea.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[*0042217-160627222917756-oozie-oozi-W*] ACTION[*0042217-160627222917756-oozie-oozi-W*@pin-translation_wf] Suspending Workflow Job id=*0042217-160627222917756-oozie-oozi-W*
>
> Any idea what is going wrong here?
>
> Thanks,
>
> Shanzhong
>
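One way to narrow this down: the EOFException in the trace comes from DataInputStream.readFully inside SequenceFile$Reader.init, which would be consistent with the action data file being empty or cut short when Oozie tried to open it. That is a guess, not something the logs confirm. A SequenceFile starts with the bytes "SEQ" followed by a version byte, so a rough check on a local copy of the file could look like the sketch below. The function name and the returned labels are made up for illustration, and it assumes the file has already been copied out of HDFS to local disk.

```python
SEQ_MAGIC = b"SEQ"  # a SequenceFile header begins with these bytes, then one version byte
HEADER_LEN = 4      # magic (3 bytes) + version (1 byte)


def check_sequence_file_header(path):
    """Classify a local file by its first four bytes (hypothetical helper).

    Returns "empty", "truncated", "not-sequence-file", or "ok".
    Only the leading header bytes are inspected; the rest of the file
    is not validated.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_LEN)
    if len(header) == 0:
        return "empty"
    if len(header) < HEADER_LEN:
        return "truncated"
    if header[:3] != SEQ_MAGIC:
        return "not-sequence-file"
    return "ok"
```

If this reports "empty" or "truncated" for the file of a stuck action, the check() call may simply be racing with the launcher while the file is still being written, but that would need to be confirmed separately.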
