Hi all,

We are observing some strange behavior in Oozie when running workflows under YARN. Jobs are launched properly from Oozie, but after about 20 minutes the workflow goes into the SUSPENDED state, with the running action in the START_MANUAL state. The only error message I can find is in the Action Info dialog box of the Oozie UI, and it reads as follows:
Status: START_MANUAL
Error Code: JA009
Error Message: JA009: Unknown rpc kind RPC_WRITABLE

We ran into this same error earlier, while we were configuring Oozie to work with YARN. The cause then was that Oozie was using the old clients to talk to the YARN ResourceManager (RM), and it was fixed by setting the correct CATALINA_BASE in oozie-env.sh. We suspect that Oozie is somehow still using the old client to check the status of a running job, but we couldn't figure out which configuration is causing this.

Some additional information regarding this issue: the workflow only gets suspended when it runs past a certain time limit, which we have observed to be about 20 minutes. Any workflow that completes within that limit doesn't have this issue.

Has anyone run into this issue before? Any pointers on how to debug it would be much appreciated!

Cheers,
Jialong
