Hi, oozier:
Starting with EMR 5.15.0, AWS ships Oozie 5.0.0, upgraded from Oozie 4.3.
Unfortunately, we found that one nice feature is broken for us on Oozie 5.0.0.
On Oozie 4.3, we keep our Oozie applications in an S3 bucket that serves as our
release repository, and in the Oozie application properties file we simply use
the following:
appBaseDir=${s3.app.bucket}/oozieJobs/${appName}
The Oozie 4.3 runtime loads all the application code from S3 while still
using the Oozie sharelib from HDFS, and the whole application workflow
works perfectly.
Since EMR 5.15.0 upgraded to Oozie 5.0.0, we can no longer use S3 as our
application repository. The same application works fine if it is stored in
HDFS, but if it is stored in S3 we get the following error message:
Caused by: org.apache.oozie.workflow.WorkflowException: E0712: Could not create lib paths list for application [s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml], Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
    at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:258)
    at org.apache.oozie.command.wf.SubmitXCommand.execute(SubmitXCommand.java:168)
    ... 36 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:487)
    at com.amazon.ws.emr.hadoop.fs.staging.DefaultStagingMechanism.isStagingDirectoryPath(DefaultStagingMechanism.java:38)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:740)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:347)
    at org.apache.oozie.service.WorkflowAppService.getLibFiles(WorkflowAppService.java:301)
    at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:202)
    ... 37 more
It looks like when we point the application path at S3 with
appBaseDir=${s3.app.bucket}/oozieJobs/${appName}, Oozie 5.0 complains that
it can no longer load the sharelib from the HDFS URI, even though all the
sharelib files are indeed stored in the correct HDFS location, as given in the
error message.
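To make the failure mode concrete, here is a minimal sketch of the kind of scheme/authority check that produces a "Wrong FS" message. It is a simplified, stdlib-only model and not Hadoop's actual FileSystem.checkPath (which also handles default ports and other cases); the class and method names are made up for illustration.

```java
import java.net.URI;
import java.util.Objects;

/** Simplified model of a "Wrong FS" check. Hypothetical names, not Hadoop code. */
final class WrongFsSketch {

    /** True when the path's scheme and authority match those of the file system URI. */
    static boolean belongsTo(URI fsUri, URI path) {
        // A path without a scheme is treated as relative to the file system.
        if (path.getScheme() == null) {
            return true;
        }
        return path.getScheme().equalsIgnoreCase(fsUri.getScheme())
                && Objects.equals(lower(path.getAuthority()), lower(fsUri.getAuthority()));
    }

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase();
    }

    /** Rejects paths that belong to a different file system than fsUri. */
    static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // File system derived from the application path (S3 in our case).
        URI appFs = URI.create("s3://bucket-name");
        // Sharelib location, which lives on HDFS.
        URI shareLib = URI.create("hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib");
        try {
            checkPath(appFs, shareLib);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, asking the S3-bound file system about the HDFS sharelib path fails in the same way as in the trace, while a path under s3://bucket-name would pass.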
Working from this error message, I found the following commit (OOZIE-2944) in
Oozie 5.0:
https://github.com/apache/oozie/commit/5998c18fde1da769e91e3ef1bcca484723730c76#diff-d4e9af2c1e2ddeae544be6182b948109
Since the error comes from the FileSystem used in
core/src/main/java/org/apache/oozie/service/WorkflowAppService.java,
I think this commit may be what is causing it.
In 5.0.0, line 202 uses the "fs" created on line 177 from the "conf" created
on line 169, as follows:
https://github.com/apache/oozie/blob/branch-5.0/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L166
URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
Configuration conf = has.createConfiguration(uri.getAuthority());
In 4.3.0, by contrast, the same code reads:
https://github.com/apache/oozie/blob/branch-4.3/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L167
URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
Configuration conf = has.createJobConf(uri.getAuthority());
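Both snippets above derive the configuration from the authority of the application path. As a small stdlib-only illustration of what that authority is for an S3 versus an HDFS application path, consider the sketch below; the class name and the HDFS example path are hypothetical, and it does not call any Oozie or Hadoop code.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Shows what URI.getAuthority() returns for different application paths. */
final class AppPathAuthoritySketch {

    /** Mirrors `new URI(appPath).getAuthority()` from the quoted code. */
    static String authorityOf(String appPath) {
        try {
            return new URI(appPath).getAuthority();
        } catch (URISyntaxException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Application stored in S3: the authority is the bucket name.
        System.out.println(authorityOf(
                "s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml"));
        // Application stored in HDFS (hypothetical path): the authority is the NameNode host:port.
        System.out.println(authorityOf(
                "hdfs://ip-172-31-72-175.ec2.internal:8020/user/me/oozieJobs/ourAppName"));
    }
}
```

So with an S3 application path, whatever is built from that authority is tied to the bucket rather than to the HDFS NameNode that hosts the sharelib.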
I am NOT 100% sure, but the above code does produce the FileSystem that
eventually complains "Wrong FS" in my case, and the above commit changed how
"conf" is built, from createJobConf to createConfiguration.
So my question is: do you think the above change is causing my issue?
If so, I assume there was a good reason for that commit, but is there a
solution for my use case as well?
Thanks
Yong