Re: oozie 5.0.0 on AWS EMR

2019-03-25 Thread Peter Cseh
Hi Yong,
The use of local filesystems is strictly prohibited in Oozie 5.0.
I'd guess you have hdfs://somenode as fs.defaultFS and you're providing
the S3 credentials for the job only.
I'll try to carve out some time to reproduce and fix this, but I can't
promise you anything soon due to other priorities.
Once we have the reproduction steps, we should file a Jira for this.

gp
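
For readers following the thread, the "Wrong FS" failure in the reported stack trace can be illustrated with a simplified check. This is only a sketch under stated assumptions, not Hadoop's actual FileSystem.checkPath: the class WrongFsSketch and its checkPath method are hypothetical, and only java.net.URI is used. It assumes that a file system object bound to one URI (here s3://bucket-name) rejects any fully qualified path whose scheme or authority differs (here the HDFS sharelib path).

```java
import java.net.URI;

// Hypothetical, simplified stand-in for the kind of check that produces
// "Wrong FS". It is NOT the Hadoop implementation.
class WrongFsSketch {

    // Rejects a path whose scheme or authority differs from the URI the
    // file system object was created for.
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // no scheme: the path is resolved against fsUri itself
        }
        boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
        String pathAuth = path.getAuthority();
        String fsAuth = fsUri.getAuthority();
        boolean sameAuthority = (pathAuth == null)
                ? fsAuth == null
                : pathAuth.equalsIgnoreCase(fsAuth);
        if (!(sameScheme && sameAuthority)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // File system derived from the application path (S3).
        URI appFs = URI.create("s3://bucket-name");
        // Sharelib location taken from the error message (HDFS).
        URI shareLib = URI.create(
                "hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib");
        try {
            checkPath(appFs, shareLib);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the sharelib path itself can be perfectly valid; the failure depends only on which file system object is asked about it.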

On Mon, Mar 25, 2019 at 8:34 PM  wrote:

> Hi Yong
>
> Have you also tried s3a in place of s3?
>
>
> -
> Suresh.
>
>

Re: oozie 5.0.0 on AWS EMR

2019-03-25 Thread Peter Cseh
Hey Yong,

Thanks for reporting this issue!
If I see correctly, your Oozie is set up to talk to an HDFS instance and to
S3 as well. This is not a scenario I'm too familiar with.
Could you give us some easy-to-follow steps to reproduce this?
Thanks
gp

On Thu, Mar 21, 2019 at 11:13 PM Daniel Zhang  wrote:

> Hi, oozier:
>
> Since release 5.15.0, AWS EMR ships with Oozie 5.0.0, upgraded from Oozie
> 4.3.
>
> Unfortunately, we found that one nice feature is broken for us on Oozie
> 5.0.0.
>
> On Oozie 4.3, we put our Oozie applications in one S3 bucket as our
> release repository, and in the Oozie application properties file we just
> use the following:
>
> appBaseDir=${s3.app.bucket}/oozieJobs/${appName}
>
> The Oozie 4.3 runtime loads all the application code from S3 and
> still uses the Oozie sharelib from HDFS, and the whole application
> workflow works perfectly.
>
> Since EMR 5.15.0 upgraded to Oozie 5.0.0, we cannot use S3 as our
> application repository anymore. The same application works fine if it
> is stored in HDFS. But if it is stored in S3, we get the following
> error message:
>
> Caused by: org.apache.oozie.workflow.WorkflowException: E0712: Could not create lib paths list for application [s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml], Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
>     at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:258)
>     at org.apache.oozie.command.wf.SubmitXCommand.execute(SubmitXCommand.java:168)
>     ... 36 more
> Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
>     at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:487)
>     at com.amazon.ws.emr.hadoop.fs.staging.DefaultStagingMechanism.isStagingDirectoryPath(DefaultStagingMechanism.java:38)
>     at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:740)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440)
>     at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:347)
>     at org.apache.oozie.service.WorkflowAppService.getLibFiles(WorkflowAppService.java:301)
>     at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:202)
>     ... 37 more
>
> It looks like when we configure the app path in S3 with
> appBaseDir=${s3.app.bucket}/oozieJobs/${appName}, Oozie 5.0 complains
> that it can no longer load the sharelib from the HDFS URI, even though
> all the sharelib files are indeed stored in the correct HDFS location, as
> specified in the error message.
>
> Based on this error message, I found the following commit in Oozie 5.0
> (OOZIE-2944 Shell action example does not work with Oozie on Yarn on h…):
>
> https://github.com/apache/oozie/commit/5998c18fde1da769e91e3ef1bcca484723730c76#diff-d4e9af2c1e2ddeae544be6182b948109
>
> Since the error comes from the FileSystem in
> core/src/main/java/org/apache/oozie/service/WorkflowAppService.java,
> I think MAYBE the above commit is causing it?
>
> In 5.0.0, line 202 uses the "fs" that comes from line 177, with
> a "conf" coming from line 169, as follows:
> https://github.com/apache/oozie/blob/branch-5.0/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L166
>
> URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
> Configuration conf = has.createConfiguration(uri.getAuthority());
>
> But 4.3.0 has the following at
> https://github.com/apache/oozie/blob/branch-4.3/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L167
>
> URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
> Configuration conf = has.createJobConf(uri.getAuthority());
>
>
> I am NOT 100% sure, but the above code does return the FileSystem that
> eventually complains "Wrong FS" in my case, and the above commit changes
> the "conf" creation from createJobConf to createConfiguration.
>
> So my question is: do you think the above change is causing my
> issue? If so, I believe there is a reason for the above commit,