Re: oozie 5.0.0 on AWS EMR

2019-03-25 Thread Peter Cseh
Hi Yong,
The use of local filesystems is strictly prohibited in Oozie 5.0.
I'd guess you have an hdfs://somenode as fs.defaultFS and you're providing
the S3 credentials for the job only.
I'll try to carve out some time to reproduce and fix this, but I can't
promise you anything soon due to other priorities.
Once we have the reproduction steps, we should file a Jira for this.

gp

On Mon, Mar 25, 2019 at 8:34 PM  wrote:

> Hi Yong
>
> Have you also tried s3a in place of s3?
>
>
> -
> Suresh.
>
>
> > On Mar 25, 2019, at 2:03 PM, Peter Cseh wrote:
> >
> > Hey Yong,
> >
> > Thanks for reporting this issue!
> > If I see correctly, your Oozie is set up to talk to a HDFS instance and to
> > S3 as well. This is not a scenario I'm too familiar with.
> > Could you give us some easy-to-follow steps to reproduce this?
> > Thanks
> > gp

Re: oozie 5.0.0 on AWS EMR

2019-03-25 Thread Peter Cseh
Hey Yong,

Thanks for reporting this issue!
If I see correctly, your Oozie is set up to talk to a HDFS instance and to
S3 as well. This is not a scenario I'm too familiar with.
Could you give us some easy-to-follow steps to reproduce this?
Thanks
gp


oozie 5.0.0 on AWS EMR

2019-03-21 Thread Daniel Zhang
Hi, oozier:

Since release 5.15.0, AWS EMR ships with Oozie 5.0.0, upgraded from Oozie 4.3.

Unfortunately, we found that one nice feature is broken for us on Oozie 5.0.0.

On Oozie 4.3, we kept our Oozie applications in an S3 bucket that serves as our
release repository, and in the application properties file we simply used the
following:

appBaseDir=${s3.app.bucket}/oozieJobs/${appName}

The Oozie 4.3 runtime loaded all the application code from S3 while still using
the Oozie sharelib from HDFS, and the whole application workflow worked
perfectly.

After the upgrade to Oozie 5.0.0 in EMR 5.15.0, we can no longer use S3 as our
application repository. The same application works fine if it is stored in
HDFS, but if it is stored in S3 we get the following error message:

Caused by: org.apache.oozie.workflow.WorkflowException: E0712: Could not create lib paths list for application [s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml], Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
    at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:258)
    at org.apache.oozie.command.wf.SubmitXCommand.execute(SubmitXCommand.java:168)
    ... 36 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:487)
    at com.amazon.ws.emr.hadoop.fs.staging.DefaultStagingMechanism.isStagingDirectoryPath(DefaultStagingMechanism.java:38)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:740)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:347)
    at org.apache.oozie.service.WorkflowAppService.getLibFiles(WorkflowAppService.java:301)
    at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:202)
    ... 37 more

It looks like when we configure the application path in S3 via
appBaseDir=${s3.app.bucket}/oozieJobs/${appName}, Oozie 5.0 complains that it
can no longer load the sharelib from the HDFS URI, even though all the sharelib
files are indeed stored in the correct HDFS location, as specified in the
error message.
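
To illustrate why the sharelib path is rejected, here is a minimal Java sketch
of the kind of scheme/authority comparison that produces a "Wrong FS" error. It
uses only java.net.URI; the class WrongFsSketch and its methods belongsTo and
checkPath are hypothetical names for illustration, and the logic is a
simplified stand-in, not Hadoop's actual FileSystem.checkPath implementation.

```java
import java.net.URI;

public class WrongFsSketch {
    // Simplified stand-in (assumption): a path belongs to a filesystem when
    // its scheme and authority match the filesystem's URI. Paths without a
    // scheme or authority are treated as belonging to the filesystem.
    static boolean belongsTo(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true;
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false;
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    // Throws in the same style as the error seen in the stack trace.
    static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // Filesystem bound to the application bucket.
        URI appFs = URI.create("s3://bucket-name");
        URI workflow = URI.create(
            "s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml");
        URI shareLib = URI.create(
            "hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib");

        checkPath(appFs, workflow); // same scheme and authority: accepted
        try {
            checkPath(appFs, shareLib); // different scheme: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the filesystem bound to s3://bucket-name, any hdfs:// path fails this
comparison, regardless of whether the files actually exist in HDFS.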

With this error message, I found the following commit in Oozie 5.0
(OOZIE-2944, "Shell action example does not work with Oozie on Yarn on h…"):
https://github.com/apache/oozie/commit/5998c18fde1da769e91e3ef1bcca484723730c76#diff-d4e9af2c1e2ddeae544be6182b948109

Since the error comes from the FileSystem in
core/src/main/java/org/apache/oozie/service/WorkflowAppService.java, I think
MAYBE the above commit is causing it?


In 5.0.0, line 202 uses the "fs" that comes from line 177, created with a
"conf" coming from line 169, as follows:
https://github.com/apache/oozie/blob/branch-5.0/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L166

    URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
    Configuration conf = has.createConfiguration(uri.getAuthority());

But in 4.3.0, at
https://github.com/apache/oozie/blob/branch-4.3/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L167

    URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
    Configuration conf = has.createJobConf(uri.getAuthority());


I am NOT 100% sure, but the above code indeed returns the FileSystem that
eventually complains "Wrong FS" in my case, and the above commit changes how
the "conf" is created, from createJobConf to createConfiguration.
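
As a small illustration of what uri.getAuthority() yields for the two kinds of
paths involved, the sketch below uses only java.net.URI. The class name
AppPathAuthority and the helper authorityOf are hypothetical, and URI.create is
used in place of new URI only to avoid the checked exception. Whether the
configuration built from this authority is what binds the FileSystem to S3 is
an assumption here, not something confirmed.

```java
import java.net.URI;

public class AppPathAuthority {
    // Returns the authority component (host, or host:port) of a path string.
    static String authorityOf(String path) {
        return URI.create(path).getAuthority();
    }

    public static void main(String[] args) {
        // Application path in S3: the authority is the bucket name.
        System.out.println(authorityOf(
            "s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml"));
        // Sharelib path in HDFS: the authority is the NameNode host and port.
        System.out.println(authorityOf(
            "hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib"));
    }
}
```

For an application stored in S3, the value passed to createConfiguration or
createJobConf is therefore the bucket name rather than the HDFS NameNode
address.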

So my question is: do you think the above change is causing my issue? If so, I
believe there was a reason for the commit, but is there also a solution for my
use case?

Thanks

Yong