Purna,

Based on https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3 you should go for s3a. You'll have to include the AWS SDK jar as well if I see it correctly:
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A

Also, the property names are slightly different, so you'll have to change the example I've given.
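For example, with s3a the credential properties in the coordinator snippet below would become something like the following (property names as documented for the hadoop-aws module — treat this as a sketch and verify against your Hadoop version):

    <property>
        <name>fs.s3a.access.key</name>
        <value>[YOURKEYID]</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>[YOURKEY]</value>
    </property>

and the dataset URIs would use the s3a:// scheme instead of s3://.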
On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com> wrote:

Peter,

I'm using the latest Oozie 5.0.0 and I have tried the below changes, but no luck.

Is this for s3 or s3a?

I'm using s3, but if this is for s3a, do you know which jar I need to include? I mean the hadoop-aws jar, or any other jar if required. Hadoop-aws-2.8.3.jar is what I'm using.

On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:

Ok, I've found it:

If you are using 4.3.0 or newer, this is the part which checks for dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
It passes the coordinator action's configuration and even does impersonation to check for the dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159

Have you tried the following in the coordinator xml:

    <action>
        <workflow>
            <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
            <configuration>
                <property>
                    <name>fs.s3.awsAccessKeyId</name>
                    <value>[YOURKEYID]</value>
                </property>
                <property>
                    <name>fs.s3.awsSecretAccessKey</name>
                    <value>[YOURKEY]</value>
                </property>
            </configuration>
        </workflow>
    </action>

Based on the source this should be able to poll s3 periodically.

On Wed, May 16, 2018 at 10:57 PM, purna pradeep wrote:

I have tried with the coordinator's configuration too, but no luck ☹️

On Wed, May 16, 2018 at 3:54 PM Peter Cseh wrote:

Great progress there purna! :)

Have you tried adding these properties to the coordinator's configuration? We usually use the action config to build up the connection to the distributed file system. Although I'm not sure we're using these when polling the dependencies for coordinators, but I'm excited about you trying to make it work!

I'll get back with a - hopefully - more helpful answer soon, I have to check the code in more depth first.
gp

On Wed, May 16, 2018 at 9:45 PM, purna pradeep wrote:

Peter,

I got rid of this error by adding hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar.

But I'm getting the below error now:

    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively)

I have tried adding the AWS access and secret keys in oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml.

On Wed, May 16, 2018 at 2:30 PM purna pradeep wrote:

I have tried this, just added s3 instead of *:

    <property>
        <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
        <value>hdfs,hftp,webhdfs,s3</value>
    </property>

Getting the below error:

    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
        at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)

On Wed, May 16, 2018 at 2:19 PM purna pradeep wrote:

This is what is in the logs:

    2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
    2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler

On Wed, May 16, 2018 at 12:27 PM Peter Cseh wrote:

That's strange, this exception should not happen in that case.
Can you check the server logs for messages like this?

    LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
    LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());

Thanks

On Wed, May 16, 2018 at 5:47 PM, purna pradeep wrote:

This is what I already have in my oozie-site.xml:

    <property>
        <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
        <value>*</value>
    </property>

On Wed, May 16, 2018 at 11:37 AM Peter Cseh wrote:

You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems (default: hdfs,hftp,webhdfs). It enlists the different filesystems supported for federation. If the wildcard "*" is specified, then ALL file schemes will be allowed.

For testing purposes it's ok to put * in there in oozie-site.xml.

On Wed, May 16, 2018 at 5:29 PM, purna pradeep wrote:

Peter,

I have tried to specify a dataset with a uri starting with s3://, s3a:// and s3n://, and I am getting this exception:

    Exception occurred:E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data] Making the job failed
    org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data]
        at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
        at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
        at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
        at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
        at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
        at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
        at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
        at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
        at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
        at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
        at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
        at org.apache.oozie.command.XCommand.call(XCommand.java:290)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Is S3 support specific to the CDH distribution or should it work in Apache Oozie as well? I'm not using CDH yet.

On Wed, May 16, 2018 at 10:28 AM Peter Cseh wrote:

I think it should be possible for Oozie to poll S3. Check out this description on how to make it work in jobs:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
Something similar should work on the server side as well.

On Tue, May 15, 2018 at 4:43 PM, purna pradeep wrote:

Thanks Andras,

I also would like to know if Oozie supports AWS S3 as input events, to poll for a dependency file before kicking off a spark action.

For example: I don't want to kick off a spark action until a file has arrived on a given AWS S3 location.

On Tue, May 15, 2018 at 10:17 AM Andras Piros <andras.pi...@cloudera.com> wrote:

Hi,

Oozie needs HDFS to store workflow, coordinator, or bundle definitions, as well as sharelib files, in a safe, distributed and scalable way. Oozie needs YARN to run almost all of its actions, Spark action being no exception.

At the moment it's not feasible to install Oozie without those Hadoop components. How to install Oozie, please find here:
https://oozie.apache.org/docs/5.0.0/AG_Install.html

Regards,

Andras

On Tue, May 15, 2018 at 4:11 PM, purna pradeep wrote:

Hi,

Would like to know if I can use the spark action in Oozie without having a Hadoop cluster. I want to use Oozie to schedule spark jobs on a Kubernetes cluster.

I'm a beginner in Oozie.

Thanks

--
Peter Cseh | Software Engineer
cloudera.com
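P.S. For anyone hitting the same ClassNotFoundException for org.apache.hadoop.fs.s3a.S3AFileSystem: the S3A connector and its AWS SDK dependency have to be on the Oozie server's classpath, not just the action's. A sketch of the setup (jar names and paths are illustrative; pick the aws-java-sdk jars that match your hadoop-aws version):

    # from the Oozie installation directory:
    # drop the S3A connector and its AWS SDK dependency into libext/
    cp hadoop-aws-2.8.3.jar aws-java-sdk-*.jar libext/
    # rebuild the Oozie war so the jars land on the server classpath
    bin/oozie-setup.sh prepare-war
    # then restart the Oozie server

After that, s3a (not just s3) should be listed in oozie.service.HadoopAccessorService.supported.filesystems, or the wildcard * used for testing.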
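P.P.S. A sketch of the coordinator pieces for the "don't start the spark action until a file arrives on S3" use case discussed above, once the server supports the s3a scheme (bucket, paths, dates and names here are made up for illustration):

    <datasets>
        <dataset name="input" frequency="${coord:days(1)}"
                 initial-instance="2018-05-01T00:00Z" timezone="UTC">
            <uri-template>s3a://mybucket/input/${YEAR}${MONTH}${DAY}</uri-template>
            <done-flag>input.data</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="inputReady" dataset="input">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>

The coordinator action then only materializes and launches the workflow (e.g. the spark action) once the done-flag file exists at the resolved URI.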