Peter,

Also, when I submit a job with the new httpclient jar, I get:

```
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 500, message: Server Error
```

On Thu, May 17, 2018 at 12:14 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Ok, I have tried this.
>
> It appears that s3a support requires httpclient 4.4.x, and Oozie is bundled
> with httpclient 4.3.6. When httpclient is upgraded, the ext UI stops loading.

On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:

> Purna,
>
> Based on
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
> you should try to go for s3a.
> You'll have to include the aws-jdk as well, if I see it correctly:
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
> Also, the property names are slightly different, so you'll have to change
> the example I've given.

On Thu, May 17, 2018 at 4:16 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Peter,
>
> I'm using the latest Oozie 5.0.0 and I have tried the changes below, but no luck.
>
> Is this for s3 or s3a?
> I'm using s3, but if this is for s3a, do you know which jar I need to
> include? I mean the hadoop-aws jar, or any other jar if required.
>
> hadoop-aws-2.8.3.jar is what I'm using.

On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:

> Ok, I've found it:
>
> If you are using 4.3.0 or newer, this is the part which checks for
> dependencies:
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
> It passes the coordinator action's configuration and even does
> impersonation to check for the dependencies:
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>
> Have you tried the following in the coordinator xml:
>
> <action>
>   <workflow>
>     <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>     <configuration>
>       <property>
>         <name>fs.s3.awsAccessKeyId</name>
>         <value>[YOURKEYID]</value>
>       </property>
>       <property>
>         <name>fs.s3.awsSecretAccessKey</name>
>         <value>[YOURKEY]</value>
>       </property>
>     </configuration>
>   </workflow>
> </action>
>
> Based on the source, this should be able to poll S3 periodically.

On Wed, May 16, 2018 at 10:57 PM purna pradeep <purna2prad...@gmail.com> wrote:

> I have tried with the coordinator's configuration too, but no luck ☹️

On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com> wrote:

> Great progress there, Purna! :)
>
> Have you tried adding these properties to the coordinator's
> configuration? We usually use the action config to build up the connection
> to the distributed file system.
> Although I'm not sure we're using these when polling the dependencies for
> coordinators, but I'm excited about you trying to make it work!
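[Editor's note: Peter mentions above that the property names for s3a differ from the s3 example. An s3a equivalent of that coordinator snippet would be a sketch like the following; `fs.s3a.access.key` and `fs.s3a.secret.key` are the hadoop-aws 2.8.x property names, while the app-path and placeholder values are carried over from the example in the thread.]

```xml
<action>
  <workflow>
    <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
    <configuration>
      <!-- s3a credential properties (hadoop-aws 2.8.x names) -->
      <property>
        <name>fs.s3a.access.key</name>
        <value>[YOURKEYID]</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>[YOURKEY]</value>
      </property>
    </configuration>
  </workflow>
</action>
```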
> I'll get back with a - hopefully - more helpful answer soon, I have to
> check the code in more depth first.
> gp

On Wed, May 16, 2018 at 9:45 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Peter,
>
> I got rid of this error by adding
> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar.
>
> But I'm getting the below error now:
>
> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
> fs.s3.awsSecretAccessKey properties (respectively)
>
> I have tried adding the AWS access and secret keys in
> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml.

On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com> wrote:

> I have tried this, just added s3 instead of *:
>
> <property>
>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>   <value>hdfs,hftp,webhdfs,s3</value>
> </property>
>
> Getting the below error:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)

On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com> wrote:

> This is what is in the logs:
>
> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost]
> Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>
> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost]
> Loaded default urihandler org.apache.oozie.dependency.FSURIHandler

On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com> wrote:

> That's strange, this exception should not happen in that case.
> Can you check the server logs for messages like this?
>     LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>     LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
>
> Thanks

On Wed, May 16, 2018 at 5:47 PM purna pradeep <purna2prad...@gmail.com> wrote:

> This is what I already have in my oozie-site.xml:
>
> <property>
>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>   <value>*</value>
> </property>

On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com> wrote:

> You'll have to configure
> oozie.service.HadoopAccessorService.supported.filesystems properly. Its
> default value is hdfs,hftp,webhdfs; it enlists the different filesystems
> supported for federation. If the wildcard "*" is specified, then ALL file
> schemes will be allowed.
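[Editor's note: an explicit, non-wildcard version of the oozie-site.xml entry quoted above, extended with the s3a scheme, would be a sketch like the following; it assumes the s3a filesystem implementation (hadoop-aws and the AWS SDK jars) is on the Oozie server classpath, otherwise the ClassNotFoundException seen earlier in the thread appears.]

```xml
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <!-- default schemes plus s3a; "*" would allow all schemes -->
  <value>hdfs,hftp,webhdfs,s3a</value>
</property>
```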
> For testing purposes it's ok to put * in there in oozie-site.xml.

On Wed, May 16, 2018 at 5:29 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Peter,
>
> I have tried to specify a dataset with a uri starting with s3://, s3a://
> and s3n://, and I am getting an exception:
>
> Exception occurred: E0904: Scheme [s3] not supported in uri
> [s3://mybucket/input.data] Making the job failed
>
> org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
> supported in uri [s3://mybucket/input.data]
>     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>     at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>     at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>     at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>     at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> Is S3 support specific to the CDH distribution, or should it work in
> Apache Oozie as well?
> I'm not using CDH yet.

On Wed, May 16, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:

> I think it should be possible for Oozie to poll S3. Check out this
> description on how to make it work in jobs:
> https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
> Something similar should work on the server side as well.

On Tue, May 15, 2018 at 4:43 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Thanks Andras,
>
> I also would like to know if Oozie supports AWS S3 as input events, to
> poll for a dependency file before kicking off a spark action.
>
> For example: I don't want to kick off a spark action until a file has
> arrived at a given AWS S3 location.

On Tue, May 15, 2018 at 10:17 AM Andras Piros <andras.pi...@cloudera.com> wrote:

> Hi,
>
> Oozie needs HDFS to store workflow, coordinator, or bundle definitions, as
> well as sharelib files in a safe, distributed and scalable way. Oozie
> needs YARN to run almost all of its actions, Spark action being no
> exception.
>
> At the moment it's not feasible to install Oozie without those Hadoop
> components.
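[Editor's note: the "wait for a file before kicking off the action" requirement purna describes maps to a coordinator dataset with a done-flag. Assuming the scheme issue above were resolved, an s3a-based dataset would be a sketch like the following; the bucket name, path, frequency and initial instance are hypothetical.]

```xml
<datasets>
  <dataset name="input" frequency="${coord:days(1)}"
           initial-instance="2018-05-01T00:00Z" timezone="UTC">
    <!-- one instance per day; resolved from the coordinator's nominal time -->
    <uri-template>s3a://mybucket/input/${YEAR}/${MONTH}/${DAY}</uri-template>
    <!-- the instance counts as available only once this marker file exists -->
    <done-flag>_SUCCESS</done-flag>
  </dataset>
</datasets>
```

The coordinator would then declare an input-event on this dataset, so the workflow (e.g. a spark action) is only materialized once the _SUCCESS marker has arrived.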
> Please find how to install Oozie here:
> https://oozie.apache.org/docs/5.0.0/AG_Install.html
>
> Regards,
>
> Andras

On Tue, May 15, 2018 at 4:11 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Hi,
>
> I would like to know if I can use a spark action in Oozie without having a
> Hadoop cluster.
>
> I want to use Oozie to schedule spark jobs on a Kubernetes cluster.
>
> I'm a beginner in Oozie.
>
> Thanks

--
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>