Re: Oozie for spark jobs without Hadoop
Here you go!

- Add oozie.service.HadoopAccessorService.supported.filesystems as * in oozie-site.xml
- Include hadoop-aws-2.8.3.jar
- Rebuild Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
- Set jetty_opts with proxy values

On Sat, May 19, 2018 at 2:17 AM Peter Cseh <gezap...@cloudera.com> wrote:

> Wow, great work!
> Can you please summarize the required steps? This would be useful for
> others, so we should probably add it to our documentation.
> Thanks in advance!
> Peter
>
> On Fri, May 18, 2018 at 11:33 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>
>> I got this fixed by setting jetty_opts with proxy values.
>>
>> Thanks Peter!!
>>
>> On Thu, May 17, 2018 at 4:05 PM purna pradeep <purna2prad...@gmail.com> wrote:
>>
>>> Ok, I fixed this by adding AWS keys in Oozie, but I'm getting the below
>>> error. I have tried setting the proxy in core-site.xml but no luck.
>>>
>>> 2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 -
>>> SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
>>> JOB[000-180517144113498-oozie-xjt0-C] ACTION[000-180517144113498-oozie-xjt0-C@2]
>>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>>> occurred: [doesBucketExist on mybucket: com.amazonaws.SdkClientException:
>>> Unable to execute HTTP request: Connect to mybucket.s3.amazonaws.com:443
>>> [mybucket.s3.amazonaws.com/52.216.165.155] failed: connect timed out]
>>>
>>> at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:630)
>>> at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:594)
>>> at org.apache.oozie.dependency.FSURIHandler.getFileSystem(FSURIHandler.java:184)
>>>
>>> On Thu, May 17, 2018 at 2:53 PM purna pradeep <purna2prad...@gmail.com> wrote:
>>>
>>>> Ok, I got past this error by rebuilding Oozie with
>>>> -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9.
>>>>
>>>> Now getting this error:
>>>>
>>>> ACTION[000-180517144113498-oozie-xjt0-C@1]
>>>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>>>> occurred: [doesBucketExist on mybucket: com.amazonaws.AmazonClientException:
>>>> No AWS Credentials provided by BasicAWSCredentialsProvider
>>>> EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider:
>>>> com.amazonaws.SdkClientException: Unable to load credentials from service
>>>> endpoint]
>>>>
>>>> On Thu, May 17, 2018 at 12:24 PM purna pradeep <purna2prad...@gmail.com> wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> Also, when I submit a job with the new httpclient jar, I get
>>>>>
>>>>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>>>>> server. No of retries = 1. Exception = Could not authenticate,
>>>>> Authentication failed, status: 500, message: Server Error```
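The oozie-site.xml piece of the first step above is a single property. A minimal sketch (`*` opens up all URI schemes, which is fine for testing; production setups usually enumerate the schemes instead):

```xml
<!-- oozie-site.xml: let Oozie create FileSystems for non-default schemes -->
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>*</value>
</property>
```

Without this, coordinator dataset URIs on schemes other than the defaults (hdfs, hftp, webhdfs) are rejected with E0904.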
Event trigger Oozie datasets
Hello,

1) Does Oozie support event triggers, i.e. triggering a workflow based on a file arrival on AWS S3?

As per my understanding, based on the start date mentioned on the coordinator, it can poll for a file on S3, and once the dependency is met it can execute an action/SparkAction. But my requirement is to trigger the workflow based on a file arrival, compare the current date with the start time (if a start time is configured; else execute the action based on the event), and execute the action/SparkAction if it's time to do so.

2) Also, I see that for datasets we need to specify initial-instance, and the dataset location is derived from the initial-instance value. For example, with the URI template s3a://app/logs/${YEAR}_${MONTH}_${DAY}_${HOUR} and the instance ${coord:latest(0)}, the dataset instance for the input events of the coordinator action would be s3a://app/logs/2009_01_10.

But I'm not sure of the dataset generation timestamp, and I'm also not sure of the frequency of the dataset generation. In my case the dataset location could be s3a://app/logs/2018_02_10 (i.e. it may be generated every day), and when I run my job on 2018/02/11 I should be able to specify that either the latest instance, or one that is 24 hours or n days old (counted from the day I run the workflow), is the dependency for the action/SparkAction I'm trying to execute.

Please suggest!
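For reference, the dataset/input-event wiring described above looks roughly like this in a coordinator. This is a sketch only: the app name, dates, and daily frequency are assumptions, and `${coord:latest(0)}` can be swapped for `${coord:current(-n)}` to reach the instance n periods before the action's nominal time:

```xml
<coordinator-app name="daily-logs-coord" frequency="${coord:days(1)}"
                 start="2018-02-11T00:00Z" end="2018-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- initial-instance anchors the instance numbering; one instance per day -->
    <dataset name="logs" frequency="${coord:days(1)}"
             initial-instance="2018-01-01T00:00Z" timezone="UTC">
      <uri-template>s3a://app/logs/${YEAR}_${MONTH}_${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="logs">
      <!-- latest(0): the newest instance that already exists -->
      <instance>${coord:latest(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The action is only dispatched once the resolved instance URI exists, which gives the "wait for file arrival" behavior within the coordinator's polling model.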
Re: Oozie for spark jobs without Hadoop
Ok, I fixed this by adding AWS keys in Oozie, but I'm getting the below error. I have tried setting the proxy in core-site.xml but no luck.

2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 -
SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
JOB[000-180517144113498-oozie-xjt0-C] ACTION[000-180517144113498-oozie-xjt0-C@2]
org.apache.oozie.service.HadoopAccessorException: E0902: Exception occurred:
[doesBucketExist on mybucket: com.amazonaws.SdkClientException: Unable to
execute HTTP request: Connect to mybucket.s3.amazonaws.com:443
[mybucket.s3.amazonaws.com/52.216.165.155] failed: connect timed out]

at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:630)
at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:594)
at org.apache.oozie.dependency.FSURIHandler.getFileSystem(FSURIHandler.java:184)

On Thu, May 17, 2018 at 2:53 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Ok, I got past this error by rebuilding Oozie with
> -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9.
>
> Now getting this error:
>
> ACTION[000-180517144113498-oozie-xjt0-C@1]
> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
> occurred: [doesBucketExist on mybucket: com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider:
> com.amazonaws.SdkClientException: Unable to load credentials from service
> endpoint]
>
> On Thu, May 17, 2018 at 12:24 PM purna pradeep <purna2prad...@gmail.com> wrote:
>
>> Peter,
>>
>> Also, when I submit a job with the new httpclient jar, I get
>>
>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>> server. No of retries = 1. Exception = Could not authenticate,
>> Authentication failed, status: 500, message: Server Error```
>>
>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <purna2prad...@gmail.com> wrote:
>>
>>> Ok, I have tried this. It appears that s3a support requires httpclient
>>> 4.4.x and Oozie is bundled with httpclient 4.3.6. When httpclient is
>>> upgraded, the ext UI stops loading.
>>>
>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:
>>>
>>>> Purna,
>>>>
>>>> Based on
>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>>> you should try to go for s3a.
>>>> You'll have to include the aws-sdk as well if I see it correctly:
>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>>> Also, the property names are slightly different, so you'll have to
>>>> change the example I've given.
>>>>
>>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I'm using the latest Oozie 5.0.0 and I have tried the below changes
>>>>> but no luck.
>>>>>
>>>>> Is this for s3 or s3a? I'm using s3, but if this is for s3a, do you
>>>>> know which jar I need to include? I mean the hadoop-aws jar or any
>>>>> other jar if required. hadoop-aws-2.8.3.jar is what I'm using.
>>>>>
>>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com>
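The "No AWS Credentials provided" failure above means none of the providers in the S3A credential chain found keys. One way to supply them is core-site.xml; a sketch with placeholder values (fs.s3a.access.key and fs.s3a.secret.key are the s3a property names in Hadoop 2.8, and the proxy properties are only needed when S3 is reached through a proxy):

```xml
<configuration>
  <!-- Credentials read by org.apache.hadoop.fs.s3a.S3AFileSystem -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
  <!-- Optional proxy settings (hypothetical host/port) -->
  <property>
    <name>fs.s3a.proxy.host</name>
    <value>proxy.example.com</value>
  </property>
  <property>
    <name>fs.s3a.proxy.port</name>
    <value>8080</value>
  </property>
</configuration>
```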
Re: Oozie for spark jobs without Hadoop
Ok, I got past this error by rebuilding Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9.

Now getting this error:

ACTION[000-180517144113498-oozie-xjt0-C@1]
org.apache.oozie.service.HadoopAccessorException: E0902: Exception occurred:
[doesBucketExist on mybucket: com.amazonaws.AmazonClientException: No AWS
Credentials provided by BasicAWSCredentialsProvider
EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider:
com.amazonaws.SdkClientException: Unable to load credentials from service
endpoint]

On Thu, May 17, 2018 at 12:24 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Peter,
>
> Also, when I submit a job with the new httpclient jar, I get
>
> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
> server. No of retries = 1. Exception = Could not authenticate,
> Authentication failed, status: 500, message: Server Error```
>
> On Thu, May 17, 2018 at 12:14 PM purna pradeep <purna2prad...@gmail.com> wrote:
>
>> Ok, I have tried this. It appears that s3a support requires httpclient
>> 4.4.x and Oozie is bundled with httpclient 4.3.6. When httpclient is
>> upgraded, the ext UI stops loading.
>>
>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:
>>
>>> Purna,
>>>
>>> Based on
>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>> you should try to go for s3a.
>>> You'll have to include the aws-sdk as well if I see it correctly:
>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>> Also, the property names are slightly different, so you'll have to
>>> change the example I've given.
>>>
>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>
>>>> Peter,
>>>>
>>>> I'm using the latest Oozie 5.0.0 and I have tried the below changes
>>>> but no luck. Is this for s3 or s3a? I'm using s3, but if this is for
>>>> s3a, do you know which jar I need to include? I mean the hadoop-aws
>>>> jar or any other jar if required. hadoop-aws-2.8.3.jar is what I'm using.
>>>>
>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>
>>>>> Ok, I've found it:
>>>>>
>>>>> If you are using 4.3.0 or newer, this is the part which checks for
>>>>> dependencies:
>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>>> It passes the coordinator action's configuration and even does
>>>>> impersonation to check for the dependencies:
>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>
>>>>> Have you tried the following in the coordinator xml:
>>>>>
>>>>> hdfs://bar:9000/usr/joe/logsprocessor-wf
>>>>> fs.s3.awsAccessKeyId [YOURKEYID]
>>>>> fs.s3.awsSecretAccessKey [YOURKEY]
>>>>>
>>>>> Based on the source this should be able to poll s3 periodically.
>>>>>
>>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>
>>>>>> I have tried with the coordinator's configuration too but no luck ☹️
>>>>>>
>>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com> wrote:
Re: Oozie for spark jobs without Hadoop
Peter,

Also, when I submit a job with the new httpclient jar, I get

```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
server. No of retries = 1. Exception = Could not authenticate,
Authentication failed, status: 500, message: Server Error```

On Thu, May 17, 2018 at 12:14 PM purna pradeep <purna2prad...@gmail.com> wrote:

> Ok, I have tried this. It appears that s3a support requires httpclient
> 4.4.x and Oozie is bundled with httpclient 4.3.6. When httpclient is
> upgraded, the ext UI stops loading.
>
> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:
>
>> Purna,
>>
>> Based on
>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>> you should try to go for s3a.
>> You'll have to include the aws-sdk as well if I see it correctly:
>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>> Also, the property names are slightly different, so you'll have to
>> change the example I've given.
>>
>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>
>>> Peter,
>>>
>>> I'm using the latest Oozie 5.0.0 and I have tried the below changes
>>> but no luck. Is this for s3 or s3a? I'm using s3, but if this is for
>>> s3a, do you know which jar I need to include? I mean the hadoop-aws
>>> jar or any other jar if required. hadoop-aws-2.8.3.jar is what I'm using.
>>>
>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>>
>>>> Ok, I've found it:
>>>>
>>>> If you are using 4.3.0 or newer, this is the part which checks for
>>>> dependencies:
>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>> It passes the coordinator action's configuration and even does
>>>> impersonation to check for the dependencies:
>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>
>>>> Have you tried the following in the coordinator xml:
>>>>
>>>> hdfs://bar:9000/usr/joe/logsprocessor-wf
>>>> fs.s3.awsAccessKeyId [YOURKEYID]
>>>> fs.s3.awsSecretAccessKey [YOURKEY]
>>>>
>>>> Based on the source this should be able to poll s3 periodically.
>>>>
>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>
>>>>> I have tried with the coordinator's configuration too but no luck ☹️
>>>>>
>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>>
>>>>>> Great progress there purna! :)
>>>>>>
>>>>>> Have you tried adding these properties to the coordinator's
>>>>>> configuration? We usually use the action config to build up the
>>>>>> connection to the distributed file system. Although I'm not sure
>>>>>> we're using these when polling the dependencies for coordinators,
>>>>>> I'm excited about you trying to make it work!
>>>>>>
>>>>>> I'll get back with a - hopefully - more helpful answer soon, I have
>>>>>> to check the code in more depth first.
>>>>>> gp
>>>>>>
>>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>>
>>>>>>> Peter,
>>>>>>>
>>>>>>> I got rid of this error by adding hadoop-aws-2.8.3.jar and
>>>>>>> jets3t-0.9.4.jar, but I'm getting the below error now:
>>>>>>>
>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyI
Re: Oozie for spark jobs without Hadoop
Ok, I have tried this. It appears that s3a support requires httpclient 4.4.x and Oozie is bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI stops loading.

On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:

> Purna,
>
> Based on
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
> you should try to go for s3a.
> You'll have to include the aws-sdk as well if I see it correctly:
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
> Also, the property names are slightly different, so you'll have to change
> the example I've given.
>
> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>
>> Peter,
>>
>> I'm using the latest Oozie 5.0.0 and I have tried the below changes but
>> no luck. Is this for s3 or s3a? I'm using s3, but if this is for s3a,
>> do you know which jar I need to include? I mean the hadoop-aws jar or
>> any other jar if required. hadoop-aws-2.8.3.jar is what I'm using.
>>
>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>
>>> Ok, I've found it:
>>>
>>> If you are using 4.3.0 or newer, this is the part which checks for
>>> dependencies:
>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>> It passes the coordinator action's configuration and even does
>>> impersonation to check for the dependencies:
>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>
>>> Have you tried the following in the coordinator xml:
>>>
>>> hdfs://bar:9000/usr/joe/logsprocessor-wf
>>> fs.s3.awsAccessKeyId [YOURKEYID]
>>> fs.s3.awsSecretAccessKey [YOURKEY]
>>>
>>> Based on the source this should be able to poll s3 periodically.
>>>
>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>
>>>> I have tried with the coordinator's configuration too but no luck ☹️
>>>>
>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>
>>>>> Great progress there purna! :)
>>>>>
>>>>> Have you tried adding these properties to the coordinator's
>>>>> configuration? We usually use the action config to build up the
>>>>> connection to the distributed file system. Although I'm not sure
>>>>> we're using these when polling the dependencies for coordinators,
>>>>> I'm excited about you trying to make it work!
>>>>>
>>>>> I'll get back with a - hopefully - more helpful answer soon, I have to
>>>>> check the code in more depth first.
>>>>> gp
>>>>>
>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>
>>>>>> Peter,
>>>>>>
>>>>>> I got rid of this error by adding hadoop-aws-2.8.3.jar and
>>>>>> jets3t-0.9.4.jar, but I'm getting the below error now:
>>>>>>
>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>
>>>>>> I have tried adding the AWS access and secret keys in oozie-site.xml,
>>>>>> hadoop core-site.xml, and hadoop-config.xml.
>>>>>>
>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>>
>>>>>>> I have tried this, just added s3 instead of *
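The coordinator configuration example quoted in this thread lost its XML tags in the archive. Reconstructed against the coordinator action schema, it was presumably along these lines (the element names are inferred rather than taken from the original mail; the keys are placeholders):

```xml
<action>
  <workflow>
    <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
    <configuration>
      <!-- Keys for the legacy s3:// connector; the s3a connector reads
           fs.s3a.access.key / fs.s3a.secret.key instead -->
      <property>
        <name>fs.s3.awsAccessKeyId</name>
        <value>[YOURKEYID]</value>
      </property>
      <property>
        <name>fs.s3.awsSecretAccessKey</name>
        <value>[YOURKEY]</value>
      </property>
    </configuration>
  </workflow>
</action>
```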
Re: Spark 2.3 in oozie
Thanks Peter! I'm able to run the Spark Pi example on a Kubernetes cluster from Oozie after this change.

On Wed, May 16, 2018 at 10:27 AM Peter Cseh <gezap...@cloudera.com> wrote:

> The version of the xml schema has nothing to do with the version of the
> component you're using.
>
> Thanks for verifying that -Dspark.scala.binary.version=2.11 is required
> for compilation with Spark 2.3.0.
>
> Oozie does not pull in Spark's Kubernetes artifact. To make it part of
> the Oozie Spark sharelib you'll have to include the spark-kubernetes jar
> (org.apache.spark:spark-kubernetes_2.11:2.3.0) in the sharelib/spark/pom.xml
> as a compile-time dependency.
>
> gp
>
> On Tue, May 15, 2018 at 9:04 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>
>> I'm able to compile successfully after adding this override option:
>>
>> -Dspark.scala.binary.version=2.11 -Dspark.version=2.3.0
>>
>> But when I run a spark action with the spark-pi example jar against a
>> Kubernetes master, I get the below error in the stderr log:
>>
>> Error: Could not load KUBERNETES classes. This copy of spark may not
>> have been compiled with Kubernetes support
>>
>> Below is my workflow.xml:
>>
>> <spark xmlns="uri:oozie:spark-action:1.0">
>>   ${resourceManager}
>>   ${nameNode}
>>   k8s://<***.com>
>>   Python-Spark-Pi
>>   spark-examples_2.11-2.3.0.jar
>>   --class org.apache.spark.examples.SparkPi
>>   --conf spark.executor.instances=2
>>   --conf spark.kubernetes.namespace=spark
>>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
>>   --conf spark.kubernetes.container.image=artifactory.cloud.capitalone.com/kubespark/spark-quantum:v2.3.0
>>   --conf spark.kubernetes.node.selector.node-role.kubernetes.io/worker=true
>>   --conf spark.kubernetes.driver.label.application=is1-driver
>>   --conf spark.kubernetes.executor.label.application=is1-executor
>>   local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
>>
>> Is this because of uri:oozie:spark-action:1.0 in the spark xml tag? Does
>> it need to be spark-action:2.0 since I'm using Spark 2.3?
>>
>> Please suggest!
>>
>> On Tue, May 15, 2018 at 12:43 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>
>>> I think the error is related to the Scala version being present in the
>>> artifact name. I'll take a look at this tomorrow.
>>> Gp
>>>
>>> On Tue, May 15, 2018, 18:28 Artem Ervits <artemerv...@gmail.com> wrote:
>>>
>>>> Did you run mvn clean install first on the parent directory?
>>>>
>>>> On Tue, May 15, 2018, 11:35 AM purna pradeep <purna2prad...@gmail.com> wrote:
>>>>
>>>>> Thanks Peter,
>>>>>
>>>>> I have tried changing -Dspark.version to 2.3.0 and compiled Oozie;
>>>>> I'm getting the below error from oozie-examples:
>>>>>
>>>>> [ERROR] Failed to execute goal on project oozie-examples: Could not
>>>>> resolve dependencies for project org.apache.oozie:oozie-examples:jar:5.0.0:
>>>>> Could not find artifact org.apache.spark:spark-core_2.10:jar:2.3.0 in
>>>>> resolution
>>>>>
>>>>> On Tue, May 15, 2018 at 11:14 AM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>>
>>>>>> Oozie has a spark-2 profile that is currently hard-coded to Spark 2.1:
>>>>>> https://github.com/apache/oozie/blob/master/pom.xml#L1983
>>>>>> I'm sure if you overwrite the -Dspark.version and compile Oozie that
>>>>>> way it will work.
>>>>>> gp
>>>>>>
>>>>>> On Tue, May 15, 2018
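The workflow.xml quoted in this message also lost its element tags in the archive. Against the uri:oozie:spark-action:1.0 schema it would look roughly like the sketch below. The element names and the placement of the local:/// application path are inferred from the flattened text, and the Kubernetes master URL and container image are placeholders, not values from the thread:

```xml
<spark xmlns="uri:oozie:spark-action:1.0">
  <resource-manager>${resourceManager}</resource-manager>
  <name-node>${nameNode}</name-node>
  <master>k8s://https://k8s-apiserver.example.com</master>
  <name>Python-Spark-Pi</name>
  <jar>spark-examples_2.11-2.3.0.jar</jar>
  <spark-opts>
    --class org.apache.spark.examples.SparkPi
    --conf spark.executor.instances=2
    --conf spark.kubernetes.namespace=spark
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
    --conf spark.kubernetes.container.image=registry.example.com/spark:v2.3.0
  </spark-opts>
  <!-- Application jar path inside the driver container; placement as an
       <arg> is a guess from the flattened original -->
  <arg>local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar</arg>
</spark>
```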
Re: Oozie for spark jobs without Hadoop
Peter,

I got rid of this error by adding hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar, but I'm getting the below error now:

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
must be specified by setting the fs.s3.awsAccessKeyId and
fs.s3.awsSecretAccessKey properties (respectively)

I have tried adding the AWS access and secret keys in oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml.

On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com> wrote:

> I have tried this, just added s3 instead of *:
>
> <property>
>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>   <value>hdfs,hftp,webhdfs,s3</value>
> </property>
>
> Getting below error:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>
> On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com> wrote:
>
>> This is what is in the logs:
>>
>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost]
>> Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost]
>> Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>
>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>
>>> That's strange, this exception should not happen in that case.
>>> Can you check the server logs for messages like this?
>>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>> LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
>>> Thanks
>>>
>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>
>>>> This is what I already have in my oozie-site.xml:
>>>>
>>>> <property>
>>>>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>   <value>*</value>
>>>> </property>
>>>>
>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>
>>>>> You'll have to configure
>>>>> oozie.service.HadoopAccessorService.supported.filesystems (default:
>>>>> hdfs,hftp,webhdfs) properly. It enlists the different filesystems
>>>>> supported for federation. If wildcard "*" is specified, then ALL
>>>>> file schemes will be allowed.
>>>>>
>>>>> For testing purposes it's ok to put * in there in oozie-site.xml.
>>>>>
>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>
>>>>>> Peter,
>>>>>>
>>>>>> I have tried to specify a dataset with a URI starting with s3://,
>>>>>> s3a:// and s3n://, and I am getting the exception:
>>>>>>
>>>>>> Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>>> [s3://mybucket/input.data] Making the job failed
>>>>>>
>>>>>> org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3]
>>>>>> not supported in uri [s3://mybucket/input.data]
>>>>>> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
Re: Oozie for spark jobs without Hadoop
I have tried this, just added s3 instead of *:

<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3</value>
</property>

Getting below error:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)

On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com> wrote:

> This is what is in the logs:
>
> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost]
> Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost]
> Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>
> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com> wrote:
>
>> That's strange, this exception should not happen in that case.
>> Can you check the server logs for messages like this?
>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>> LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
>> Thanks
>>
>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>
>>> This is what I already have in my oozie-site.xml:
>>>
>>> <property>
>>>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>   <value>*</value>
>>> </property>
>>>
>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com> wrote:
>>>
>>>> You'll have to configure
>>>> oozie.service.HadoopAccessorService.supported.filesystems (default:
>>>> hdfs,hftp,webhdfs) properly. It enlists the different filesystems
>>>> supported for federation. If wildcard "*" is specified, then ALL file
>>>> schemes will be allowed.
>>>>
>>>> For testing purposes it's ok to put * in there in oozie-site.xml.
>>>>
>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I have tried to specify a dataset with a URI starting with s3://,
>>>>> s3a:// and s3n://, and I am getting the exception:
>>>>>
>>>>> Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>> [s3://mybucket/input.data] Making the job failed
>>>>>
>>>>> org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3]
>>>>> not supported in uri [s3://mybucket/input.data]
>>>>> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>> at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>> at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>> at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>> at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
Re: Oozie for spark jobs without Hadoop
This is what is in the logs:

2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler

On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com> wrote:

> That's strange, this exception should not happen in that case.
> Can you check the server logs for messages like this?
> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
> LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
> Thanks
> [...]
Re: Oozie for spark jobs without Hadoop
This is what I already have in my oozie-site.xml:

<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>*</value>
</property>

On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com> wrote:

> You'll have to configure
> oozie.service.HadoopAccessorService.supported.filesystems properly.
> Its default is hdfs,hftp,webhdfs: "Enlist the different filesystems
> supported for federation. If wildcard "*" is specified, then ALL file
> schemes will be allowed."
>
> For testing purposes it's ok to put * in there in oozie-site.xml
> [...]
Re: Oozie for spark jobs without Hadoop
+Peter

On Wed, May 16, 2018 at 11:29 AM purna pradeep <purna2prad...@gmail.com> wrote:

> Peter,
>
> I have tried to specify dataset with uri starting with s3://, s3a:// and
> s3n:// and I am getting exception
>
> Exception occurred:E0904: Scheme [s3] not supported in uri
> [s3://mybucket/input.data] Making the job failed
>
> org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
> supported in uri [s3://mybucket/input.data]
>     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
> [...]
>
> Is S3 support specific to CDH distribution or should it work in Apache
> Oozie as well? I'm not using CDH yet.
Re: Oozie for spark jobs without Hadoop
Peter,

I have tried to specify dataset with uri starting with s3://, s3a:// and
s3n:// and I am getting exception

Exception occurred:E0904: Scheme [s3] not supported in uri
[s3://mybucket/input.data] Making the job failed

org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
supported in uri [s3://mybucket/input.data]
    at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
    at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
    at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
    at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
    at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
    at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
    at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
    at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
    at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
    at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
    at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
    at org.apache.oozie.command.XCommand.call(XCommand.java:290)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Is S3 support specific to CDH distribution or should it work in Apache
Oozie as well? I'm not using CDH yet.

On Wed, May 16, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:

> I think it should be possible for Oozie to poll S3. Check out this
> <https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html>
> description on how to make it work in jobs, something similar should work
> on the server side as well
>
> On Tue, May 15, 2018 at 4:43 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>
> > Thanks Andras,
> >
> > I also would like to know if oozie supports AWS S3 as input events to
> > poll for a dependency file before kicking off a spark action
> >
> > For example: I don't want to kick off a spark action until a file has
> > arrived at a given AWS S3 location
> > [...]
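The Cloudera page linked above boils down to giving the S3A filesystem client credentials via Hadoop configuration. A hedged sketch of the relevant properties (these are the standard hadoop-aws/S3A property names, not something confirmed in this thread; hadoop-aws and its AWS SDK dependency must also be on the server classpath):

```xml
<!-- core-site.xml (sketch): S3A credentials so the server can resolve
     s3a:// dependency URIs. Values are placeholders. -->
<property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
```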
Re: Spark 2.3 in oozie
I’m able to compile successfully after adding these override options:
-Dspark.scala.binary.version=2.11 -Dspark.version=2.3.0

But when I’m running a spark action with the spark-pi example jar against a
Kubernetes master I’m getting the below error in the stderr log:

Error: Could not load KUBERNETES classes. This copy of spark may not have
been compiled with Kubernetes support

Below is my workflow.xml:

<spark xmlns="uri:oozie:spark-action:1.0">
    <resource-manager>${resourceManager}</resource-manager>
    <name-node>${nameNode}</name-node>
    <master>k8s://<***.com></master>
    <name>Python-Spark-Pi</name>
    <jar>spark-examples_2.11-2.3.0.jar</jar>
    <spark-opts>--class org.apache.spark.examples.SparkPi
        --conf spark.executor.instances=2
        --conf spark.kubernetes.namespace=spark
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
        --conf spark.kubernetes.container.image=artifactory.cloud.capitalone.com/kubespark/spark-quantum:v2.3.0
        --conf spark.kubernetes.node.selector.node-role.kubernetes.io/worker=true
        --conf spark.kubernetes.driver.label.application=is1-driver
        --conf spark.kubernetes.executor.label.application=is1-executor</spark-opts>
    <arg>local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar</arg>
</spark>

Is this because of uri:oozie:spark-action:1.0 in the spark xml tag? Does it
need to be spark-action:2.0 as I’m using spark 2.3?

Please suggest!

On Tue, May 15, 2018 at 12:43 PM Peter Cseh <gezap...@cloudera.com> wrote:

> I think the error is related to the Scala version being present in the
> artifact name.
> I'll take a look at this tomorrow.
> Gp
>
> On Tue, May 15, 2018, 18:28 Artem Ervits <artemerv...@gmail.com> wrote:
>
> > Did you run mvn clean install first on the parent directory?
> > [...]
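The two Maven overrides mentioned in this thread would be passed together on the build command line. A sketch only — the profile name and goals are assumed from a typical Oozie source build and should be verified against your checkout:

```shell
# Sketch: print the rebuild invocation with both overrides.
# -Dspark.version picks Spark 2.3.0; -Dspark.scala.binary.version=2.11
# keeps dependency names like spark-core_2.11 resolvable (the spark-2
# profile otherwise defaults to Spark 2.1 artifacts).
MVN_OVERRIDES="-Pspark-2 -Dspark.version=2.3.0 -Dspark.scala.binary.version=2.11"
# The actual build would be run from the Oozie source root, e.g.:
echo "mvn clean package assembly:single -DskipTests $MVN_OVERRIDES"
```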
Re: Spark 2.3 in oozie
Thanks peter,

I have tried changing -Dspark.version to 2.3.0 and compiled oozie. I’m
getting the below error from the oozie examples:

[ERROR] Failed to execute goal on project oozie-examples: Could not resolve
dependencies for project org.apache.oozie:oozie-examples:jar:5.0.0: Could
not find artifact org.apache.spark:spark-core_2.10:jar:2.3.0 in resolution

On Tue, May 15, 2018 at 11:14 AM Peter Cseh <gezap...@cloudera.com> wrote:

> Oozie has a spark-2 profile that is currently hard-coded to Spark 2.1:
> https://github.com/apache/oozie/blob/master/pom.xml#L1983
> I'm sure if you overwrite the -Dspark.version and compile Oozie that way
> it will work.
> gp
>
> On Tue, May 15, 2018 at 5:07 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>
> > Hello,
> >
> > Does oozie support spark 2.3? Or does it even care about the spark
> > version?
> >
> > I want to use spark action
> >
> > Thanks,
> > Purna
Spark 2.3 in oozie
Hello,

Does oozie support spark 2.3? Or does it even care about the spark version?

I want to use spark action

Thanks,
Purna
Re: Oozie for spark jobs without Hadoop
Thanks Andras,

I also would like to know if oozie supports AWS S3 as input events to poll
for a dependency file before kicking off a spark action.

For example: I don’t want to kick off a spark action until a file has
arrived at a given AWS S3 location.

On Tue, May 15, 2018 at 10:17 AM Andras Piros <andras.pi...@cloudera.com> wrote:

> Hi,
>
> Oozie needs HDFS to store workflow, coordinator, or bundle definitions, as
> well as sharelib files in a safe, distributed and scalable way. Oozie needs
> YARN to run almost all of its actions, Spark action being no exception.
>
> At the moment it's not feasible to install Oozie without those Hadoop
> components. How to install Oozie please *find here
> <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>
> Regards,
>
> Andras
>
> On Tue, May 15, 2018 at 4:11 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>
> > Hi,
> >
> > Would like to know if I can use sparkaction in oozie without having
> > Hadoop cluster?
> >
> > I want to use oozie to schedule spark jobs on Kubernetes cluster
> >
> > I’m a beginner in oozie
> >
> > Thanks
Workflow S3 listener
Hi,

I’m very new to oozie. I would like to run Spark 2.3 jobs on oozie based on
file arrival on AWS S3, which is a dependency for the job.

I see some examples which use s3 as input event datasets, as below:

s3n://mybucket/a/b/${YEAR}/${MONTH}/${DAY}

So my question is: does oozie listen for file arrival on AWS S3 to check
for the dependency before kicking off the spark job?

Thanks
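For reference, a dataset URI template like the one above is wired into a coordinator roughly as follows. This is a sketch, assuming the server's supported.filesystems allows the scheme and S3 credentials are configured; the app name, frequency, dates, and `${appPath}` are illustrative, not from this thread:

```xml
<!-- Sketch: coordinator that waits for a daily file on S3 before
     running the workflow. Names and paths are illustrative. -->
<coordinator-app name="s3-dependency-coord" frequency="${coord:days(1)}"
                 start="2018-05-01T00:00Z" end="2018-06-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <dataset name="input-ds" frequency="${coord:days(1)}"
                 initial-instance="2018-05-01T00:00Z" timezone="UTC">
            <uri-template>s3n://mybucket/a/b/${YEAR}/${MONTH}/${DAY}</uri-template>
            <!-- the instance counts as available once this file exists -->
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="input" dataset="input-ds">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${appPath}</app-path>
        </workflow>
    </action>
</coordinator-app>
```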