Re: How To Set Environment Variables For Spark Action Script From XML Definition

2018-05-16 Thread Peter Cseh
Hi Richard! I'm happy you've found a workaround for your issue. Yes, "SPARK_HOME" is set to the current working directory in SparkActionExecutor. Based on the surrounding

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
Ok, I've found it: If you are using 4.3.0 or newer, this is the part which checks for dependencies: https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926 It passes the coordinator action's configuration and even does impersonation

Re: How To Set Environment Variables For Spark Action Script From XML Definition

2018-05-16 Thread Richard Primera
Greetings, Thanks for the suggestion. I tried this and noted two things. The first is that one has to prepend `oozie.launcher` to the parameter in order for it to have an effect on the actual environment of the script. The second is that when I did this, the python script exited claiming it
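For readers following the thread, the change being described can be sketched as a launcher-level property inside the Spark action's configuration. This is an illustrative assumption, not a snippet confirmed by the thread; the variable name and value are placeholders:

```xml
<spark xmlns="uri:oozie:spark-action:0.2">
    ...
    <configuration>
        <!-- The oozie.launcher. prefix forwards the property to the launcher
             job, whose environment the action script inherits. -->
        <property>
            <name>oozie.launcher.mapred.child.env</name>
            <!-- MY_VAR=some_value is a placeholder assumption -->
            <value>MY_VAR=some_value</value>
        </property>
    </configuration>
    ...
</spark>
```

Without the `oozie.launcher.` prefix the property would apply to the MapReduce child tasks rather than to the launcher that runs the script.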

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
I have tried with the coordinator's configuration too but no luck ☹️ On Wed, May 16, 2018 at 3:54 PM Peter Cseh wrote: > Great progress there purna! :) > > Have you tried adding these properties to the coordinator's configuration? > we usually use the action config to build up the connection to the distributed file system

Re: Spark 2.3 in oozie

2018-05-16 Thread Peter Cseh
Wow, that's great news! Can I ask you to summarize the steps necessary to make this happen? It would be good to see everything together - also, it would probably help others as well. Thank you for sharing your struggles - and solutions as well! Peter On Wed, May 16, 2018 at 10:49 PM, purna pradeep

Re: Spark 2.3 in oozie

2018-05-16 Thread purna pradeep
Thanks Peter! I’m able to run the Spark Pi example on a Kubernetes cluster from Oozie after this change. On Wed, May 16, 2018 at 10:27 AM Peter Cseh wrote: > The version of the xml schema has nothing to do with the version of the > component you're using. > > Thanks for verifying that -Dspark.scala.binary.version=2.11

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
Great progress there purna! :) Have you tried adding these properties to the coordinator's configuration? We usually use the action config to build up the connection to the distributed file system. I'm not sure we're using these when polling the dependencies for coordinators, but I'm excited

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Artem Ervits
Here's some related info https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md On Wed, May 16, 2018, 3:45 PM purna pradeep

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
Peter, I got rid of this error by adding hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar. But I’m getting the below error now: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively)
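The error above names the credential properties for the legacy s3:// scheme; the s3a connector uses different keys. A hedged sketch of both, suitable for core-site.xml or an action configuration — the values are placeholders, and whether your setup needs the s3a or legacy pair depends on the URI scheme in use:

```xml
<!-- s3a connector (org.apache.hadoop.fs.s3a.S3AFileSystem) -->
<property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
</property>
<!-- legacy s3:// scheme, as named in the error above -->
<property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
</property>
<property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
</property>
```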

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
I have tried this, just added s3 instead of *: oozie.service.HadoopAccessorService.supported.filesystems = hdfs,hftp,webhdfs,s3 Getting the below error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at org.apache.h

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
This is what is in the logs: 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler] 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
That's strange, this exception should not happen in that case. Can you check the server logs for messages like this? LOG.info("Loaded urihandlers {0}", Arrays.toString(classes)); LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName()); Thanks On Wed, May 16,
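The log lines being asked about come from URIHandlerService startup, whose handler list is set in oozie-site.xml. A sketch of the relevant property, assuming the default configuration the thread's logs show (FSURIHandler delegates to whichever Hadoop filesystems HadoopAccessorService allows):

```xml
<property>
    <name>oozie.service.URIHandlerService.uri.handlers</name>
    <!-- FSURIHandler covers Hadoop FileSystem URIs; the schemes it accepts
         are governed by HadoopAccessorService.supported.filesystems -->
    <value>org.apache.oozie.dependency.FSURIHandler</value>
</property>
```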

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
This is what I already have in my oozie-site.xml: oozie.service.HadoopAccessorService.supported.filesystems = * On Wed, May 16, 2018 at 11:37 AM Peter Cseh wrote: > You'll have to configure > oozie.service.HadoopAccessorService.supported.filesystems > hdfs,hftp,webhdfs Enlist > the d

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems to hdfs,hftp,webhdfs. Enlist the different filesystems supported for federation. If wildcard "*" is specified, then ALL file schemes will be allowed. For testing purposes it's ok to put * in there in oozie-site
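As an oozie-site.xml fragment, the setting being described looks roughly like this; the s3a entry is an assumption about what an S3-backed setup would add on top of the defaults quoted above:

```xml
<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <!-- "*" allows every scheme; convenient for testing,
         too permissive for production -->
    <value>hdfs,hftp,webhdfs,s3a</value>
</property>
```

Note that allowing the scheme here only whitelists it: the matching FileSystem implementation (e.g. hadoop-aws for s3a) must still be on the server's classpath, which is the ClassNotFoundException seen elsewhere in this thread.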

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
+Peter On Wed, May 16, 2018 at 11:29 AM purna pradeep wrote: > Peter, > > I have tried to specify a dataset with a uri starting with s3://, s3a:// and > s3n:// and I am getting an exception > > > > Exception occurred:E0904: Scheme [s3] not supported in uri > [s3://mybucket/input.data] Making the job failed

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
Peter, I have tried to specify a dataset with a uri starting with s3://, s3a:// and s3n:// and I am getting an exception. Exception occurred:E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data] Making the job failed org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Shikin, Igor
Hi Peter, I am working with Purna. I have tried to specify a dataset with a uri starting with s3://, s3a:// and s3n:// and I am getting an exception. Exception occurred:E0904: Scheme [s3] not supported in uri [s3://cmsegmentation-qa/oozie-test/input.data] Making the job failed org.apache.oozie.dependen

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
I think it should be possible for Oozie to poll S3. Check out this description on how to make it work in jobs; something similar should work on the server side as well. On Tue, May 15, 2018 at 4:43 PM, purna pradeep

Re: Spark 2.3 in oozie

2018-05-16 Thread Peter Cseh
The version of the xml schema has nothing to do with the version of the component you're using. Thanks for verifying that -Dspark.scala.binary.version=2.11 is required for compilation with Spark 2.3.0. Oozie does not pull in Spark's Kubernetes artifact. To make it part of the Oozie Spark sharelib
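One way to make the Kubernetes artifact part of the Spark sharelib, sketched as a build-time change: add the dependency to the sharelib module and rebuild. The exact module path is an assumption, and the coordinates below assume Spark 2.3.0 with Scala 2.11 — verify against your build:

```xml
<!-- hypothetical addition to the Spark sharelib's pom.xml -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-kubernetes_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
```

After rebuilding and uploading the sharelib to HDFS, `oozie admin -sharelibupdate` tells a running server to pick up the new contents.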