On 1 Sep 2016, at 03:45, Divya Gehlot <divya.htco...@gmail.com> wrote:

Hi,
I am using Spark 1.6.1 on an EMR machine and am trying to read S3 buckets in my Spark job.
When I read them through the Spark shell I can read them, but when I package the job and run it with spark-submit I get the error below:

16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1468570153734_2851_000001
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
        at java.util.ServiceLoader.fail(ServiceLoader.java:224)
        at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)

I have already included

 "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",

in my build.sbt


Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath Spark, you will need to make sure you have aws-java-sdk 1.7.4 on your classpath. You can't just drop in a newer SDK JAR, as it is incompatible at the API level (https://issues.apache.org/jira/browse/HADOOP-12269):


    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.7.4</version>
      <scope>compile</scope>
    </dependency>
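
Since you are on sbt rather than Maven, a rough build.sbt equivalent would be something like the sketch below (replacing your aws-java-sdk-s3 1.11.15 line; the artifact and version are the same as in the Maven snippet above):

    // Sketch: depend on the monolithic aws-java-sdk 1.7.4 that the Hadoop
    // 2.6/2.7 s3a code was compiled against, instead of the newer split
    // aws-java-sdk-s3 artifact.
    libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.7.4"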


You also need to keep the Jackson artifacts jackson-databind and jackson-annotations in sync with the rest of your app:


    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
    </dependency>
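
In sbt terms, pinning those to a single version might look like the sketch below; "2.4.4" is only a placeholder, use whatever Jackson version the rest of your app and Spark actually pull in:

    // Sketch: force jackson-databind and jackson-annotations to one version
    // so the s3a/aws code and your own code agree. The version shown is a
    // placeholder; match it to what your dependency tree already uses.
    dependencyOverrides ++= Set(
      "com.fasterxml.jackson.core" % "jackson-databind"    % "2.4.4",
      "com.fasterxml.jackson.core" % "jackson-annotations" % "2.4.4"
    )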



I also tried providing the access key in my job, but the same error persists.

When I googled it, I read that if you have an IAM role created there is no need to provide an access key.



You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair amount of reworking of how S3A does authentication.
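
Until then the credentials have to be supplied explicitly. A minimal sketch of doing that programmatically, assuming an existing SparkContext sc (the bucket name is made up, and hard-coding keys is only for illustration; normally they'd live in core-site.xml or be passed via --conf):

    // Sketch: set the standard s3a credential properties on the Hadoop
    // configuration used by this SparkContext, then read from the bucket.
    sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
    val lines = sc.textFile("s3a://your-bucket/some/prefix/")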

Note that if you launch Spark jobs with the AWS environment variables set, these will be automatically picked up and used to set the relevant properties in the configuration.
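
If you would rather do that mapping by hand (for example, to be sure of exactly what is being set), a small sketch along the same lines, assuming the usual AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables:

    // Sketch: copy the AWS environment variables, if present, into the
    // corresponding s3a configuration properties.
    for (key <- sys.env.get("AWS_ACCESS_KEY_ID"))
      sc.hadoopConfiguration.set("fs.s3a.access.key", key)
    for (secret <- sys.env.get("AWS_SECRET_ACCESS_KEY"))
      sc.hadoopConfiguration.set("fs.s3a.secret.key", secret)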
