Hello list,

Scenario: I am trying to read an Avro file stored in S3 and create a
DataFrame from it using the Spark-Avro library
(https://github.com/databricks/spark-avro), but I am unable to do so.
This is the code I am using:
    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class S3DataFrame {

        public static void main(String[] args) {
            System.out.println("START...");
            SparkConf conf = new SparkConf().setAppName("DataFrameDemo").setMaster("local");
            JavaSparkContext sc = new JavaSparkContext(conf);
            Configuration config = sc.hadoopConfiguration();
            config.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
            config.set("fs.s3a.access.key", "****************");
            config.set("fs.s3a.secret.key", "*****************");
            config.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com");
            SQLContext sqlContext = new SQLContext(sc);
            DataFrame df = sqlContext.load("s3a://bucket-name/file.avro", "com.databricks.spark.avro");
            df.show();
            df.printSchema();
            df.select("title").show();
            System.out.println("DONE");
            // df.save("/new/dir/", "com.databricks.spark.avro");
        }
    }

Problem: The job fails with

    Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception:
    Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;)

and this is the complete exception trace:

    Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 63A603F1DC6FB900), S3 Extended Request ID: vh5XhXSVO5ZvhX8c4I3tOWQD/T+B0ZW/MCYzUnuNnQ0R2JoBmJ0MPmUePRiQnPVASTbkonoFPIg=
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1088)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:735)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:461)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:296)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3743)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1027)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1005)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:688)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:71)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
        at com.databricks.spark.avro.AvroRelation.newReader(AvroRelation.scala:105)
        at com.databricks.spark.avro.AvroRelation.<init>(AvroRelation.scala:60)
        at com.databricks.spark.avro.DefaultSource.createRelation(DefaultSource.scala:41)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:673)
        at org.myorg.dataframe.S3DataFrame.main(S3DataFrame.java:25)

I would really appreciate some help. Thank you for your time.

Software versions used:
    spark-1.3.1-bin-hadoop2.4
    hadoop-aws-2.6.0.jar
    Mac OS X 10.10.3
    java version "1.6.0_65"

Tariq, Mohammad
about.me/mti
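P.S. To help narrow things down: I have not yet verified the key pair outside of Spark. A direct check with the AWS CLI (assuming it is installed; the bucket and object names below are the same placeholders as in my code) would look like this:

```shell
# Substitute the real access key, secret key, bucket, and object key.
export AWS_ACCESS_KEY_ID="****************"
export AWS_SECRET_ACCESS_KEY="*****************"

# HEAD the object directly with the same credentials. A 403 here as well
# would point at the credentials or the bucket policy rather than at
# Spark or spark-avro.
aws s3api head-object --bucket bucket-name --key file.avro --region us-west-2
```

If this succeeds while the Spark job still gets a 403, the problem is more likely in how the keys reach the s3a filesystem.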