Re: Forbidden: Error Code: 403
Tried almost all the options, but none of them worked. So I ended up creating a new IAM user, and the keys of this user are working fine. I am not getting the Forbidden (403) exception any more, but my program seems to run forever. It doesn't throw any exception; it just keeps running, with the following trace:

...
15/05/18 17:35:44 INFO HttpServer: Starting HTTP Server
15/05/18 17:35:44 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/18 17:35:44 INFO AbstractConnector: Started SocketConnector@0.0.0.0:60316
15/05/18 17:35:44 INFO Utils: Successfully started service 'HTTP file server' on port 60316.
15/05/18 17:35:44 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/18 17:35:44 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/18 17:35:44 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/18 17:35:44 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/18 17:35:44 INFO SparkUI: Started SparkUI at http://172.28.210.74:4040
15/05/18 17:35:44 INFO Executor: Starting executor ID driver on host localhost
15/05/18 17:35:44 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@172.28.210.74:60315/user/HeartbeatReceiver
15/05/18 17:35:44 INFO NettyBlockTransferService: Server created on 60317
15/05/18 17:35:44 INFO BlockManagerMaster: Trying to register BlockManager
15/05/18 17:35:44 INFO BlockManagerMasterActor: Registering block manager localhost:60317 with 66.9 MB RAM, BlockManagerId(driver, localhost, 60317)
15/05/18 17:35:44 INFO BlockManagerMaster: Registered BlockManager
15/05/18 17:35:45 WARN AmazonHttpClient: Detected a possible problem with the current JVM version (1.6.0_65). If you experience XML parsing problems using the SDK, try upgrading to a more recent JVM update.
15/05/18 17:35:47 INFO S3AFileSystem: Getting path status for s3a://bucket-name/avro_data/episodes.avro (avro_data/episodes.avro)
15/05/18 17:35:47 INFO S3AFileSystem: Getting path status for s3a://bucket-name/avro_data/episodes.avro (avro_data/episodes.avro)
15/05/18 17:35:47 INFO S3AFileSystem: Getting path status for s3a://bucket-name/avro_data/episodes.avro (avro_data/episodes.avro)
15/05/18 17:35:48 INFO S3AFileSystem: Opening 's3a://bucket-name/avro_data/episodes.avro' for reading
15/05/18 17:35:48 INFO S3AFileSystem: Getting path status for s3a://bucket-name/avro_data/episodes.avro (avro_data/episodes.avro)
15/05/18 17:35:48 INFO S3AFileSystem: Actually opening file avro_data/episodes.avro at pos 0
15/05/18 17:35:48 INFO S3AFileSystem: Reopening avro_data/episodes.avro to seek to new offset -4
15/05/18 17:35:48 INFO S3AFileSystem: Actually opening file avro_data/episodes.avro at pos 0
15/05/18 17:35:50 INFO MemoryStore: ensureFreeSpace(230868) called with curMem=0, maxMem=70177259
15/05/18 17:35:50 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 225.5 KB, free 66.7 MB)
15/05/18 17:35:50 INFO MemoryStore: ensureFreeSpace(31491) called with curMem=230868, maxMem=70177259
15/05/18 17:35:50 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 30.8 KB, free 66.7 MB)
15/05/18 17:35:50 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60317 (size: 30.8 KB, free: 66.9 MB)
15/05/18 17:35:50 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/05/18 17:35:50 INFO SparkContext: Created broadcast 0 from hadoopFile at AvroRelation.scala:82
15/05/18 17:35:50 INFO S3AFileSystem: Getting path status for s3a://bucket-name/avro_data/episodes.avro (avro_data/episodes.avro)
15/05/18 17:35:50 INFO FileInputFormat: Total input paths to process : 1
15/05/18 17:35:50 INFO SparkContext: Starting job: runJob at SparkPlan.scala:122
15/05/18 17:35:50 INFO DAGScheduler: Got job 0 (runJob at SparkPlan.scala:122) with 1 output partitions (allowLocal=false)
15/05/18 17:35:50 INFO DAGScheduler: Final stage: Stage 0(runJob at SparkPlan.scala:122)
15/05/18 17:35:50 INFO DAGScheduler: Parents of final stage: List()
15/05/18 17:35:50 INFO DAGScheduler: Missing parents: List()
15/05/18 17:35:50 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at map at SparkPlan.scala:97), which has no missing parents
15/05/18 17:35:50 INFO MemoryStore: ensureFreeSpace(3448) called with curMem=262359, maxMem=70177259
15/05/18 17:35:50 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.4 KB, free 66.7 MB)
15/05/18 17:35:50 INFO MemoryStore: ensureFreeSpace(2386) called with curMem=265807, maxMem=70177259
15/05/18 17:35:50 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 66.7 MB)
15/05/18 17:35:50 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:60317 (size: 2.3 KB, free: 66.9 MB)
15/05/18 17:35:50 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/05/18 17:35:50 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/05/18 17:35:50 INFO DAGScheduler: Submitting 1 missing tasks
Re: Forbidden: Error Code: 403
I think you can also try it this way:

DataFrame df = sqlContext.load("s3n://ACCESS-KEY:SECRET-KEY@bucket-name/file.avro", "com.databricks.spark.avro");

Thanks
Best Regards

On Sat, May 16, 2015 at 2:02 AM, Mohammad Tariq <donta...@gmail.com> wrote:

> Thanks for the suggestion Steve. I'll try that out. I read the long story last night while struggling with this :). I made sure that I don't have any '/' in my key. [...]
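Spelled out as a complete snippet, this suggestion would look something like the sketch below. ACCESS-KEY, SECRET-KEY, and bucket-name are placeholders, the class name is made up for illustration, and the Spark 1.3-era SQLContext.load API from the original post is assumed:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class S3nInlineCredentials {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("DataFrameDemo").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);
        // Credentials embedded directly in the s3n URI, so no fs.s3n.* config is needed.
        // Caveat from Steve's reply below: a '/' inside the secret key breaks this URI form.
        DataFrame df = sqlContext.load(
                "s3n://ACCESS-KEY:SECRET-KEY@bucket-name/file.avro",
                "com.databricks.spark.avro");
        df.show();
    }
}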
Forbidden: Error Code: 403
Hello list,

Scenario: I am trying to read an Avro file stored in S3 and create a DataFrame out of it using the Spark-Avro library (https://github.com/databricks/spark-avro), but I am unable to do so. This is the code I am using:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class S3DataFrame {

    public static void main(String[] args) {
        System.out.println("START...");
        SparkConf conf = new SparkConf().setAppName("DataFrameDemo").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        Configuration config = sc.hadoopConfiguration();
        config.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        config.set("fs.s3a.access.key", "****"); // redacted
        config.set("fs.s3a.secret.key", "****"); // redacted
        config.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com");
        SQLContext sqlContext = new SQLContext(sc);
        DataFrame df = sqlContext.load("s3a://bucket-name/file.avro", "com.databricks.spark.avro");
        df.show();
        df.printSchema();
        df.select("title").show();
        System.out.println("DONE");
        // df.save("/new/dir/", "com.databricks.spark.avro");
    }
}

Problem: Getting "Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;"

And this is the complete exception trace:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 63A603F1DC6FB900), S3 Extended Request ID: vh5XhXSVO5ZvhX8c4I3tOWQD/T+B0ZW/MCYzUnuNnQ0R2JoBmJ0MPmUePRiQnPVASTbkonoFPIg=
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1088)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:735)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:461)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:296)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3743)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1027)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1005)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:688)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:71)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
    at com.databricks.spark.avro.AvroRelation.newReader(AvroRelation.scala:105)
    at com.databricks.spark.avro.AvroRelation.<init>(AvroRelation.scala:60)
    at com.databricks.spark.avro.DefaultSource.createRelation(DefaultSource.scala:41)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:673)
    at org.myorg.dataframe.S3DataFrame.main(S3DataFrame.java:25)

Would really appreciate some help. Thank you so much for your precious time.

Software versions used:
spark-1.3.1-bin-hadoop2.4
hadoop-aws-2.6.0.jar
Mac OS X 10.10.3
java version 1.6.0_65

Tariq, Mohammad
about.me/mti
Re: Forbidden: Error Code: 403
Have you verified that you can download the file from bucket-name without using Spark? Seems like a permission issue.

Cheers

On May 15, 2015, at 5:09 AM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hello list,
> Scenario: I am trying to read an Avro file stored in S3 and create a DataFrame out of it using the Spark-Avro library, but I am unable to do so. [...]
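A minimal sketch of such a check, using only the AWS SDK that is already on the classpath (it appears in the stack trace). The bucket and key names are the placeholders from the original post, the class name is made up, and the getObjectMetadata call mirrors the request the S3A stack trace shows failing:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class S3PermissionCheck {
    public static void main(String[] args) {
        // Use the same credentials the Spark job is given (placeholders here).
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS-KEY", "SECRET-KEY"));
        s3.setEndpoint("s3-us-west-2.amazonaws.com");
        // A 403 on this call points at the credentials or bucket policy, not Spark.
        ObjectMetadata meta = s3.getObjectMetadata("bucket-name", "file.avro");
        System.out.println("OK, content length = " + meta.getContentLength());
    }
}

If this fails with the same 403, the IAM user's policy (s3:GetObject on the bucket) is the place to look.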
Re: Forbidden: Error Code: 403
On 15 May 2015, at 21:20, Mohammad Tariq <donta...@gmail.com> wrote:

> Thank you Ayan and Ted for the prompt response. It isn't working with s3n either. And I am able to download the file. In fact, I am able to read the same file using the s3 API without any issue.

Sounds like an s3n config problem. Check your configurations - you can test locally via the hdfs dfs command without even starting Spark.

Oh, and if there is a '/' in your secret key, you're going to need to generate a new one. Long story.
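The hdfs dfs test would be along the lines of `hdfs dfs -ls s3a://bucket-name/` with the fs.s3a.* keys set in core-site.xml. The same check can also be scripted against the Hadoop FileSystem API directly - a sketch with the settings copied from the original post, placeholder credentials, and a made-up class name:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AConfigCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same settings as the failing Spark job; key values are placeholders.
        conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        conf.set("fs.s3a.access.key", "ACCESS-KEY");
        conf.set("fs.s3a.secret.key", "SECRET-KEY");
        conf.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com");

        // getFileStatus is the call that raises the 403 in the Spark trace,
        // so reproducing it here tests the S3A config without Spark at all.
        FileSystem fs = FileSystem.get(URI.create("s3a://bucket-name/"), conf);
        FileStatus st = fs.getFileStatus(new Path("s3a://bucket-name/file.avro"));
        System.out.println(st.getPath() + " : " + st.getLen() + " bytes");
    }
}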
Re: Forbidden: Error Code: 403
Thanks for the suggestion Steve. I'll try that out. I read the long story last night while struggling with this :). I made sure that I don't have any '/' in my key.

On Saturday, May 16, 2015, Steve Loughran <ste...@hortonworks.com> wrote:

> On 15 May 2015, at 21:20, Mohammad Tariq <donta...@gmail.com> wrote:
> > Thank you Ayan and Ted for the prompt response. [...]
>
> Sounds like an s3n config problem. Check your configurations - you can test locally via the hdfs dfs command without even starting Spark.
>
> Oh, and if there is a '/' in your secret key, you're going to need to generate a new one. Long story.

Tariq, Mohammad
about.me/mti