Re: Hadoop AWS module (Spark) is inventing a secret-key each time

2017-03-08 Thread Jonhy Stack
I have sent the message there as well; I thought I would send it here too
because I'm actually setting up the hadoopConf.

On Wed, Mar 8, 2017 at 6:49 PM, Ravi Prakash  wrote:

> Sorry to hear about your travails.
>
> I think you might be better off asking the Spark community:
> http://spark.apache.org/community.html
>


Re: Hadoop AWS module (Spark) is inventing a secret-key each time

2017-03-08 Thread Ravi Prakash
Sorry to hear about your travails.

I think you might be better off asking the Spark community:
http://spark.apache.org/community.html



Hadoop AWS module (Spark) is inventing a secret-key each time

2017-03-08 Thread Jonhy Stack
Hi,

I'm trying to read an S3 bucket from Spark, and up until today Spark has
always complained that the request returns 403:

hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
logs = spark_context.textFile("s3a://mybucket/logs/*")

Spark was saying: Invalid Access key [ACCESSKEY]
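
(For reference, a minimal equivalent sketch: Spark copies any "spark.hadoop.*"
property into the Hadoop configuration it hands to the filesystem, so the same
S3A settings can also be passed through SparkConf instead of mutating
hadoopConf by hand. ACCESSKEY/SECRETKEY are placeholders, not real values.)

from pyspark import SparkConf, SparkContext

# Sketch: "spark.hadoop.*" properties are copied into the Hadoop
# configuration, so this is equivalent to the hadoopConf.set(...) calls above.
conf = (SparkConf()
        .setAppName("s3a-read-test")
        .set("spark.hadoop.fs.s3a.access.key", "ACCESSKEY")
        .set("spark.hadoop.fs.s3a.secret.key", "SECRETKEY"))
spark_context = SparkContext(conf=conf)
logs = spark_context.textFile("s3a://mybucket/logs/*")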

However, with the same ACCESSKEY and SECRETKEY this was working with the aws-cli:

aws s3 ls mybucket/logs/

and in Python with boto3 this was working:

import boto3

resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py") \
    .put(Body=open("text.py", "rb"), ContentType="text/x-py")
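
(As an aside, a closer analogue to what Spark attempts when reading
s3a://mybucket/logs/* would be a read-side check with the same credentials,
since Spark issues HEAD/GET requests rather than a PUT; the bucket and key
below are the same placeholders as above:)

import boto3

# Read-side sketch with the same placeholder credentials and bucket:
# Spark's S3A client issues HEAD and GET requests, not PUTs, when reading.
client = boto3.client("s3", region_name="us-east-1")
head = client.head_object(Bucket="mybucket", Key="logs/text.py")
print(head["ContentLength"])
listing = client.list_objects_v2(Bucket="mybucket", Prefix="logs/")
for obj in listing.get("Contents", []):
    print(obj["Key"])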

so my credentials ARE valid, and the problem is definitely something with
Spark..

Today I decided to turn on the "DEBUG" log for the whole of Spark and, to my
surprise... Spark is NOT using the [SECRETKEY] I have provided but instead...
adds a random one???

17/03/08 10:40:04 DEBUG request: Sending Request: HEAD
https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS
ACCESSKEY:**[RANDOM-SECRET-KEY]**, User-Agent: aws-sdk-java/1.7.4
Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65,
Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type:
application/x-www-form-urlencoded; charset=utf-8, )

This is why it still returns 403! Spark is not using the key I provide with
fs.s3a.secret.key but instead invents a random one EACH time (every time I
submit the job the random secret key is different).
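
(For context, a hedged sketch of how AWS Signature Version 2, which
aws-sdk-java 1.7.4 uses, builds that Authorization header: the value after
the colon is a per-request HMAC-SHA1 signature derived from the secret key,
not the key itself, so it is expected to look random and to change on every
submission. The string-to-sign below is illustrative, not the SDK's exact
canonicalization:)

import base64
import hashlib
import hmac

def sign_v2(secret_key, string_to_sign):
    # Signature V2: Authorization is "AWS <access-key>:<signature>", where
    # <signature> is the base64-encoded HMAC-SHA1 of the request's
    # string-to-sign, keyed with the secret key. The Date header is part of
    # the string-to-sign, so the value differs on every request.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("utf-8")

# Illustrative string-to-sign for a HEAD of the bucket (placeholder values)
string_to_sign = "HEAD\n\n\nWed, 08 Mar 2017 10:40:04 GMT\n/mybucket/"
print("Authorization: AWS ACCESSKEY:" + sign_v2("SECRETKEY", string_to_sign))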

For the record, I'm running this locally on my machine (OS X) with this
command:

spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py

Could someone enlighten me on this?