Re: Help accessing protected S3

2015-07-23 Thread Steve Loughran

> On 23 Jul 2015, at 10:47, Greg Anderson  
> wrote:
> 
> So when I go to ~/ephemeral-hdfs/bin/hadoop and check its version, it says 
> Hadoop 2.0.0-cdh4.2.0.  If I run pyspark and use the s3a address, things 
> should work, right?  What am I missing?  And thanks so much for the help so 
> far!

nope, sorry. You'll need a CDH 5.x release for s3a, and a reasonably recent 
one (5.3?) to get s3a working reliably.


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: Help accessing protected S3

2015-07-23 Thread Greg Anderson
So when I go to ~/ephemeral-hdfs/bin/hadoop and check its version, it says 
Hadoop 2.0.0-cdh4.2.0.  If I run pyspark and use the s3a address, things should 
work, right?  What am I missing?  And thanks so much for the help so far!

From: Steve Loughran [ste...@hortonworks.com]
Sent: Thursday, July 23, 2015 11:37 AM
To: Ewan Leith
Cc: Greg Anderson; user@spark.apache.org
Subject: Re: Help accessing protected S3

> On 23 Jul 2015, at 01:50, Ewan Leith  wrote:
>
> I think the standard S3 driver used in Spark from the Hadoop project (S3n) 
> doesn't support IAM role based authentication.
>
> However, S3a should support it. If you're running Hadoop 2.6 via the 
> spark-ec2 scripts (I'm not sure what it launches with by default) try 
> accessing your bucket via s3a:// URLs instead of s3n://
>
> http://wiki.apache.org/hadoop/AmazonS3
>
> https://issues.apache.org/jira/browse/HADOOP-10400
>
> Thanks,
> Ewan
>

s3a should support roles. Note that it isn't ready for production use before 
Hadoop 2.7.1; various scalability and performance problems surfaced after 2.6 
shipped.




Re: Help accessing protected S3

2015-07-23 Thread Steve Loughran

> On 23 Jul 2015, at 01:50, Ewan Leith  wrote:
> 
> I think the standard S3 driver used in Spark from the Hadoop project (S3n) 
> doesn't support IAM role based authentication.
> 
> However, S3a should support it. If you're running Hadoop 2.6 via the 
> spark-ec2 scripts (I'm not sure what it launches with by default) try 
> accessing your bucket via s3a:// URLs instead of s3n://
> 
> http://wiki.apache.org/hadoop/AmazonS3
> 
> https://issues.apache.org/jira/browse/HADOOP-10400
> 
> Thanks,
> Ewan
> 

s3a should support roles. Note that it isn't ready for production use before 
Hadoop 2.7.1; various scalability and performance problems surfaced after 2.6 
shipped.
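A minimal sketch of what "supporting roles" means in practice, under the
assumption (not stated in the thread) that when no `fs.s3a.access.key` /
`fs.s3a.secret.key` pair is configured, the s3a connector falls back to the
EC2 instance-profile (IAM role) credentials. The helper name `s3a_conf` and
the key values are illustrative, not part of any API:

```python
# Sketch (assumption flagged above): with s3a, explicit keys are optional.
# If no access/secret key is configured, the connector can fall back to
# the instance's IAM role credentials.
def s3a_conf(access_key=None, secret_key=None):
    """Build the Hadoop config entries for s3a; an empty dict means
    'set nothing and rely on the IAM role'."""
    conf = {}
    if access_key and secret_key:
        conf["fs.s3a.access.key"] = access_key
        conf["fs.s3a.secret.key"] = secret_key
    return conf

# Relying on the instance's IAM role: configure nothing.
print(s3a_conf())  # {}

# Explicit keys, e.g. when spark-ec2's --copy-aws-credentials was used:
print(s3a_conf("AKIAEXAMPLE", "secret"))

# In a pyspark shell these entries would be applied one by one via:
# sc._jsc.hadoopConfiguration().set(key, value)
```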




RE: Help accessing protected S3

2015-07-23 Thread Ewan Leith
I think the standard S3 driver used in Spark from the Hadoop project (S3n) 
doesn't support IAM role based authentication.

However, S3a should support it. If you're running Hadoop 2.6 via the spark-ec2 
scripts (I'm not sure what it launches with by default) try accessing your 
bucket via s3a:// URLs instead of s3n://

http://wiki.apache.org/hadoop/AmazonS3

https://issues.apache.org/jira/browse/HADOOP-10400
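The scheme switch described above can be sketched as follows; the bucket and
key names are hypothetical, and the actual `sc.textFile` read (commented out)
additionally needs the hadoop-aws classes on the cluster's classpath:

```python
# Sketch: switch a read from the s3n connector to s3a by changing
# only the URL scheme. Bucket/key names here are hypothetical.
def to_s3a(url):
    """Rewrite an s3n:// (or old s3://) URL to the s3a:// scheme."""
    for old in ("s3n://", "s3://"):
        if url.startswith(old):
            return "s3a://" + url[len(old):]
    return url

s3n_url = "s3n://my-protected-bucket/data/part-00000"
s3a_url = to_s3a(s3n_url)
print(s3a_url)  # s3a://my-protected-bucket/data/part-00000

# In a pyspark shell the read itself would then be:
# rdd = sc.textFile(s3a_url)
```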

Thanks,
Ewan



-Original Message-
From: Greg Anderson [mailto:gregory.ander...@familysearch.org] 
Sent: 22 July 2015 18:00
To: user@spark.apache.org
Subject: Help accessing protected S3

I have a protected s3 bucket that requires a certain IAM role to access.  When 
I start my cluster using the spark-ec2 script, everything works just fine until 
I try to read from that part of s3.  Here is the command I am using:

./spark-ec2 -k KEY -i KEY_FILE.pem --additional-security-group=IAM_ROLE 
--copy-aws-credentials --zone=us-east-1e -t m1.large --worker-instances=3 
--hadoop-major-version=2.7.1 --user-data=test.sh launch my-cluster
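Assuming the diagnosis elsewhere in this thread is right (the s3n connector
cannot use IAM-role credentials and needs explicit keys), the failing access
pattern can be sketched like this; the bucket path is hypothetical, and the
live `sc.textFile` call is shown commented out since it needs a running
cluster:

```python
# Sketch of the failing access pattern: the older s3n:// and s3://
# connectors require explicit AWS keys, so a bucket readable only
# through an IAM role is rejected.
def needs_explicit_keys(url):
    """True for URL schemes that (per the thread) cannot use IAM-role
    credentials."""
    return url.startswith(("s3n://", "s3://"))

url = "s3n://my-protected-bucket/data/"
print(needs_explicit_keys(url))  # True: this is why the read fails

# On a live cluster the failing attempt would be:
# sc.textFile(url).count()
```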

I have read through this article: 
http://apache-spark-user-list.1001560.n3.nabble.com/S3-Bucket-Access-td16303.html

The problem there seems very similar, but I wasn't able to find a solution in 
it that works for me.  I'm not sure what else to provide here; just let me 
know what you need.  Thanks in advance!

