Hard to say on #1 without knowing your application's I/O characteristics; for #2, we use conductor <https://github.com/BD2KGenomics/conductor> with IAM roles and ~/.boto or ~/.aws/credentials files, so the keys never have to appear in the S3 URL.
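As a minimal sketch of the credentials-file approach (all key values below are placeholders): boto and the AWS SDKs look up keys in a shared credentials file rather than the URL, so secrets never end up in command lines or job logs.

```python
# Sketch of the shared credentials file that boto / the AWS SDKs read
# (typically ~/.aws/credentials; older boto also honors ~/.boto).
# The key values here are placeholders, not real credentials.
import configparser
import io

CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET
"""

# The SDKs parse this ini-style file and pick the [default] profile
# automatically; the same lookup also falls back to IAM instance-profile
# credentials on EC2, so no keys need to live in S3 URLs at all.
config = configparser.ConfigParser()
config.read_file(io.StringIO(CREDENTIALS))
print(config["default"]["aws_access_key_id"])  # → EXAMPLE_KEY_ID
```

With credentials resolved this way, tools that speak the s3a:// (or s3n://) scheme pick them up from the standard provider chain instead of the AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@ form embedded in the path.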
Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On Mar 15, 2016, at 11:45 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
>
> We use the spark-ec2 script to create AWS clusters as needed (we do not use AWS EMR).
>
> 1. Will we get better performance if we copy data to HDFS before we run, instead of reading directly from S3?
> 2. What is a good way to move results from HDFS to S3?
>
> It seems like there are many ways to bulk copy to S3. Many of them require we explicitly use the AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@ form in the URL. This seems like a bad idea?
>
> What would you recommend?
>
> Thanks
>
> Andy