Hard to say on #1 without knowing your application's I/O characteristics; for #2, we use conductor <https://github.com/BD2KGenomics/conductor> with IAM roles and ~/.boto or ~/.aws/credentials files, so the keys never have to appear in the S3 URL.
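As a minimal sketch of the credentials-file approach (all key values below are placeholders): boto and the AWS SDKs look up keys in a shared credentials file rather than the URL, so secrets never end up in command lines or job logs.

```python
# Sketch of the shared credentials file that boto / the AWS SDKs read
# (typically ~/.aws/credentials; older boto also honors ~/.boto).
# The key values here are placeholders, not real credentials.
import configparser
import io

CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET
"""

# The SDKs parse this ini-style file and pick the [default] profile
# automatically; the same lookup also falls back to IAM instance-profile
# credentials on EC2, so no keys need to live in S3 URLs at all.
config = configparser.ConfigParser()
config.read_file(io.StringIO(CREDENTIALS))
print(config["default"]["aws_access_key_id"])  # → EXAMPLE_KEY_ID
```

With credentials resolved this way, tools that speak the s3a:// (or s3n://) scheme pick them up from the standard provider chain instead of the AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@ form embedded in the path.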
Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On Mar 15, 2016, at 11:45 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
>
> We use the spark-ec2 script to create AWS clusters as needed (we do not use AWS EMR).
>
> 1. Will we get better performance if we copy data to HDFS before we run, instead of reading directly from S3?
> 2. What is a good way to move results from HDFS to S3?
>
> It seems like there are many ways to bulk copy to S3. Many of them require we explicitly use the AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@ form in the URL. This seems like a bad idea?
>
> What would you recommend?
>
> Thanks
>
> Andy