You could try adding the configuration to the spark-defaults.conf file. Once you run the application, you can check the Environment tab of the driver UI (which runs on port 4040) to verify that the configuration was picked up.
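Spark forwards any configuration key prefixed with spark.hadoop.* into the underlying Hadoop Configuration, so one option worth trying (a sketch, not something I've verified on EMR) is to set the canned ACL through that prefix rather than through driver JVM options:

```
# in spark-defaults.conf
spark.hadoop.fs.s3.canned.acl  BucketOwnerFullControl
```

or equivalently on the command line:

```
spark-submit --deploy-mode cluster --master yarn-cluster \
  --conf spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl \
  hdfs:///user/hadoop/spark.py
```

The reason -Dfs.s3.canned.acl=... via spark.driver.extraJavaOptions doesn't work is that it only sets a JVM system property; Hadoop's FileSystem reads its settings from the Hadoop Configuration object, which system properties don't feed into.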
Thanks
Best Regards

On Thu, Jun 4, 2015 at 8:40 PM, Justin Steigel <jsteigs...@gmail.com> wrote:

> Hi all,
>
> I'm running Spark on AWS EMR and I'm having some issues getting the
> correct permissions on the output files using
> rdd.saveAsTextFile('<file_dir_name>'). In Hive, I would add a line at the
> beginning of the script with
>
>     set fs.s3.canned.acl=BucketOwnerFullControl
>
> and that would set the correct grantees for the files. For Spark, I tried
> adding the permissions as a --conf option:
>
>     hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
>       /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
>       --conf "spark.driver.extraJavaOptions=-Dfs.s3.canned.acl=BucketOwnerFullControl" \
>       hdfs:///user/hadoop/spark.py
>
> But the permissions do not get set properly on the output files. What is
> the proper way to pass 'fs.s3.canned.acl=BucketOwnerFullControl' or
> any of the S3 canned permissions to the Spark job?
>
> Thanks in advance