You could try adding the configuration to the spark-defaults.conf file. Once you run the application, you can check the Environment tab of the driver UI (which runs on port 4040) to verify that the configuration was picked up.
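Spark forwards any configuration key prefixed with spark.hadoop.* into the underlying Hadoop Configuration, so one option worth trying (a sketch, not something I've verified on EMR) is to set the canned ACL through that prefix rather than through driver JVM options:

```
# in spark-defaults.conf
spark.hadoop.fs.s3.canned.acl  BucketOwnerFullControl
```

or equivalently on the command line:

```
spark-submit --deploy-mode cluster --master yarn-cluster \
  --conf spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl \
  hdfs:///user/hadoop/spark.py
```

The reason -Dfs.s3.canned.acl=... via spark.driver.extraJavaOptions doesn't work is that it only sets a JVM system property; Hadoop's FileSystem reads its settings from the Hadoop Configuration object, which system properties don't feed into.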
Thanks
Best Regards

On Thu, Jun 4, 2015 at 8:40 PM, Justin Steigel <jsteigs...@gmail.com> wrote:

> Hi all,
>
> I'm running Spark on AWS EMR and I'm having some issues getting the
> correct permissions on the output files using
> rdd.saveAsTextFile('<file_dir_name>'). In Hive, I would add a line at the
> beginning of the script with
>
>     set fs.s3.canned.acl=BucketOwnerFullControl
>
> and that would set the correct grantees for the files. For Spark, I tried
> adding the permissions as a --conf option:
>
>     hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
>       /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
>       --conf "spark.driver.extraJavaOptions=-Dfs.s3.canned.acl=BucketOwnerFullControl" \
>       hdfs:///user/hadoop/spark.py
>
> But the permissions do not get set properly on the output files. What is
> the proper way to pass 'fs.s3.canned.acl=BucketOwnerFullControl' or
> any of the S3 canned permissions to the Spark job?
>
> Thanks in advance