Hi guys, I'm having a problem where, if a failed executor is respawned during a job that reads/writes Parquet on S3, subsequent tasks fail because of missing AWS keys.
Setup:

I'm using Spark 1.5.2 with Hadoop 2.7, running experiments on a simple standalone cluster:

  - 1 master
  - 2 workers

My application is co-located on the master machine, while the two workers run on two other machines (one worker per machine). All machines are EC2 instances. I've configured things so that the application runs its tasks on two executors (one executor per worker).

Application:

The application reads and writes Parquet files on S3. I set the AWS keys on the SparkContext's Hadoop configuration like this:

  val sc = new SparkContext()
  val hadoopConf = sc.hadoopConfiguration
  hadoopConf.set("fs.s3n.awsAccessKeyId", "SOME_KEY")
  hadoopConf.set("fs.s3n.awsSecretAccessKey", "SOME_SECRET")

That's the only credential setup I do; after this I just use "sc".

Issue:

Reading and writing Parquet files works fine with this setup. *BUT* if an executor dies during a job and is respawned by a worker, tasks fail with the following error:

  Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).

Things I've tried, without success:

  - Adding the AWS keys to core-site.xml, placing it in /etc/hadoop-conf, and setting HADOOP_CONF_DIR in spark-env.sh on the master and worker machines.
  - Setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the worker environment.

It seems that somehow the AWS keys aren't being picked up by a newly spawned executor.

Has anyone seen this before? Is there a problem with my configuration that's causing this?

Thanks!
Allen

Terminal Musings: http://www.allengeorge.com/
Raft in Java: https://github.com/allengeorge/libraft/
Twitter: https://twitter.com/allenageorge/
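
P.S. One thing I'm planning to try next, in case it helps anyone reading this: setting the keys on the SparkConf itself with the "spark.hadoop." prefix, since Spark copies any "spark.hadoop.*" property into the Hadoop configurations it builds. I haven't verified whether this survives an executor being respawned; the app name below is just a placeholder and the keys are the same dummies as above.

  import org.apache.spark.{SparkConf, SparkContext}

  // "spark.hadoop.<key>" entries have the prefix stripped and are copied into
  // the Hadoop Configuration Spark constructs, so the fs.s3n.* properties
  // travel with the SparkConf rather than living only in the driver's
  // sc.hadoopConfiguration.
  val conf = new SparkConf()
    .setAppName("parquet-on-s3") // placeholder app name
    .set("spark.hadoop.fs.s3n.awsAccessKeyId", "SOME_KEY")
    .set("spark.hadoop.fs.s3n.awsSecretAccessKey", "SOME_SECRET")

  val sc = new SparkContext(conf)

If anyone knows whether this actually behaves differently for a respawned executor, I'd love to hear it.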