Hi guys,

I'm running into a problem where, after a failed executor is respawned
during a job that reads/writes parquet on S3, subsequent tasks fail
because the AWS keys are missing.

Setup:

I'm using Spark 1.5.2 with Hadoop 2.7 and running experiments on a simple
standalone cluster:

1 master
2 workers

My application is co-located on the master machine, while the two workers
are on two other machines (one worker per machine). All machines are
running in EC2. I've configured my setup so that my application executes
its tasks across two executors (one executor per worker).

Application:

My application reads and writes parquet files on S3. I set the AWS keys on
the SparkContext by doing:

val sc = new SparkContext()
// Set the S3 credentials on the driver's Hadoop configuration
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", "SOME_KEY")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "SOME_SECRET")

At this point I'm done, and I go ahead and use "sc".
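
For reference, the reads and writes look roughly like this (the bucket and
paths below are just placeholders, not my real ones):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Read parquet from S3, do some work, then write the result back to S3
val df = sqlContext.read.parquet("s3n://my-bucket/input/")
df.write.parquet("s3n://my-bucket/output/")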

Issue:

I can read and write parquet files without a problem with this setup. *BUT*
if an executor dies during a job and is respawned by a worker, tasks fail
with the following error:

"Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and
Secret Access Key must be specified as the username or password
(respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or
fs.s3n.awsSecretAccessKey properties (respectively)."

I've tried adding the AWS keys to core-site.xml, placing it in
"/etc/hadoop-conf", and setting HADOOP_CONF_DIR in spark-env.sh on the
master/worker machines, but that didn't seem to help. I also tried setting
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the worker environment, but
that didn't work either. It seems that somehow the AWS keys aren't being
picked up by a newly-spawned executor. Has anyone seen this before? Is
there a problem with my configuration that's causing this?
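
For completeness, here's roughly what I tried (keys are placeholders):

In core-site.xml:

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>SOME_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>SOME_SECRET</value>
</property>

And in the worker environment:

export AWS_ACCESS_KEY_ID=SOME_KEY
export AWS_SECRET_ACCESS_KEY=SOME_SECRET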

Thanks!
Allen

Terminal Musings: http://www.allengeorge.com/
Raft in Java: https://github.com/allengeorge/libraft/
Twitter: https://twitter.com/allenageorge/
