Hello,

I've got a Spark stand-alone cluster using EC2 instances. I can submit jobs using "--deploy-mode client", but "--deploy-mode cluster" is proving to be a challenge. I've tried this:
spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster
s3://bucket/dir/foo.jar
When I do this, I get:
16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
    at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
Now, I'm not using any S3 or Hadoop stuff within my code (it's just an sc.parallelize(1 to 100)), so I imagine it's the driver trying to fetch the jar. I haven't set the AWS Access Key ID and Secret as mentioned, but the role the machines are in allows them to copy the jar. In other words, this works:
aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar
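For completeness, the job is essentially nothing more than this (a minimal sketch; the object name foo just matches what I pass to --class):

import org.apache.spark.{SparkConf, SparkContext}

// Trivial job: no S3 or Hadoop APIs are touched in user code.
object foo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("foo"))
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}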
I'm using Spark 1.6.2, and can't really think of what I can do so that I can submit the jar from S3 using cluster deploy mode. I've also tried simply downloading the jar onto a node and spark-submitting that... that works in client mode, but I get a "not found" error when using cluster mode.
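As a last resort, I suppose I could pass the keys explicitly via the Hadoop properties named in the error, presumably something like the command below (placeholder values; I haven't verified that the driver's jar fetch actually picks these up in cluster mode), but I'd much rather keep relying on the instance role:

spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster \
  --conf spark.hadoop.fs.s3.awsAccessKeyId=<access-key-id> \
  --conf spark.hadoop.fs.s3.awsSecretAccessKey=<secret-key> \
  s3://bucket/dir/foo.jar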
Any help will be appreciated.
Thanks,
Ashic.