Hello,

I've got a Spark stand-alone cluster using EC2 instances. I can submit jobs using "--deploy-mode client", but "--deploy-mode cluster" is proving to be a challenge. I've tried this:
spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster
s3://bucket/dir/foo.jar
When I do this, I get:
16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
    at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
Now, I'm not using any S3 or Hadoop stuff within my code (it's just an sc.parallelize(1 to 100)), so I imagine it's the driver trying to fetch the jar. I haven't set the AWS Access Key ID and Secret as mentioned, but the role the machines are in allows them to copy the jar. In other words, this works:
aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar
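For completeness, the job is essentially nothing more than this (a minimal sketch; the object name foo just matches what I pass to --class):

import org.apache.spark.{SparkConf, SparkContext}

// Trivial job: no S3 or Hadoop APIs are touched in user code.
object foo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("foo"))
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}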
I'm using Spark 1.6.2, and can't really think of what I can do so that I can submit the jar from S3 using cluster deploy mode. I've also tried simply downloading the jar onto a node and spark-submitting that... that works in client mode, but I get a "not found" error when using cluster mode.
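As a last resort, I suppose I could pass the keys explicitly via the Hadoop properties named in the error, presumably something like the command below (placeholder values; I haven't verified that the driver's jar fetch actually picks these up in cluster mode), but I'd much rather keep relying on the instance role:

spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster \
  --conf spark.hadoop.fs.s3.awsAccessKeyId=<access-key-id> \
  --conf spark.hadoop.fs.s3.awsSecretAccessKey=<secret-key> \
  s3://bucket/dir/foo.jar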
Any help will be appreciated.
Thanks,
Ashic.