Re: Cluster mode deployment from jar in S3
The fact that you are using s3:// URLs means that you are using EMR and its S3 binding library, which means you will probably have to talk to the AWS team there. I'm surprised to see a jets3t stack trace, though, as the AWS s3: client uses the Amazon SDKs.

s3n and s3a don't currently support IAM auth, which is what's generating the warning. The code in question is actually in the hadoop-aws JAR, not the Spark team's own code, and this is fixed in Hadoop 2.8 (see HADOOP-12723 <https://issues.apache.org/jira/browse/HADOOP-12723>).

On 4 Jul 2016, at 11:30, Ashic Mahtab <as...@live.com> wrote:

Hi Lohith,

Thanks for the response. The S3 bucket does have access restrictions, but the instances on which the Spark master and workers run have an IAM role policy that allows them access to it. As such, we don't really configure the CLI with credentials; the IAM roles take care of that. Is there a way to make Spark work the same way? Or should I get temporary credentials somehow (like http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html ) and use them to submit the job? I guess I'll have to set them via environment variables; I can't put them in application code, as the issue is in downloading the jar from S3.

-Ashic.

From: lohith.sam...@mphasis.com
To: as...@live.com; user@spark.apache.org
Subject: RE: Cluster mode deployment from jar in S3
Date: Mon, 4 Jul 2016 09:50:50 +

Hi,

The AWS CLI already has your access key ID and secret access key from when you initially configured it. Is your S3 bucket without any access restrictions?

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga

From: Ashic Mahtab [mailto:as...@live.com]
Sent: Monday, July 04, 2016 15.06
To: Apache Spark
Subject: RE: Cluster mode deployment from jar in S3

Sorry to do this...but... *bump*

From: as...@live.com
To: user@spark.apache.org
Subject: Cluster mode deployment from jar in S3
Date: Fri, 1 Jul 2016 17:45:12 +0100

Hello,

I've got a Spark standalone cluster running on EC2 instances. I can submit jobs using "--deploy-mode client", but "--deploy-mode cluster" is proving to be a challenge. I've tried this:

    spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster s3://bucket/dir/foo.jar

When I do this, I get:

    16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)

Now, I'm not using any S3 or Hadoop code in my application (it's just an sc.parallelize(1 to 100)), so I imagine it's the driver trying to fetch the jar. I haven't set the AWS access key ID and secret as mentioned, but the role the machines are in allows them to copy the jar. In other words, this works:

    aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar

I'm using Spark 1.6.2, and can't think of what I can do to submit the jar from S3 using cluster deploy mode. I've also tried simply downloading the jar onto a node and spark-submitting that. That works in client mode, but I get a "not found" error in cluster mode.

Any help will be appreciated.

Thanks,
Ashic.
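For readers on Hadoop 2.8 or later, the HADOOP-12723 fix mentioned in the reply at the top of this message is, as far as I understand it, what makes the s3a credentials provider configurable. The sketch below is not taken from the thread: it writes a core-site.xml fragment that selects the AWS SDK's instance-profile provider, so that s3a can use the IAM role of the EC2 instance. The output path `OUT` is a hypothetical scratch file, and it is an assumption that the daemons fetching the jar pick up this Hadoop configuration and have the hadoop-aws and AWS SDK jars on their classpath. The jar would then be referenced with an s3a:// URL rather than s3://.

```shell
#!/bin/sh
# Sketch only: writes an s3a configuration fragment to a scratch file.
# OUT is a hypothetical path; merge the property into the real
# core-site.xml on each node by hand.
OUT="${OUT:-./core-site-s3a-fragment.xml}"

cat > "$OUT" <<'EOF'
<configuration>
  <!-- Hadoop 2.8+ (HADOOP-12723): select the credentials provider for s3a. -->
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
  </property>
</configuration>
EOF

echo "wrote $OUT"
```

On Spark 1.6.2 with an older bundled Hadoop this property is probably not available, which is consistent with the reply above.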
RE: Cluster mode deployment from jar in S3
I've found a workaround. I set up an HTTP server serving the jar, and pointed spark-submit at the HTTP URL.

Which brings me to ask: would it be a good option to allow spark-submit to upload a local jar to the master, which the master could then serve via an HTTP interface? The master already runs a web UI, so I imagine we could allow it to receive jars and serve them as well. Perhaps an additional flag could be used to signify that the local jar should be uploaded in this manner?

I'd be happy to take a stab at it...but thoughts?

-Ashic.
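The HTTP workaround described at the top of this message can be sketched roughly as follows. This is an illustration, not the exact setup used in the thread: the host name `jar-host`, port 8000, the directory `/tmp/jars`, and the use of Python's built-in HTTP server are all assumptions. By default the script only prints the spark-submit command (dry run); set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch of the HTTP workaround; hosts, ports and paths are placeholders.
JAR_DIR="${JAR_DIR:-/tmp/jars}"        # directory holding foo.jar (assumed)
HTTP_HOST="${HTTP_HOST:-jar-host}"     # hypothetical host reachable from every worker
HTTP_PORT="${HTTP_PORT:-8000}"
MASTER_URL="${MASTER_URL:-spark://master-ip:7077}"

# Step 1 (run separately on HTTP_HOST): serve JAR_DIR over HTTP, for example
#   cd "$JAR_DIR" && python3 -m http.server "$HTTP_PORT"

# Step 2: build the submit command with the jar given as an http:// URL.
set -- spark-submit --class foo --master "$MASTER_URL" \
  --deploy-mode cluster "http://$HTTP_HOST:$HTTP_PORT/foo.jar"
SUBMIT_CMD="$*"

if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$SUBMIT_CMD"   # dry run: print the command only
else
  "$@"                 # actually submit the job
fi
```

The point of the design is that whichever worker ends up launching the driver can fetch the jar without any S3 credentials.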
RE: Cluster mode deployment from jar in S3
Hi Lohith,

Thanks for the response. The S3 bucket does have access restrictions, but the instances on which the Spark master and workers run have an IAM role policy that allows them access to it. As such, we don't really configure the CLI with credentials; the IAM roles take care of that. Is there a way to make Spark work the same way? Or should I get temporary credentials somehow (like http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html ) and use them to submit the job? I guess I'll have to set them via environment variables; I can't put them in application code, as the issue is in downloading the jar from S3.

-Ashic.
RE: Cluster mode deployment from jar in S3
Hi,

The AWS CLI already has your access key ID and secret access key from when you initially configured it. Is your S3 bucket without any access restrictions?

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga

Information transmitted by this e-mail is proprietary to Mphasis, its associated companies and/ or its customers and is intended for use only by the individual or entity to which it is addressed, and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If you are not the intended recipient or it appears that this mail has been forwarded to you without proper authority, you are notified that any use or dissemination of this information in any manner is strictly prohibited. In such cases, please notify us immediately at mailmas...@mphasis.com and delete this mail from your records.
RE: Cluster mode deployment from jar in S3
Sorry to do this...but... *bump*
Cluster mode deployment from jar in S3
Hello,

I've got a Spark standalone cluster running on EC2 instances. I can submit jobs using "--deploy-mode client", but "--deploy-mode cluster" is proving to be a challenge. I've tried this:

    spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster s3://bucket/dir/foo.jar

When I do this, I get:

    16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)

Now, I'm not using any S3 or Hadoop code in my application (it's just an sc.parallelize(1 to 100)), so I imagine it's the driver trying to fetch the jar. I haven't set the AWS access key ID and secret as mentioned, but the role the machines are in allows them to copy the jar. In other words, this works:

    aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar

I'm using Spark 1.6.2, and can't think of what I can do to submit the jar from S3 using cluster deploy mode. I've also tried simply downloading the jar onto a node and spark-submitting that. That works in client mode, but I get a "not found" error in cluster mode.

Any help will be appreciated.

Thanks,
Ashic.
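One likely reason for the "not found" error with a locally downloaded jar: in cluster deploy mode the driver is launched on one of the workers, so a local path has to exist at the same location on every node that might run it. A minimal sketch of that approach follows, reusing the `aws s3 cp` command that already works with the instance role. The worker names in `WORKERS` are hypothetical, and passwordless ssh to each worker is assumed. By default the script only prints the commands (dry run); set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: stage the jar at the same local path on every worker, then submit.
# WORKERS is a hypothetical host list; passwordless ssh is assumed.
WORKERS="${WORKERS:-worker-1 worker-2}"
JAR_S3="s3://bucket/dir/foo.jar"
JAR_LOCAL="/tmp/foo.jar"
MASTER_URL="${MASTER_URL:-spark://master-ip:7077}"

# run CMD...: print the command in dry-run mode (default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the jar onto each worker using the instance's IAM role via the AWS CLI.
for host in $WORKERS; do
  run ssh "$host" aws s3 cp "$JAR_S3" "$JAR_LOCAL"
done

# Submit with a file:// path that is now valid on every worker.
run spark-submit --class foo --master "$MASTER_URL" \
  --deploy-mode cluster "file://$JAR_LOCAL"
```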