[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677467#comment-15677467 ] Michael Gummelt commented on SPARK-10643: - It's not just HDFS. HTTP urls fail as well: {code} ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local http://mgummelt-mesos.s3.amazonaws.com/spark-examples_2.11-2.0.0.jar Warning: Skip remote jar http://mgummelt-mesos.s3.amazonaws.com/spark-examples_2.11-2.0.0.jar. java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:225) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} But the docs say this is supported: "hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected" > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1581#comment-1581 ] Laurent Hoss commented on SPARK-10643: -- +1 would be very useful when using Zeppelin (running in docker) on a *mesos* cluster Unfort. the Zeppelin GUI does not either/yet support adding jars from HDFS, nor does it support some kind of http-upload, instead only from local dirs (not practical when inside docker) or from a (custom) maven-repos (not ideal for quick dev iterations). After I learned that it should work with 'cluster mode' I tried to submit a spark job in cluster mode, within zeppelin but it then failed because it can't find the builtin zeppelin-spark interpreter jar (when driver is ran in the 'cluster). Not sure yet if that's actually an issue (as I'ld assume spark taking care to transfer the provided '--jars' ..) > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280679#comment-15280679 ] Prabhu Kasinathan commented on SPARK-10643: --- +1 to fix this issue. We are running lot of spark jobs on yarn-client mode and we desperately need this bug or feature request to be fixed. > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279318#comment-15279318 ] Michael Gummelt commented on SPARK-10643: - +1 to fix this. > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176166#comment-15176166 ] Alan Braithwaite commented on SPARK-10643: -- ¯\_(ツ)_/¯ Either way, it doesn't look like it was ever implemented. Call it a bug or new feature, doesn't matter to me. > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176150#comment-15176150 ] Enrique commented on SPARK-10643: - [~abraithwaite] you are wrong, this is a bug, because on official docs was references that you can use hdfs on spark-submit con client mode. > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159550#comment-15159550 ] Alan Braithwaite commented on SPARK-10643: -- It's not a bug, just a feature request. The code in question skips remote jars when running in local mode, not cluster mode. It would be nice if it downloaded the jar and started the driver as you would expect. > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158873#comment-15158873 ] Jim Lohse commented on SPARK-10643: --- I have not taken the time to test this yet, is this report perhaps having a wider application to other deploy modes and job managers? There is this question on SO, http://stackoverflow.com/questions/28739729/spark-submit-not-working-when-application-jar-is-in-hdfs?rq=1 but there user was trying to use HDFS app-jar in local mode. If there's a specific bug on that I haven't found it. Thanks. > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org