Hi List,

I'm following the mini-complete-example from Learning Spark <https://github.com/databricks/learning-spark/tree/master/mini-complete-example>, submitting it as follows:
$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master spark://host.domain.ex:7077 \
  --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
  hdfs://host.domain.ex/user/nickt/linkage \
  hdfs://host.domain.ex/user/nickt/wordcounts

The jar is submitted fine and I can see it appear in the driver's work directory (i.e. it is connecting to and reading from HDFS OK):

-rw-r--r-- 1 nickt nickt  15K Mar 29 22:05 learning-spark-mini-example_2.10-0.0.1.jar
-rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
-rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout

But the job fails with a java.io.FileNotFoundException saying my input file is missing:

Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.

I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate the file to all the workers, and sc.textFile(SparkFiles.get("the_file.txt")) to resolve its local path on each host. Has anyone come up against this before when reading from HDFS? No doubt I'm doing something wrong.
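For what it's worth, the mangled path in the exception (file:/.../driver-20150329220503-0021/hdfs:/host.domain.ex/...) suggests the hdfs:// URI handed to addFile is being resolved relative to the driver's local working directory rather than against HDFS. One workaround, sketched below on the assumption that the program arguments are the HDFS input and output paths as in the WordCount example: skip addFile/SparkFiles entirely and pass the hdfs:// URI straight to sc.textFile, which performs the distributed read itself (this is a sketch, not tested against your cluster):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // args(0) = hdfs://... input path, args(1) = hdfs://... output path
    val sc = new SparkContext(new SparkConf().setAppName("wordCount"))

    // sc.textFile understands hdfs:// URIs directly; there is no need to
    // ship the input to every worker with addFile/SparkFiles first.
    val input = sc.textFile(args(0))

    val counts = input
      .flatMap(_.split(" "))       // split each line into words
      .map(word => (word, 1))      // pair each word with a count of 1
      .reduceByKey(_ + _)          // sum counts per word

    counts.saveAsTextFile(args(1)) // write results back to HDFS
  }
}
```

As I understand it, addFile/SparkFiles.get is only meant for side files that each task must open with local (non-HDFS) I/O, e.g. a lookup table; in that case SparkFiles.get returns a plain local filesystem path, not something to feed back into sc.textFile.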
Full trace below:

Launch Command: "/usr/java/java8/bin/java" "-cp" ":/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar" "-Dakka.loglevel=WARNING" "-Dspark.driver.supervise=false" "-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount" "-Dspark.akka.askTimeout=10" "-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar" "-Dspark.master=spark://host.domain.ex:7077" "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker" "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar" "com.oreilly.learningsparkexamples.mini.scala.WordCount" "hdfs://host.domain.ex/user/nickt/linkage" "hdfs://host.domain.ex/user/nickt/wordcounts"
========================================
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port 44201.
15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on port 33382.
15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at /tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is /tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:05 INFO AbstractConnector: Started SocketConnector@0.0.0.0:42484
15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file server' on port 42484.
15/03/29 22:05:05 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/29 22:05:06 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:06 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/29 22:05:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/29 22:05:06 INFO SparkUI: Started SparkUI at http://host5.domain.ex:4040
15/03/29 22:05:06 ERROR SparkContext: Jar not found at target/scala-2.10/learning-spark-mini-example_2.10-0.0.1.jar
15/03/29 22:05:06 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@host.domain.ex:7077/user/Master...
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150329220506-0027
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/0 on worker-20150329112422-host3.domain.ex-33765 (host3.domain.ex:33765) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/0 on hostPort host3.domain.ex:33765 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/1 on worker-20150329112422-host6.domain.ex-35464 (host6.domain.ex:35464) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/1 on hostPort host6.domain.ex:35464 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/2 on worker-20150329112422-host2.domain.ex-40914 (host2.domain.ex:40914) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/2 on hostPort host2.domain.ex:40914 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/3 on worker-20150329112421-host4.domain.ex-35927 (host4.domain.ex:35927) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/3 on hostPort host4.domain.ex:35927 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/4 on worker-20150329112422-host1.domain.ex-60546 (host1.domain.ex:60546) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/4 on hostPort host1.domain.ex:60546 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/5 on worker-20150329112421-host.domain.ex-59485 (host.domain.ex:59485) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/5 on hostPort host.domain.ex:59485 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/6 on worker-20150329112421-host5.domain.ex-40830 (host5.domain.ex:40830) with 63 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/6 on hostPort host5.domain.ex:40830 with 63 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now RUNNING
15/03/29 22:05:06 INFO NettyBlockTransferService: Server created on 39447
15/03/29 22:05:06 INFO BlockManagerMaster: Trying to register BlockManager
15/03/29 22:05:06 INFO BlockManagerMasterActor: Registering block manager host5.domain.ex:39447 with 265.1 MB RAM, BlockManagerId(<driver>, host5.domain.ex, 39447)
15/03/29 22:05:06 INFO BlockManagerMaster: Registered BlockManager
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:59)
        at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1089)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1065)
        at com.oreilly.learningsparkexamples.mini.scala.WordCount$.main(WordCount.scala:21)
        at com.oreilly.learningsparkexamples.mini.scala.WordCount.main(WordCount.scala)
        ... 6 more

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-FileNotFoundException-when-using-HDFS-in-cluster-mode-tp22287.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.