Hi List,

I'm following the mini-complete-example from Learning Spark <https://github.com/databricks/learning-spark/tree/master/mini-complete-example>, submitting it as follows:
$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master spark://host.domain.ex:7077 \
  --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
  hdfs://host.domain.ex/user/nickt/linkage \
  hdfs://host.domain.ex/user/nickt/wordcounts

The jar is submitted fine and I can see it appear in the driver's work directory (i.e. it is connecting to and reading from HDFS OK):

-rw-r--r-- 1 nickt nickt  15K Mar 29 22:05 learning-spark-mini-example_2.10-0.0.1.jar
-rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
-rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout

But the job fails with a java.io.FileNotFoundException saying my input file is missing:

Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.

I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate the file to all the workers, and sc.textFile(SparkFiles.get("the_file.txt")) to resolve its local path on each host. Has anyone come up against this before when reading from HDFS? No doubt I'm doing something wrong.
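For what it's worth, the mangled path in the exception (file:/.../driver-20150329220503-0021/hdfs:/host.domain.ex/...) suggests the hdfs:// URI handed to addFile is being resolved relative to the driver's local working directory rather than against HDFS. One workaround, sketched below on the assumption that the program arguments are the HDFS input and output paths as in the WordCount example: skip addFile/SparkFiles entirely and pass the hdfs:// URI straight to sc.textFile, which performs the distributed read itself (this is a sketch, not tested against your cluster):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // args(0) = hdfs://... input path, args(1) = hdfs://... output path
    val sc = new SparkContext(new SparkConf().setAppName("wordCount"))

    // sc.textFile understands hdfs:// URIs directly; there is no need to
    // ship the input to every worker with addFile/SparkFiles first.
    val input = sc.textFile(args(0))

    val counts = input
      .flatMap(_.split(" "))       // split each line into words
      .map(word => (word, 1))      // pair each word with a count of 1
      .reduceByKey(_ + _)          // sum counts per word

    counts.saveAsTextFile(args(1)) // write results back to HDFS
  }
}
```

As I understand it, addFile/SparkFiles.get is only meant for side files that each task must open with local (non-HDFS) I/O, e.g. a lookup table; in that case SparkFiles.get returns a plain local filesystem path, not something to feed back into sc.textFile.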
Full trace below:

Launch Command: "/usr/java/java8/bin/java" "-cp" ":/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar" "-Dakka.loglevel=WARNING" "-Dspark.driver.supervise=false" "-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount" "-Dspark.akka.askTimeout=10" "-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar" "-Dspark.master=spark://host.domain.ex:7077" "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker" "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar" "com.oreilly.learningsparkexamples.mini.scala.WordCount" "hdfs://host.domain.ex/user/nickt/linkage" "hdfs://host.domain.ex/user/nickt/wordcounts"
========================================
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port 44201.
15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on port 33382.
15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at /tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is /tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:05 INFO AbstractConnector: Started SocketConnector@0.0.0.0:42484
15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file server' on port 42484.
15/03/29 22:05:05 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/29 22:05:06 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:06 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/29 22:05:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/29 22:05:06 INFO SparkUI: Started SparkUI at http://host5.domain.ex:4040
15/03/29 22:05:06 ERROR SparkContext: Jar not found at target/scala-2.10/learning-spark-mini-example_2.10-0.0.1.jar
15/03/29 22:05:06 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@host.domain.ex:7077/user/Master...
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150329220506-0027
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/0 on worker-20150329112422-host3.domain.ex-33765 (host3.domain.ex:33765) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/0 on hostPort host3.domain.ex:33765 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/1 on worker-20150329112422-host6.domain.ex-35464 (host6.domain.ex:35464) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/1 on hostPort host6.domain.ex:35464 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/2 on worker-20150329112422-host2.domain.ex-40914 (host2.domain.ex:40914) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/2 on hostPort host2.domain.ex:40914 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/3 on worker-20150329112421-host4.domain.ex-35927 (host4.domain.ex:35927) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/3 on hostPort host4.domain.ex:35927 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/4 on worker-20150329112422-host1.domain.ex-60546 (host1.domain.ex:60546) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/4 on hostPort host1.domain.ex:60546 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/5 on worker-20150329112421-host.domain.ex-59485 (host.domain.ex:59485) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/5 on hostPort host.domain.ex:59485 with 64 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/6 on worker-20150329112421-host5.domain.ex-40830 (host5.domain.ex:40830) with 63 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/6 on hostPort host5.domain.ex:40830 with 63 cores, 512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now RUNNING
15/03/29 22:05:06 INFO NettyBlockTransferService: Server created on 39447
15/03/29 22:05:06 INFO BlockManagerMaster: Trying to register BlockManager
15/03/29 22:05:06 INFO BlockManagerMasterActor: Registering block manager host5.domain.ex:39447 with 265.1 MB RAM, BlockManagerId(<driver>, host5.domain.ex, 39447)
15/03/29 22:05:06 INFO BlockManagerMaster: Registered BlockManager
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:59)
        at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1089)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1065)
        at com.oreilly.learningsparkexamples.mini.scala.WordCount$.main(WordCount.scala:21)
        at com.oreilly.learningsparkexamples.mini.scala.WordCount.main(WordCount.scala)
        ... 6 more

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-FileNotFoundException-when-using-HDFS-in-cluster-mode-tp22287.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.