Re: java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-30 Thread nsalian
Try running it like this:

sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi
--deploy-mode cluster --master yarn
hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10


Caveats:
1) Make sure the permissions of /user/nick are 775 or 777.
2) No need for the hostname; try hdfs:///path-to-jar (as in the command above).
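For reference, here is what mode 775 grants, spelled out with Python's stat constants (a plain-Python aside, no HDFS needed):

```python
import stat

# 775 = owner rwx, group rwx, others r-x
mode_775 = stat.S_IRWXU | stat.S_IRWXG | stat.S_IROTH | stat.S_IXOTH
print(oct(mode_775))  # 0o775
```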



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-FileNotFoundException-when-using-HDFS-in-cluster-mode-tp22287p22303.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-30 Thread java8964
I think the jar file has to be local; a jar in HDFS is not yet supported in Spark.
See this answer:
http://stackoverflow.com/questions/28739729/spark-submit-not-working-when-application-jar-is-in-hdfs
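The error message in the trace quoted below also hints at this: the hdfs:// URI was treated as a plain relative file name under the driver's work directory. A minimal illustration in plain Python (not Spark code; the paths are copied from the quoted trace):

```python
import os

# The driver's work directory and the URI passed in, taken from the
# error message in the quoted post.
work_dir = "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021"
uri = "hdfs://host.domain.ex/user/nickt/linkage"

# Treating the URI as a relative file name and normalizing the result
# collapses "//" into "/", reproducing the bad path in the exception.
mangled = "file:" + os.path.normpath(work_dir + "/" + uri)
print(mangled)
```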

> Date: Sun, 29 Mar 2015 22:34:46 -0700
> From: n.e.trav...@gmail.com
> To: user@spark.apache.org
> Subject: java.io.FileNotFoundException when using HDFS in cluster mode
> 
> Hi List,
> 
> I'm following the example here
> <https://github.com/databricks/learning-spark/tree/master/mini-complete-example>
> with the following:
> 
> $SPARK_HOME/bin/spark-submit \
>   --deploy-mode cluster \
>   --master spark://host.domain.ex:7077 \
>   --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
>  
> hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar
> \
>   hdfs://host.domain.ex/user/nickt/linkage
> hdfs://host.domain.ex/user/nickt/wordcounts
> 
> The jar is submitted fine and I can see it appear on the driver node (i.e.
> connecting to and reading from HDFS ok):
> 
> -rw-r--r-- 1 nickt nickt  15K Mar 29 22:05
> learning-spark-mini-example_2.10-0.0.1.jar
> -rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
> -rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout
> 
> But it's failing due to a java.io.FileNotFoundException saying my input file
> is missing:
> 
> Caused by: java.io.FileNotFoundException: Added file
> file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage
> does not exist.
> 
> I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate to all the
> workers and sc.textFile(SparkFiles("the_file.txt")) to return the path to
> the file on each of the hosts.
> 
> Has anyone come up against this before when reading from HDFS? No doubt I'm
> doing something wrong.
> 
> Full trace below:
> 
> Launch Command: "/usr/java/java8/bin/java" "-cp"
> ":/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar"
> "-Dakka.loglevel=WARNING" "-Dspark.driver.supervise=false"
> "-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "-Dspark.akka.askTimeout=10"
> "-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar"
> "-Dspark.master=spark://host.domain.ex:7077" "-Xms512M" "-Xmx512M"
> "org.apache.spark.deploy.worker.DriverWrapper"
> "akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker"
> "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar"
> "com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "hdfs://host.domain.ex/user/nickt/linkage"
> "hdfs://host.domain.ex/user/nickt/wordcounts"
> 
> 
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(nickt); users
> with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port
> 44201.
> 15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker
> akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(nickt); users
> with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on
> port 33382.
> 15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
> 15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
> 15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at
> /tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
> 15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to
> akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1
> MB
> 15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is
> /tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
> 15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
> 15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
> 15/

Re: java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-29 Thread Akhil Das
What happens when you do:

sc.textFile("hdfs://path/to/the_file.txt")

Thanks
Best Regards
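The point of the question: sc.textFile hands a scheme-qualified URI to Hadoop, which selects the filesystem implementation from the scheme, whereas sc.addFile in Spark 1.3 copies the file into the driver's work directory and can mishandle such a path. A plain-Python look at scheme dispatch (illustrative only, not Spark internals):

```python
from urllib.parse import urlparse

# Hadoop-style dispatch: choose a filesystem by URI scheme;
# a bare path with no scheme falls back to the local filesystem.
def filesystem_for(path):
    return urlparse(path).scheme or "file"

print(filesystem_for("hdfs://host.domain.ex/user/nickt/linkage"))  # hdfs
print(filesystem_for("/home/nickt/data.txt"))                      # file
```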

On Mon, Mar 30, 2015 at 11:04 AM, Nick Travers wrote:

> Hi List,
>
> I'm following this example  here
> <
> https://github.com/databricks/learning-spark/tree/master/mini-complete-example
> >
> with the following:
>
> $SPARK_HOME/bin/spark-submit \
>   --deploy-mode cluster \
>   --master spark://host.domain.ex:7077 \
>   --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
>
> hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar
> \
>   hdfs://host.domain.ex/user/nickt/linkage
> hdfs://host.domain.ex/user/nickt/wordcounts
>
> The jar is submitted fine and I can see it appear on the driver node (i.e.
> connecting to and reading from HDFS ok):
>
> -rw-r--r-- 1 nickt nickt  15K Mar 29 22:05
> learning-spark-mini-example_2.10-0.0.1.jar
> -rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
> -rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout
>
> But it's failing due to a java.io.FileNotFoundException saying my input
> file
> is missing:
>
> Caused by: java.io.FileNotFoundException: Added file
>
> file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage
> does not exist.
>
> I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate to all the
> workers and sc.textFile(SparkFiles("the_file.txt")) to return the path to
> the file on each of the hosts.
>
> Has anyone come up against this before when reading from HDFS? No doubt I'm
> doing something wrong.
>
> Full trace below:
>
> Launch Command: "/usr/java/java8/bin/java" "-cp"
>
> ":/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar"
> "-Dakka.loglevel=WARNING" "-Dspark.driver.supervise=false"
> "-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "-Dspark.akka.askTimeout=10"
>
> "-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar"
> "-Dspark.master=spark://host.domain.ex:7077" "-Xms512M" "-Xmx512M"
> "org.apache.spark.deploy.worker.DriverWrapper"
> "akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker"
>
> "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar"
> "com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "hdfs://host.domain.ex/user/nickt/linkage"
> "hdfs://host.domain.ex/user/nickt/wordcounts"
> 
>
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(nickt); users
> with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port
> 44201.
> 15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker
> akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(nickt); users
> with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on
> port 33382.
> 15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
> 15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
> 15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at
>
> /tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
> 15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to
> akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1
> MB
> 15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is
>
> /tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
> 15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
> 15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
> 15/03/29 22:05:05 INFO AbstractConnector: Started
> SocketConnector@0.0.0.0:42484
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file
> server' on port 42484.
> 15/03/29 22:05:05 INFO SparkEnv: Registeri