Good catch; the Spark cluster on EC2 is configured to use HDFS as its default filesystem, so it can’t find this file. The quick start was written to run on a single machine with an out-of-the-box install. If you’d like to upload this file to the HDFS cluster on EC2, use the following command:
~/ephemeral-hdfs/bin/hadoop fs -put README.md README.md

Matei

On Feb 23, 2014, at 6:33 PM, nicholas.chammas <nicholas.cham...@gmail.com> wrote:

> I just deployed Spark 0.9.0 to EC2 using the guide here. I then turned to the Quick Start guide here and walked through it using the Python shell.
>
> When I do this:
>
> >>> textFile = sc.textFile("README.md")
> >>> textFile.count()
>
> I get a long error output right after the count() that includes this:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ec2-my-node-address.compute-1.amazonaws.com:9000/user/root/README.md
>
> So I guess Spark assumed that the file was in HDFS.
>
> To get the file open and count() to work, I had to do this:
>
> >>> textFile = sc.textFile("file:///root/spark/README.md")
> >>> textFile.count()
>
> I get the same results if I use the Scala shell.
>
> Does the quick start guide need to be updated, or did I miss something?
>
> Nick
>
> View this message in context: Spark Quick Start - call to open README.md needs explicit fs prefix
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
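To see why the bare path ended up pointing at HDFS, here is a rough, simplified sketch of the resolution rule Hadoop applies: a path with an explicit scheme (file://, hdfs://) is taken as-is, while a schemeless relative path is resolved against fs.defaultFS and the user's working directory. The function name and default values below are illustrative, not actual Hadoop API.

```python
from urllib.parse import urlparse

def resolve_path(path,
                 default_fs="hdfs://ec2-my-node-address.compute-1.amazonaws.com:9000",
                 working_dir="/user/root"):
    """Illustrative sketch: how a schemeless path resolves against fs.defaultFS."""
    parsed = urlparse(path)
    if parsed.scheme:
        # Explicit scheme (file://, hdfs://, ...): use the URI as given.
        return path
    if path.startswith("/"):
        # Absolute path, no scheme: resolve against the default filesystem.
        return default_fs + path
    # Relative path, no scheme: resolve against the working directory too.
    return f"{default_fs}{working_dir}/{path}"

print(resolve_path("README.md"))
# -> hdfs://ec2-my-node-address.compute-1.amazonaws.com:9000/user/root/README.md
print(resolve_path("file:///root/spark/README.md"))
# -> file:///root/spark/README.md
```

This is why sc.textFile("README.md") produced the hdfs://...:9000/user/root/README.md path in the error, and why the explicit file:/// prefix made the local read work.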