You might try a more standard Windows path. I typically write to a local directory such as "target/spark-output".
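For example, something along these lines (a minimal sketch against the Spark 0.8.0 API; "target/spark-output" is just an illustrative relative path, which resolves under the JVM's working directory):

import org.apache.spark.SparkContext

object RelativePathTest {
  def main(args: Array[String]): Unit = {
    // Local mode with one worker thread; the app name is arbitrary.
    val sc = new SparkContext("local[1]", "Relative Path Test")
    // A relative output path sidesteps the drive-letter / cygwin
    // path-translation question entirely.
    sc.parallelize(0 until 10, 10)
      .map(p => "%d".format(p * 10))
      .saveAsTextFile("target/spark-output")
    sc.stop()
  }
}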

On 12/11/2013 10:45 AM, Nathan Kronenfeld wrote:
We are trying to test running Spark 0.8.0 on a Windows box. We can run all the examples that don't output results to disk, but we can't get it to write output.

Has anyone been able to write out to a local file on a single-node Windows install without using HDFS?

Here is our test code:

import org.apache.spark.SparkContext

object FileWritingTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[1]", "File Writing Test", null, null, null, null)
    // Generate some work to do.
    val res = sc.parallelize(Range(0, 10), 10).flatMap(p => "%d".format(p * 10))
    // Save the results out to a file.
    res.saveAsTextFile("file:///c:/somepath")
  }
}

This works as expected on a Unix-based system. However, when running from a Windows cmd shell, I get the following errors:

[WARN] 11 Dec 2013 12:00:33 - org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Saving as hadoop file of type (NullWritable, Text)
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Starting job: saveAsTextFile at Test.scala:19
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Got job 0 (saveAsTextFile at Test.scala:19) with 10 output partitions (allowLocal=false)
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Final stage: Stage 0 (saveAsTextFile at Test.scala:19)
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Parents of final stage: List()
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Missing parents: List()
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Submitting Stage 0 (MappedRDD[2] at saveAsTextFile at Test.scala:19), which has no missing parents
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Submitting 10 missing tasks from Stage 0 (MappedRDD[2] at saveAsTextFile at Test.scala:19)
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Size of task 0 is 5966 bytes
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Running 0
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Loss was due to org.apache.hadoop.util.Shell$ExitCodeException
org.apache.hadoop.util.Shell$ExitCodeException: chmod: getting attributes of `/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000': No such file or directory
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:261)
        at org.apache.hadoop.util.Shell.run(Shell.java:188)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
        at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:427)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:465)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:781)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:118)
        at org.apache.hadoop.mapred.SparkHadoopWriter.open(SparkHadoopWriter.scala:86)
        at org.apache.spark.rdd.PairRDDFunctions.writeToFile$1(PairRDDFunctions.scala:667)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)
        at org.apache.spark.scheduler.ResultTask.run(ResultTask.scala:99)
        at org.apache.spark.scheduler.local.LocalScheduler.runTask(LocalScheduler.scala:198)
        at org.apache.spark.scheduler.local.LocalActor$$anonfun$launchTask$1$$anon$1.run(LocalScheduler.scala:68)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Remove TaskSet 0.0 from pool
[INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Failed to run saveAsTextFile at Test.scala:19
Exception in thread "main" org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times; aborting job org.apache.hadoop.util.Shell$ExitCodeException: chmod: getting attributes of `/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000': No such file or directory
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)

The fact that it's using a cygwin path (/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000) seems suspect since I'm running from a cmd shell. Running from a cygwin shell leads to other errors.
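From the stack trace, the chmod is coming from Hadoop's RawLocalFileSystem.setPermission, which shells out to whatever chmod is on the PATH; presumably the cygwin chmod is what's translating the path. One workaround we're considering is registering a local FileSystem that skips the permission call. A rough, untested sketch (the class name is ours, and the registration step is an assumption; sc.hadoopConfiguration may not be available in 0.8.0, in which case fs.file.impl could be set via core-site.xml instead):

import org.apache.hadoop.fs.{Path, RawLocalFileSystem}
import org.apache.hadoop.fs.permission.FsPermission

// Hypothetical workaround: a local FileSystem whose setPermission is a
// no-op, so Hadoop never shells out to chmod when creating part files.
class NoChmodLocalFileSystem extends RawLocalFileSystem {
  override def setPermission(p: Path, permission: FsPermission): Unit = {
    // Intentionally empty: skip the chmod that fails under cygwin.
  }
}

// Registration sketch, before calling saveAsTextFile (assumes the
// SparkContext exposes the Hadoop Configuration):
// sc.hadoopConfiguration.set("fs.file.impl",
//   classOf[NoChmodLocalFileSystem].getName)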

Has anyone been able to get simple file output to work from either a cygwin shell or the Windows cmd shell?

Does anyone know whether it is Spark or Hadoop that is transforming the path?




--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email: [email protected] <mailto:[email protected]>
