When we change

  res.saveAsTextFile("file:///c:/somepath")

to

  res.saveAsTextFile("somepath")

On Wed, Dec 11, 2013 at 12:55 PM, Philip Ogren <[email protected]> wrote:

>  You might try a more standard Windows path. I typically write to a local
> directory such as "target/spark-output".
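>
> A minimal sketch of that change (assuming the rest of your test program
> below is unchanged; "target/spark-output" is just an example directory):
>
>   res.saveAsTextFile("target/spark-output")  // relative path, no file:// scheme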
>
>
> On 12/11/2013 10:45 AM, Nathan Kronenfeld wrote:
>
>  We are trying to run Spark 0.8.0 on a Windows box, and while we can get
> it to run all the examples that don't write results to disk, we can't get
> it to write output.
>
>  Has anyone been able to write out to a local file on a single-node
> Windows install without using HDFS?
>
> Here is our test code:
>
> import org.apache.spark.SparkContext
>
> object FileWritingTest {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext("local[1]", "File Writing Test", null,
>                               null, null, null)
>     val res = sc.parallelize(Range(0, 10), 10)
>       .flatMap(p => "%d".format(p * 10))       // generate some work to do
>     res.saveAsTextFile("file:///c:/somepath")  // save the results out to a file
>   }
> }
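>
> (Note: saveAsTextFile treats the string as a directory, not a single file;
> with 10 partitions it should produce part-00000 through part-00009 under
> that directory, staged first in a _temporary subdirectory as seen in the
> trace below.)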
>
> This works as expected on a Unix-based system. However, when I try to run
> it from a Windows cmd shell, I get the following errors:
>
> [WARN] 11 Dec 2013 12:00:33 - org.apache.hadoop.util.NativeCodeLoader -
> Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Saving as
> hadoop file of type (NullWritable, Text)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Starting
> job: saveAsTextFile at Test.scala:19
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Got job 0
> (saveAsTextFile at Test.scala:19) with 10 output partitions
> (allowLocal=false)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Final
> stage: Stage 0 (saveAsTextFile at Test.scala:19)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Parents of
> final stage: List()
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Missing
> parents: List()
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Submitting
> Stage 0 (MappedRDD[2] at saveAsTextFile at Test.scala:19), which has no
> missing parents
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Submitting
> 10 missing tasks from Stage 0 (MappedRDD[2] at saveAsTextFile at
> Test.scala:19)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Size of
> task 0 is 5966 bytes
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Running 0
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Loss was
> due to org.apache.hadoop.util.Shell$ExitCodeException
> org.apache.hadoop.util.Shell$ExitCodeException: chmod: getting attributes
> of
> `/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000':
> No such file or directory
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:261)
>         at org.apache.hadoop.util.Shell.run(Shell.java:188)
>         at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
>         at
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:427)
>         at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:465)
>         at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:781)
>         at
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:118)
>         at
> org.apache.hadoop.mapred.SparkHadoopWriter.open(SparkHadoopWriter.scala:86)
>         at
> org.apache.spark.rdd.PairRDDFunctions.writeToFile$1(PairRDDFunctions.scala:667)
>         at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)
>         at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)
>         at org.apache.spark.scheduler.ResultTask.run(ResultTask.scala:99)
>         at
> org.apache.spark.scheduler.local.LocalScheduler.runTask(LocalScheduler.scala:198)
>         at
> org.apache.spark.scheduler.local.LocalActor$$anonfun$launchTask$1$$anon$1.run(LocalScheduler.scala:68)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Remove
> TaskSet 0.0 from pool
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Failed to
> run saveAsTextFile at Test.scala:19
> Exception in thread "main" org.apache.spark.SparkException: Job failed:
> Task 0.0:0 failed more than 4 times; aborting job
> org.apache.hadoop.util.Shell$ExitCodeException: chmod: getting attributes
> of
> `/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000':
> No such file or directory
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
>         at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
>         at
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>         at
> org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
>
> The fact that it's using a Cygwin path
> (/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000)
> seems suspect, since I'm running from a cmd shell. Running from a Cygwin
> shell leads to other errors.
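>
> For what it's worth, the stack trace suggests the chmod comes from Hadoop:
> RawLocalFileSystem.setPermission sets permissions on local files by
> shelling out to an external chmod via Shell.execCommand rather than using
> Java APIs. A minimal sketch of that pattern (my own hypothetical
> simplification, not Hadoop's actual code):
>
> import scala.sys.process._
>
> // Hypothetical simplification of the pattern visible in the trace above:
> // permissions are set by running whatever chmod binary is on the PATH.
> // On Windows that is typically Cygwin's chmod, which reports paths in its
> // own /cygdrive/c/... form, which would explain the Cygwin-style path in
> // the error message even when launching from a cmd shell.
> def setPermission(path: String, perm: String): Unit = {
>   val exitCode = Seq("chmod", perm, path).!
>   if (exitCode != 0)
>     sys.error("chmod " + perm + " " + path + " failed with exit code " + exitCode)
> }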
>
> Has anyone been able to get simple file output to work from either a
> Cygwin shell or the Windows cmd shell?
>
>  Does anyone know whether it is Spark or Hadoop that is transforming the path?
>
>  --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  [email protected]

-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  [email protected]