When we change

    res.saveAsTextFile("file:///c:/somepath")

to

    res.saveAsTextFile("somepath")
On Wed, Dec 11, 2013 at 12:55 PM, Philip Ogren <[email protected]> wrote:
> You might try a more standard Windows path. I typically write to a local
> directory such as "target/spark-output".
>
>
> On 12/11/2013 10:45 AM, Nathan Kronenfeld wrote:
>
> We are trying to test out running Spark 0.8.0 on a Windows box, and while
> we can get it to run all the examples that don't output results to disk, we
> can't get it to write output.
>
> Has anyone been able to write out to a local file on a single-node Windows
> install without using HDFS?
>
> Here is our test code:
>
> import org.apache.spark.SparkContext
>
> object FileWritingTest {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext("local[1]", "File Writing Test",
>                               null, null, null, null)
>     // generate some work to do
>     val res = sc.parallelize(Range(0, 10), 10).flatMap(p => "%d".format(p * 10))
>     // save the results out to a file
>     res.saveAsTextFile("file:///c:/somepath")
>   }
> }
>
> This works as expected on a Unix-based system. However, when I try to run
> it from a Windows cmd shell I get the following errors:
>
> [WARN] 11 Dec 2013 12:00:33 - org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Saving as hadoop file of type (NullWritable, Text)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Starting job: saveAsTextFile at Test.scala:19
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Got job 0 (saveAsTextFile at Test.scala:19) with 10 output partitions (allowLocal=false)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Final stage: Stage 0 (saveAsTextFile at Test.scala:19)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Parents of final stage: List()
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Missing parents: List()
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Submitting Stage 0 (MappedRDD[2] at saveAsTextFile at Test.scala:19), which has no missing parents
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Submitting 10 missing tasks from Stage 0 (MappedRDD[2] at saveAsTextFile at Test.scala:19)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Size of task 0 is 5966 bytes
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Running 0
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Loss was due to org.apache.hadoop.util.Shell$ExitCodeException
> org.apache.hadoop.util.Shell$ExitCodeException: chmod: getting attributes of `/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000': No such file or directory
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:261)
>     at org.apache.hadoop.util.Shell.run(Shell.java:188)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
>     at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
>     at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:427)
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:465)
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:781)
>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:118)
>     at org.apache.hadoop.mapred.SparkHadoopWriter.open(SparkHadoopWriter.scala:86)
>     at org.apache.spark.rdd.PairRDDFunctions.writeToFile$1(PairRDDFunctions.scala:667)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)
>     at org.apache.spark.scheduler.ResultTask.run(ResultTask.scala:99)
>     at org.apache.spark.scheduler.local.LocalScheduler.runTask(LocalScheduler.scala:198)
>     at org.apache.spark.scheduler.local.LocalActor$$anonfun$launchTask$1$$anon$1.run(LocalScheduler.scala:68)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Remove TaskSet 0.0 from pool
> [INFO] 11 Dec 2013 12:00:33 - org.apache.spark.Logging$class - Failed to run saveAsTextFile at Test.scala:19
> Exception in thread "main" org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times; aborting job
> org.apache.hadoop.util.Shell$ExitCodeException: chmod: getting attributes of `/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000': No such file or directory
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
>
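> To isolate the failing call from the trace -- RawLocalFileSystem.setPermission
> shelling out to chmod -- here is a minimal Spark-free sketch (the file name is
> illustrative, and the Hadoop 1.x local-filesystem API is assumed):
>
>     import java.net.URI
>     import org.apache.hadoop.conf.Configuration
>     import org.apache.hadoop.fs.{Path, RawLocalFileSystem}
>     import org.apache.hadoop.fs.permission.FsPermission
>
>     object ChmodRepro {
>       def main(args: Array[String]): Unit = {
>         val fs = new RawLocalFileSystem()
>         fs.initialize(new URI("file:///"), new Configuration())
>         val p = new Path("file:///c:/somepath/testfile")
>         fs.create(p).close() // create an empty file first
>         // The call at RawLocalFileSystem.setPermission in the trace above;
>         // on a cygwin-backed Hadoop this runs `chmod` against a
>         // /cygdrive/... translation of the path.
>         fs.setPermission(p, new FsPermission(Integer.parseInt("644", 8).toShort))
>       }
>     }
>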
> The fact that it's using a cygwin path
> (/cygdrive/c/somepath/_temporary/_attempt_201312111200_0000_m_000000_0/part-00000)
> seems suspect, since I'm running from a cmd shell. Running from a cygwin
> shell leads to other errors.
>
> Has anyone been able to get simple file output to work from either a
> cygwin shell or the Windows cmd shell?
>
> Does anyone know if it is Spark or Hadoop that is transforming the path?
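>
> One way to probe that directly is to ask Hadoop how it resolves the path,
> since the trace shows the write going through Hadoop's FileSystem and
> TextOutputFormat. A hypothetical one-off check (Hadoop 1.x API assumed):
>
>     import org.apache.hadoop.conf.Configuration
>     import org.apache.hadoop.fs.{FileSystem, Path}
>
>     object PathCheck {
>       def main(args: Array[String]): Unit = {
>         val path = new Path("file:///c:/somepath")
>         val fs = path.getFileSystem(new Configuration())
>         // Print which FileSystem implementation the file:// scheme maps to,
>         // and how Hadoop itself qualifies the path before any chmod runs.
>         println(fs.getClass.getName)
>         println(path.makeQualified(fs))
>       }
>     }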
>
>
--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: [email protected]