It is Hadoop 2.4.0 with Spark 1.3.0. I found that the problem only happens if there are multiple nodes; if the cluster has only one node, it works fine. For example, if the cluster has a spark-master on machine A and a spark-worker on machine B, the problem happens. If both spark-master and spark-worker are on machine A, there is no problem. I do not use HDFS. I am just saving the RDD to a Windows shared folder:

rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")

with the T: drive mapped to \\10.196.119.230\myshare
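For reference, a minimal sketch of the save plus a read-back check (assumes a live SparkContext named sc; the sample data is made up for illustration):

    // Sketch of the save described above, plus a read-back to verify
    // that every partition was written. Assumes a SparkContext `sc`.
    val rdd = sc.parallelize(Seq("doc1", "doc2"))
    rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")
    val back = sc.objectFile[String]("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")
    println(back.count())  // should print 2 if the write succeeded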
Ningjun

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, May 22, 2015 5:02 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: spark on Windows 2008 failed to save RDD to windows shared folder

The stack trace is related to HDFS. Can you tell us which Hadoop release you are using? Is this a secure cluster?

Thanks

On Fri, May 22, 2015 at 1:55 PM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:

I use a Spark standalone cluster on Windows 2008. I keep getting the following error when trying to save an RDD to a Windows shared folder:

rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")

15/05/22 16:49:05 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12)
java.io.IOException: Mkdirs failed to create file:/T:/lab4-win02/IndexRoot01/tobacco-07/tmp/docs-150522204904805.op/_temporary/0/_temporary/attempt_201505221649_0012_m_000000_12
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1071)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

The T: drive is mapped to a Windows shared folder, e.g. T: -> \\10.196.119.230\myshare. The ID running Spark does have write permission to this folder. It works most of the time but fails sometimes. Can anybody tell me what the problem is here? Please advise. Thanks.
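For anyone hitting the same "Mkdirs failed" error, a small diagnostic sketch (hypothetical, not from this thread; assumes a live SparkContext `sc` and the path used above) that asks each executor to create a directory under T: and report its hostname, to show which machine cannot write there:

    // Hypothetical probe: have each of 4 tasks attempt mkdirs under the
    // shared folder and report (hostname, result). `ok` is false when the
    // directory could not be created (or already exists), which mirrors
    // the "Mkdirs failed" condition in the stack trace above.
    import java.io.File
    import java.net.InetAddress

    val results = sc.parallelize(1 to 4, 4).map { i =>
      val host = InetAddress.getLocalHost.getHostName
      val ok = new File(s"T:/lab4-win02/IndexRoot01/probe-$host-$i").mkdirs()
      (host, ok)
    }.collect()
    results.foreach { case (host, ok) => println(s"$host mkdirs ok = $ok") }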