I want to save an RDD to a local directory. I have tried the following and get an error:

    r.saveAsTextFile("file:/home/cloudera/tmp/out1")
    r.saveAsTextFile("file:///home/cloudera/tmp/out1")
    r.saveAsTextFile("file:////home/cloudera/tmp/out1")

They all generate the following error:

15/01/12 08:31:10 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5, master01.cloudera): java.io.IOException: Mkdirs failed to create file:/home/cloudera/temp/out1/_temporary/0/_temporary/attempt_201501120831_0001_m_000001_5 (exists=false, cwd=file:/var/run/spark/work/app-20150112080951-0002/0)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1056)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The user id that runs Spark and the driver program has full permission on the directory /home/cloudera/tmp/. I can cd there and run "mkdir out1" to create the directory without a problem. I then removed the directory "out1" and ran

    r.saveAsTextFile("file:/home/cloudera/tmp/out1")

I got the error above, but the directory "out1" was created. It looks like r.saveAsTextFile(...) tries to create the subdirectories out1/_temporary/0/_temporary/attempt_201501120831_0001_m_000001_5, and that fails.
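Note that with a "file:" URI on a cluster, each executor task writes its part-file to its own node's local filesystem, so the target directory must be writable by the Spark worker user on every worker node, not just on the machine running the driver. One common workaround when the data is small (a sketch for the spark-shell, not a general solution; the output path is illustrative) is to collect the RDD to the driver and write one local file there with plain Java I/O:

```scala
import java.io.PrintWriter

// Sketch: pull every partition back to the driver and write a single
// local file there. Only appropriate when the RDD fits in driver memory.
val r = sc.parallelize(Array("a", "b", "c"))
val out = new PrintWriter("/home/cloudera/tmp/out1.txt")
try r.collect().foreach(out.println) finally out.close()
```

Because the write happens entirely on the driver, only the driver machine's permissions matter, which sidesteps the per-executor Mkdirs failure.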
Has anybody successfully run r.saveAsTextFile(...) to save an RDD to the local file system on Linux?

Ningjun

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, January 12, 2015 11:25 AM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: Failed to save RDD as text file to local file system

I think you're confusing HDFS paths and local paths. You are cd'ing to a directory and seem to want to write output there, but your path has no scheme and defaults to being an HDFS path. When you use "file:" you seem to have a permission error (perhaps).

On Mon, Jan 12, 2015 at 4:21 PM, NingjunWang <ningjun.w...@lexisnexis.com> wrote:
> Prannoy,
>
> I tried r.saveAsTextFile("home/cloudera/tmp/out1"); it returned without
> error. But where was it saved to? The folder "/home/cloudera/tmp/out1"
> was not created.
>
> I also tried the following:
>
>     cd /home/cloudera/tmp/
>     spark-shell
>     scala> val r = sc.parallelize(Array("a", "b", "c"))
>     scala> r.saveAsTextFile("out1")
>
> It does not return an error.
> But still there is no "out1" folder created under /home/cloudera/tmp/.
>
> I tried giving an absolute path, but then I get an error:
>
>     scala> r.saveAsTextFile("/home/cloudera/tmp/out1")
>
> org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
> Very frustrated. Please advise.
>
> Regards,
>
> Ningjun Wang
> Consulting Software Engineer
> LexisNexis
> 121 Chanlon Road
> New Providence, NJ 07974-1541
>
> From: Prannoy [via Apache Spark User List] [mailto:ml-node+[hidden email]]
> Sent: Monday, January 12, 2015 4:18 AM
> To: Wang, Ningjun (LNG-NPV)
> Subject: Re: Failed to save RDD as text file to local file system
>
> Have you tried simply giving the path where you want to save the file?
>
> For instance, in your case just do
>
>     r.saveAsTextFile("home/cloudera/tmp/out1")
>
> Don't use "file:".
>
> This will create a folder named out1. saveAsTextFile always writes by
> making a directory; it does not write data into a single file.
>
> In case you need a single file, you can use the copyMerge API in FileUtil:
>
>     FileUtil.copyMerge(fs, new Path("home/cloudera/tmp/out1"),
>                        fs, new Path("home/cloudera/tmp/out2"),
>                        true, conf, null);
>
> Now out2 will be a single file containing your data. fs is the
> FileSystem handle for your local file system.
>
> Thanks
>
> On Sat, Jan 10, 2015 at 1:36 AM, NingjunWang [via Apache Spark User List] <[hidden email]> wrote:
>
> No, do you have any idea?
> Regards,
>
> Ningjun Wang
>
> From: firemonk9 [via Apache Spark User List] [mailto:[hidden email]]
> Sent: Friday, January 09, 2015 2:56 PM
> To: Wang, Ningjun (LNG-NPV)
> Subject: Re: Failed to save RDD as text file to local file system
>
> Have you found any resolution for this issue?
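The copyMerge call suggested earlier in the thread can be sketched more completely as follows. This is a sketch, not verified against this cluster: it assumes Hadoop 2.x (FileUtil.copyMerge was removed in Hadoop 3), and the paths are illustrative.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Sketch: merge the part-files that saveAsTextFile produced into a
// single file on the local filesystem. Assumes Hadoop 2.x, where
// FileUtil.copyMerge still exists.
val conf = new Configuration()
val localFs = FileSystem.getLocal(conf)
FileUtil.copyMerge(
  localFs, new Path("file:///home/cloudera/tmp/out1"), // source dir of part-files
  localFs, new Path("file:///home/cloudera/tmp/out2"), // destination single file
  true,                                                // delete source dir after merge
  conf, null)                                          // no separator string appended
```

Using FileSystem.getLocal(conf) for both source and destination keeps the whole merge on the local filesystem; to merge HDFS output instead, obtain the source FileSystem from the HDFS path.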