I want to save to a local directory. I have tried the following and get an error:

r.saveAsTextFile("file:/home/cloudera/tmp/out1")
r.saveAsTextFile("file:///home/cloudera/tmp/out1")
r.saveAsTextFile("file:////home/cloudera/tmp/out1")

They all generate the following error
15/01/12 08:31:10 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5, master01.cloudera): java.io.IOException: Mkdirs failed to create file:/home/cloudera/temp/out1/_temporary/0/_temporary/attempt_201501120831_0001_m_000001_5 (exists=false, cwd=file:/var/run/spark/work/app-20150112080951-0002/0)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1056)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


The ID that runs Spark and the driver program has full permission on the
directory /home/cloudera/tmp/. I can cd there and run "mkdir out1" to create
the directory without a problem. I then remove the "out1" directory and run

r.saveAsTextFile("file:/home/cloudera/tmp/out1")

I get the error above, but the "out1" directory is created. It looks like
r.saveAsTextFile(...) tries to create the subdirectories
out1/_temporary/0/_temporary/attempt_201501120831_0001_m_000001_5, which fails.

Has anybody successfully run r.saveAsTextFile(...) to save an RDD to the local
file system on Linux?
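
In the meantime, a minimal sketch of a fallback I can think of (assuming the
RDD is small enough to collect on the driver; the output file name below is
just an example):

import java.io.PrintWriter

// Bring the data to the driver and write it with plain Java I/O, so no
// executor has to create directories under /home/cloudera/tmp.
val out = new PrintWriter("/home/cloudera/tmp/out1.txt")
try {
  r.collect().foreach(line => out.println(line))
} finally {
  out.close()
}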

Ningjun


-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Monday, January 12, 2015 11:25 AM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: Failed to save RDD as text file to local file system

I think you're confusing HDFS paths and local paths. You are cd'ing to a 
directory and seem to want to write output there, but your path has no scheme 
and defaults to being an HDFS path. When you use "file:" you seem to have a 
permission error (perhaps).
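
To make the distinction concrete, a rough sketch of how the three forms are
resolved (the hdfs:// destination below is only an example path):

// No scheme: resolved against fs.defaultFS, which in your setup points at HDFS.
r.saveAsTextFile("out1")

// Explicit HDFS path (example path, assuming a /user/cloudera home directory).
r.saveAsTextFile("hdfs:///user/cloudera/out1")

// Explicit local path: each executor writes its partitions to its own local
// disk, so the directory must be creatable on every worker node.
r.saveAsTextFile("file:///home/cloudera/tmp/out1")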

On Mon, Jan 12, 2015 at 4:21 PM, NingjunWang <ningjun.w...@lexisnexis.com> 
wrote:
> Prannoy
>
>
>
> I tried this: r.saveAsTextFile("home/cloudera/tmp/out1"). It returns
> without an error, but where is it saved to? The folder
> "/home/cloudera/tmp/out1" is not created.
>
>
>
> I also tried the following
>
> cd /home/cloudera/tmp/
>
> spark-shell
>
> scala> val r = sc.parallelize(Array("a", "b", "c"))
>
> scala> r.saveAsTextFile("out1")
>
>
>
> It does not return an error, but still no "out1" folder is created
> under /home/cloudera/tmp/.
>
>
>
> I tried giving an absolute path but then got an error:
>
>
>
> scala> r.saveAsTextFile("/home/cloudera/tmp/out1")
>
> org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
>         at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
>
>
> Very frustrated. Please advise.
>
>
>
>
>
> Regards,
>
>
>
> Ningjun Wang
>
> Consulting Software Engineer
>
> LexisNexis
>
> 121 Chanlon Road
>
> New Providence, NJ 07974-1541
>
>
>
> From: Prannoy [via Apache Spark User List] [mailto:ml-node+[hidden 
> email]]
> Sent: Monday, January 12, 2015 4:18 AM
>
>
> To: Wang, Ningjun (LNG-NPV)
> Subject: Re: Failed to save RDD as text file to local file system
>
>
>
> Have you tried simply giving the path where you want to save the file?
>
>
>
> For instance in your case just do
>
>
>
> r.saveAsTextFile("home/cloudera/tmp/out1")
>
>
>
> Don't use the "file:" prefix.
>
>
>
> This will create a folder named out1. saveAsTextFile always writes its
> output as a directory of part files; it does not write the data into a
> single file.
>
>
>
> In case you need a single file, you can use the copyMerge API in FileUtil.
>
>
>
> FileUtil.copyMerge(fs, new Path("/home/cloudera/tmp/out1"),
>                    fs, new Path("/home/cloudera/tmp/out2"), true, conf, null)
>
> Now out2 will be a single file containing your data.
>
> Here fs is the FileSystem object for your local file system, and conf is a
> Hadoop Configuration.
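>
> A minimal sketch of how fs and conf could be set up in spark-shell for the
> call above (assuming you want the merge to run against the local file
> system):
>
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
>
> val conf = new Configuration()       // plain Hadoop configuration
> val fs = FileSystem.getLocal(conf)   // local file system, not HDFS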
>
> Thanks
>
>
>
>
>
> On Sat, Jan 10, 2015 at 1:36 AM, NingjunWang [via Apache Spark User 
> List] <[hidden email]> wrote:
>
> No, do you have any idea?
>
>
>
> Regards,
>
>
>
> Ningjun Wang
>
> Consulting Software Engineer
>
> LexisNexis
>
> 121 Chanlon Road
>
> New Providence, NJ 07974-1541
>
>
>
> From: firemonk9 [via Apache Spark User List] [mailto:[hidden email]]
> Sent: Friday, January 09, 2015 2:56 PM
> To: Wang, Ningjun (LNG-NPV)
> Subject: Re: Failed to save RDD as text file to local file system
>
>
>
> Have you found any resolution for this issue?
>
