Are you able to manually delete the folder below? I'm wondering if there is some sort of non-Spark factor involved (permissions, etc.).

/nfspartition/sankar/banking_l1_v2.csv
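For instance, a quick check from an R session on one of the cluster machines (a sketch outside of Spark; run it as the same user that runs the Spark job):

    # Inspect ownership/permissions, then attempt the manual delete.
    p <- "/nfspartition/sankar/banking_l1_v2.csv"
    file.info(p)                 # owner, mode, and times for the directory
    file.access(p, mode = 2)     # 0 = writable by this user, -1 = not
    unlink(p, recursive = TRUE)  # try to remove the directory and its contents
    dir.exists(p)                # FALSE means the delete succeeded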
On Tue, Sep 20, 2016 at 12:19 PM, Sankar Mittapally <sankar.mittapa...@creditvidya.com> wrote:

I used that one also.

On Sep 20, 2016 10:44 PM, "Kevin Mellott" <kevin.r.mell...@gmail.com> wrote:

Instead of mode="append", try mode="overwrite".

On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally <sankar.mittapa...@creditvidya.com> wrote:

Please find the code below.

    sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")

I tried these two commands:

    write.df(sankar2, "/nfspartition/sankar/test/test.csv", "csv", header="true")

    saveDF(sankar2, "sankartest.csv", source="csv", mode="append", schema="true")
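A cleaned-up sketch of those two calls (assuming Spark 2.x SparkR): the CSV writer takes a header option, and schema is presumably a typo for it; mode="overwrite", as Kevin suggests above, avoids the append path:

    # Hypothetical corrected versions of the two write attempts above;
    # "schema" is not a CSV writer option, "header" presumably was meant.
    write.df(sankar2, "/nfspartition/sankar/test/test.csv",
             source = "csv", mode = "overwrite", header = "true")

    saveDF(sankar2, "sankartest.csv",
           source = "csv", mode = "overwrite", header = "true")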
On Tue, Sep 20, 2016 at 9:40 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:

Can you please post the line of code that is doing the df.write command?

On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <sankar.mittapa...@creditvidya.com> wrote:

Hey Kevin,

It is an empty directory. Spark is able to write the part files to the directory, but we get the above error while it merges those part files.

Regards

On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:

Have you checked to see whether any files already exist at /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to delete them before attempting to save your DataFrame to that location. Alternatively, you may be able to set the "mode" option of the df.write operation to "overwrite", depending on the version of Spark you are running.

ERROR (from log)

    16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
    16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.

df.write Documentation
http://spark.apache.org/docs/latest/api/R/write.df.html

Thanks,
Kevin
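For example, a minimal sketch against the write.df API linked above, where df stands in for the SparkDataFrame being saved:

    # Let Spark replace any existing output at the target path instead
    # of failing on leftovers from an earlier run.
    write.df(df, path = "/nfspartition/sankar/banking_l1_v2.csv",
             source = "csv", mode = "overwrite")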
On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally <sankar.mittapa...@creditvidya.com> wrote:

We have set up a Spark cluster on NFS shared storage. There are no permission issues with the NFS storage; all of the users are able to write to it. When I run the write.df command in SparkR, I get the error below. Can someone please help me fix this issue?

    16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
    java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv; isDirectory=false; length=436486316; replication=1; blocksize=33554432; modification_time=1474099400000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)
    16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
    16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
    16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
    16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
    Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
      org.apache.spark.SparkException: Job aborted.
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE