Instead of *mode="append"*, try *mode="overwrite"*.
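For example, a minimal sketch based on the snippet you posted (untested here, and assuming the Spark 2.0-era SparkR API; adjust paths and options to your setup):

    # Read the source JSON, then write it back out as CSV,
    # replacing any output already present at the target path.
    sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json", source = "json")
    write.df(sankar2,
             path = "/nfspartition/sankar/test/test.csv",
             source = "csv",
             mode = "overwrite",  # instead of "append"
             header = "true")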
On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:

> Please find the code below.
>
> sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")
>
> I tried these two commands:
>
> write.df(sankar2,"/nfspartition/sankar/test/test.csv","csv",header="true")
>
> saveDF(sankar2,"sankartest.csv",source="csv",mode="append",schema="true")
>
> On Tue, Sep 20, 2016 at 9:40 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:
>
>> Can you please post the line of code that is doing the df.write command?
>>
>> On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <
>> sankar.mittapa...@creditvidya.com> wrote:
>>
>>> Hey Kevin,
>>>
>>> It is an empty directory. Spark is able to write the part files into it,
>>> but we get the error above while it is merging those part files.
>>>
>>> Regards
>>>
>>> On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott <
>>> kevin.r.mell...@gmail.com> wrote:
>>>
>>>> Have you checked whether any files already exist at
>>>> /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to delete
>>>> them before attempting to save your DataFrame to that location (see the
>>>> sketch at the end of this thread). Alternatively, you may be able to set
>>>> the "mode" option of the df.write operation to "overwrite", depending on
>>>> the version of Spark you are running.
>>>>
>>>> *ERROR (from log)*
>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
>>>>
>>>> *df.write Documentation*
>>>> http://spark.apache.org/docs/latest/api/R/write.df.html
>>>>
>>>> Thanks,
>>>> Kevin
>>>>
>>>> On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally <
>>>> sankar.mittapa...@creditvidya.com> wrote:
>>>>
>>>>> We have set up a Spark cluster on NFS shared storage. There are no
>>>>> permission issues with the NFS storage; all users are able to write to
>>>>> it. When I run the write.df command in SparkR, I get the error below.
>>>>> Can someone please help me fix this issue?
>>>>>
>>>>> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
>>>>> java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv; isDirectory=false; length=436486316; replication=1; blocksize=33554432; modification_time=1474099400000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>>>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>>>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>>>>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>>>>>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>>>>>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>>>>>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>>>>>   at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>>>>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>>>>>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>   at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>>>>>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>>>>>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>>>>>   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>>>>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>   at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>>>>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>   at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>>>>>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>>>>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>>>>   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>>>>>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>>>>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>>>>   at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>>>>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
>>>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
>>>>> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
>>>>> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
>>>>> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>>>>>   org.apache.spark.SparkException: Job aborted.
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>>>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>>>>
>>>
>>> --
>>> Regards
>>>
>>> Sankar Mittapally
>>> Senior Software Engineer
>>
>
> --
> Regards
>
> Sankar Mittapally
> Senior Software Engineer
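For completeness, Kevin's first suggestion above (deleting any pre-existing output before saving) might look roughly like the sketch below. This is an untested sketch: `df` stands in for whatever DataFrame is being written, and the base-R file helpers apply only because the NFS mount is visible as an ordinary local path on the driver.

    out_path <- "/nfspartition/sankar/banking_l1_v2.csv"

    # Remove earlier output, including any leftover _temporary directory,
    # so the commit-time rename in FileOutputCommitter has nothing to collide with.
    if (dir.exists(out_path)) {
      unlink(out_path, recursive = TRUE)
    }

    write.df(df, path = out_path, source = "csv", header = "true")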