Re: Append to Parquet
On 1 Dec 2017, at 3:44, VinShar wrote:
> Yes, this was my understanding also, but then I found that Spark's DataFrame
> does have a method which appends to Parquet (df.write.parquet(destName,
> mode="append")). Below is an article that throws some light on this. I was
> wondering if there is a way to achieve the same through NiFi.
>
> http://aseigneurin.github.io/2017/03/14/incrementally-loaded-parquet-files.html

You should not believe everything bloggers write :)

In the blog they are writing to the `permit-inspections.parquet` **folder**. It is not a Parquet file; the Parquet files are contained in the folder. The append mode you are referring to simply writes new Parquet files into the folder, without touching the existing ones. If they had used the `overwrite` option, the existing folder would have been emptied first.

Cheers,
Giovanni
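Giovanni's point can be seen without Spark at all. A minimal sketch of the directory-level semantics (plain Python, no Spark or real Parquet encoding; the part-file naming is hypothetical): `append` just drops a new file into the destination folder, while `overwrite` empties the folder first. Neither mode modifies an existing file in place.

```python
import os
import shutil
import tempfile
import uuid

def write_parquet_like(dest, rows, mode="append"):
    """Mimic the directory-level behaviour of df.write.parquet(dest, mode=...).

    'append' drops a new part file into the folder; 'overwrite' empties the
    folder first. No existing file is ever modified in place.
    """
    if mode == "overwrite" and os.path.isdir(dest):
        shutil.rmtree(dest)  # overwrite empties the destination folder
    os.makedirs(dest, exist_ok=True)
    part = os.path.join(dest, f"part-{uuid.uuid4().hex}.parquet")
    with open(part, "w") as f:
        f.write("\n".join(rows))
    return part

dest = os.path.join(tempfile.mkdtemp(), "permit-inspections.parquet")
write_parquet_like(dest, ["row1"], mode="append")     # folder now has 1 file
write_parquet_like(dest, ["row2"], mode="append")     # folder now has 2 files
write_parquet_like(dest, ["row3"], mode="overwrite")  # folder emptied; 1 file
```

So "append to Parquet" in Spark means "add another file to the dataset folder", which is why it composes with the incremental-load pattern in the blog post but says nothing about appending rows to an existing `.parquet` file.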
Re: Append to Parquet
Thanks for the link. Currently we don't have a way to do something like that, but if we could figure out how that DataFrame append code works behind the scenes, we could potentially offer something similar.

On Thu, Nov 30, 2017 at 9:44 PM, VinShar wrote:
> Yes, this was my understanding also, but then I found that Spark's DataFrame
> does have a method which appends to Parquet (df.write.parquet(destName,
> mode="append")). Below is an article that throws some light on this. I was
> wondering if there is a way to achieve the same through NiFi.
>
> http://aseigneurin.github.io/2017/03/14/incrementally-loaded-parquet-files.html
>
> I have a workaround in mind for this where I can save the data I want to
> append to Parquet in a file (say in Avro format), and then execute a script
> through ExecuteProcess to launch a Spark job to read the Avro, append to an
> existing Parquet file, and then delete the Avro. I am looking for a simpler
> way than this.
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
Re: Append to Parquet
Yes, this was my understanding also, but then I found that Spark's DataFrame does have a method which appends to Parquet (df.write.parquet(destName, mode="append")). Below is an article that throws some light on this. I was wondering if there is a way to achieve the same through NiFi.

http://aseigneurin.github.io/2017/03/14/incrementally-loaded-parquet-files.html

I have a workaround in mind for this where I can save the data I want to append to Parquet in a file (say in Avro format), and then execute a script through ExecuteProcess to launch a Spark job to read the Avro, append to an existing Parquet file, and then delete the Avro. I am looking for a simpler way than this.

--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
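The stage-then-launch workaround described above can be sketched end to end. This is a stand-in sketch in plain Python only, to show the shape of the dataflow: a JSON file stands in for the Avro staging file, and a local function stands in for the Spark job that NiFi's ExecuteProcess would launch. All file names and function names here are hypothetical.

```python
import json
import os
import tempfile
import uuid

def stage_records(staging_dir, records):
    """Step 1: NiFi writes the records to append into a staging file
    (Avro in the workaround; JSON here as a stand-in)."""
    os.makedirs(staging_dir, exist_ok=True)
    path = os.path.join(staging_dir, f"staged-{uuid.uuid4().hex}.json")
    with open(path, "w") as f:
        json.dump(records, f)
    return path

def append_job(staging_path, dataset_dir):
    """Step 2: the job ExecuteProcess would launch -- read the staged
    records and 'append' them by writing a new part file into the
    dataset folder (which is what Spark's append mode does)."""
    with open(staging_path) as f:
        records = json.load(f)
    os.makedirs(dataset_dir, exist_ok=True)
    part = os.path.join(dataset_dir, f"part-{uuid.uuid4().hex}.parquet")
    with open(part, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    # Step 3: delete the staging file once the append succeeded.
    os.remove(staging_path)
    return part

base = tempfile.mkdtemp()
staged = stage_records(os.path.join(base, "staging"), [{"id": 1}, {"id": 2}])
append_job(staged, os.path.join(base, "dataset.parquet"))
```

Deleting the staging file only after the job succeeds is what makes the pipeline safe to retry: a failed run leaves the staged data in place to be picked up again.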
Re: Append to Parquet
Hello,

As far as I know there is not an option in Parquet to append, due to the way its internal format works. The ParquetFileWriter has a mode which only has CREATE and OVERWRITE:

https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java#L105-L107

-Bryan

On Thu, Nov 30, 2017 at 5:12 PM, VinShar wrote:
> Hi,
>
> Is there any way to use PutParquet to append to an existing Parquet file? I
> know that I can create a Kite DataSet and write Parquet files to it, but I
> am looking for an alternative to Spark's DataFrame.write.parquet(destination,
> mode="overwrite")
>
> Regards,
> Vinay
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
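The internal-format issue Bryan mentions is structural: a Parquet file puts its footer metadata at the end of the file (just before the trailing `PAR1` magic bytes), and readers locate the footer by seeking backwards from the end. A toy sketch of why that layout rules out in-place appends (plain Python; this is not real Parquet encoding, just a minimal footer-at-end format):

```python
import struct

MAGIC = b"PAR1"

def write_footer_file(data: bytes) -> bytes:
    """Toy footer-at-end layout: magic, data, footer, footer length, magic."""
    footer = struct.pack("<I", len(data))  # pretend the footer is just the data length
    return MAGIC + data + footer + struct.pack("<I", len(footer)) + MAGIC

def read_footer_file(buf: bytes) -> bytes:
    # A reader works from the END of the file: check the trailing magic,
    # read the footer length, then use the footer to locate the data.
    assert buf[-4:] == MAGIC, "trailing magic missing -- file is corrupt"
    footer_len = struct.unpack("<I", buf[-8:-4])[0]
    footer = buf[-8 - footer_len:-8]
    data_len = struct.unpack("<I", footer)[0]
    return buf[4:4 + data_len]

good = write_footer_file(b"rows")
assert read_footer_file(good) == b"rows"

# Naively appending bytes after the footer corrupts the file: the reader
# no longer finds the magic and footer where it expects them at the end.
appended = good + b"more rows"
```

Appending would mean rewriting the footer (and trailing magic) after the new row groups, i.e. rewriting the tail of the file rather than just adding bytes, which is why parquet-mr only offers CREATE and OVERWRITE.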
Append to Parquet
Hi,

Is there any way to use PutParquet to append to an existing Parquet file? I know that I can create a Kite DataSet and write Parquet files to it, but I am looking for an alternative to Spark's DataFrame.write.parquet(destination, mode="overwrite")

Regards,
Vinay

--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/