This works for DataFrames in Spark 2.3 by changing a global setting, and it will be configurable per write in 2.4. See:
https://issues.apache.org/jira/browse/SPARK-20236
https://issues.apache.org/jira/browse/SPARK-24860
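
For reference, here is a minimal PySpark sketch of both variants. The function name and its arguments are made up for illustration. The setting name `spark.sql.sources.partitionOverwriteMode` and the per-write option `partitionOverwriteMode` are what I understand the two tickets above to introduce, so please check them against the Spark version you are running.

```python
def overwrite_touched_partitions(spark, data_frame, path, per_write=False):
    """Overwrite only the day=... partitions present in data_frame.

    Hypothetical helper: `spark` is assumed to be a SparkSession and
    `data_frame` a DataFrame that has a "day" column.
    """
    writer = data_frame.write.mode("overwrite").partitionBy("day")
    if per_write:
        # Assumed Spark 2.4+ (SPARK-24860): applies to this write only.
        writer = writer.option("partitionOverwriteMode", "dynamic")
    else:
        # Assumed Spark 2.3 (SPARK-20236): session-wide setting, so it
        # affects every later overwrite in this session as well.
        spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    writer.parquet(path)
```

With dynamic mode, rerunning the job for a given month should replace only the day partitions that the job writes and leave the other months under dataset.parquet/ untouched, which gives the idempotence asked about below.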

On Wed, Aug 1, 2018 at 3:11 PM, Nirav Patel <npa...@xactlycorp.com> wrote:

> Hi Peay,
>
> Have you found a better solution yet? I am having the same issue.
>
> The following says it works with Spark 2.1 onward, but only when you use
> sqlContext and not DataFrame:
> https://medium.com/@anuvrat/writing-into-dynamic-partitions-using-spark-2e2b818a007a
>
> Thanks,
> Nirav
>
> On Mon, Oct 2, 2017 at 4:37 AM, Pavel Knoblokh <knobl...@gmail.com> wrote:
>
>> If your processing task inherently processes input data by month, you
>> may want to "manually" partition the output data by month as well as
>> by day, that is, to save it with a file name including the given month,
>> e.g. "dataset.parquet/month=01". Then you will be able to use the
>> overwrite mode with each month partition. Hope this could be of some
>> help.
>>
>> --
>> Pavel Knoblokh
>>
>> On Fri, Sep 29, 2017 at 5:31 PM, peay <p...@protonmail.com> wrote:
>> > Hello,
>> >
>> > I am trying to use
>> > data_frame.write.partitionBy("day").save("dataset.parquet") to write a
>> > dataset while splitting by day.
>> >
>> > I would like to run a Spark job to process, e.g., a month:
>> > dataset.parquet/day=2017-01-01/...
>> > ...
>> >
>> > and then run another Spark job to add another month using the same
>> > folder structure, getting me
>> > dataset.parquet/day=2017-01-01/
>> > ...
>> > dataset.parquet/day=2017-02-01/
>> > ...
>> >
>> > However:
>> > - with save mode "overwrite", when I process the second month, all of
>> > dataset.parquet/ gets removed and I lose whatever was already computed
>> > for the previous month.
>> > - with save mode "append", I can't get idempotence: if I run the job
>> > to process a given month twice, I'll get duplicate data in all the
>> > subfolders for that month.
>> >
>> > Is there a way to "append" in terms of the subfolders from partitionBy,
>> > but overwrite within each such partition? Any help would be
>> > appreciated.
>> >
>> > Thanks!