Adding this simple setting helped me overcome the issue:

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
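For context, a minimal sketch of how that setting is typically wired into a partitioned overwrite. The DataFrame contents, column names and output path below are placeholders for illustration, not taken from the original thread:

import org.apache.spark.sql.{SaveMode, SparkSession}

object DynamicOverwriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dynamic-partition-overwrite").getOrCreate()
    import spark.implicits._

    // "dynamic" makes SaveMode.Overwrite replace only the partitions present
    // in the incoming DataFrame; the default ("static") clears the whole
    // output path first, which is the deletion described below.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    // Placeholder data; the column names and the S3 path are assumptions.
    val dataDF = Seq(
      ("2020", "01", "2020-01-15", "event-a"),
      ("2020", "01", "2020-01-16", "event-b")
    ).toDF("year", "month", "date", "payload")

    dataDF.write
      .partitionBy("year", "month", "date")
      .mode(SaveMode.Overwrite)
      .parquet("s3://data/test2/events/")
  }
}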
My issue -
In an S3 folder, I previously had data partitioned by *ingestiontime*.
Now I wanted to reprocess this data and partition it by -
businessname &
Hi Yash,
Yes, AFAIK, that is the expected behavior of the Overwrite mode.
I think you can use the following approaches if you want to perform a job
on each partition:
[1] for each partition in DF :
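A rough sketch of what that per-partition approach could look like in the shell. The partition column "date", basePath and the string type of the column are assumptions for illustration, not from the original mail:

import org.apache.spark.sql.{DataFrame, SaveMode}

// Collect the distinct partition values, then overwrite each partition's
// directory individually so the untouched partitions are left in place.
def overwritePerPartition(df: DataFrame, basePath: String): Unit = {
  val dates = df.select("date").distinct().collect().map(_.getString(0))

  dates.foreach { d =>
    df.filter(df("date") === d)
      .drop("date")                      // the value is encoded in the path
      .write
      .mode(SaveMode.Overwrite)          // only this directory is replaced
      .parquet(s"$basePath/date=$d")
  }
}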
Hi All,
While writing a partitioned data frame as partitioned text files, I see that
Spark deletes all existing partitions while writing a few new partitions:
dataDF.write.partitionBy("year", "month", "date")
  .mode(SaveMode.Overwrite)
  .text("s3://data/test2/events/")
Is this expected behavior?