Adding this simple setting helped me overcome the issue:

    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

My issue:
In an S3 folder, I previously had data partitioned by *ingestiontime*. I then wanted to reprocess this data and partition it by *businessname* and *ingestiontime*. Whenever I wrote my DataFrame in overwrite mode, all of the data that existed before the operation was truncated/deleted. After setting the above Spark configuration, only the partitions present in the incoming DataFrame are truncated and overwritten; all other partitions stay intact.

In addition, if you have the Hadoop trash feature enabled, you may be able to recover the lost data. For more, see:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#File_Deletes_and_Undeletes