Adding this simple setting helped me overcome the issue:

    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
My issue:

In an S3 folder, I previously had data partitioned by *ingestiontime*.
Now I wanted to reprocess this data and partition it by
*businessname* and *ingestiontime*.

Whenever I wrote my DataFrame in overwrite mode, all the data
that was present prior to the operation was TRUNCATED/DELETED.

After setting the above Spark configuration, only the partitions present
in the DataFrame being written are truncated and overwritten, and all
others stay intact.
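
For reference, the reprocessing write looked roughly like this. This is
only a sketch: the paths and the source DataFrame are placeholders, not
my actual job.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Replace only the partitions present in the DataFrame being written,
    // instead of truncating the whole target directory on overwrite.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    // Hypothetical paths - substitute your own bucket/prefixes.
    val df = spark.read.parquet("s3://my-bucket/source/")

    df.write
      .mode("overwrite")
      .partitionBy("businessname", "ingestiontime")
      .parquet("s3://my-bucket/target/")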

In addition, if you have the Hadoop trash feature enabled, you might be
able to recover the lost data. For more:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#File_Deletes_and_Undeletes


