turboFei commented on a change in pull request #25979: [SPARK-29295][SQL] Insert overwrite to Hive external table partition should delete old data
URL: https://github.com/apache/spark/pull/25979#discussion_r329864000
##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
##########
@@ -230,12 +230,28 @@ case class InsertIntoHiveTable(
       var doHiveOverwrite = overwrite
       if (oldPart.isEmpty || !ifPartitionNotExists) {
+        // SPARK-29295: When insert overwrite to a Hive external table partition, if the
+        // partition does not exist, Hive will not check if the external partition directory
+        // exists or not before copying files. So if users drop the partition, and then do
+        // insert overwrite to the same partition, the partition will have both old and new
+        // data.
+        val updatedPart = if (overwrite && table.tableType == CatalogTableType.EXTERNAL) {
+          AlterTableAddPartitionCommand(
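For context, the scenario the new code comment describes can be reproduced roughly as follows. This is only an illustrative sketch, not part of the patch: the table name, partition column, and location are made up, and it assumes a `spark-shell` session with Hive support enabled.

```scala
// Hypothetical external table; the location is arbitrary.
spark.sql("CREATE EXTERNAL TABLE t (id INT) PARTITIONED BY (p INT) LOCATION '/tmp/spark29295/t'")
spark.sql("INSERT OVERWRITE TABLE t PARTITION (p = 1) SELECT 1")

// The table is external, so dropping the partition only removes the metastore entry;
// the directory /tmp/spark29295/t/p=1 and its data file remain on disk.
spark.sql("ALTER TABLE t DROP PARTITION (p = 1)")

// Without the fix, this overwrite does not clear the stale directory, so the partition
// ends up containing both the old row (1) and the new row (2).
spark.sql("INSERT OVERWRITE TABLE t PARTITION (p = 1) SELECT 2")
spark.sql("SELECT * FROM t WHERE p = 1").show()
```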
Review comment:
How can we roll back if the following `loadPartition` operation fails?
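To illustrate the concern, one possible shape for such a rollback is sketched below. This is not the actual Spark code path: `addPartition`, `dropPartition`, and `loadPartition` are stand-ins for the catalog and Hive client calls used by `InsertIntoHiveTable`, and `PartitionSpec` is a placeholder type.

```scala
object RollbackSketch {
  case class PartitionSpec(values: Map[String, String])

  // Hypothetical stand-ins for the real metastore / Hive client operations.
  def addPartition(spec: PartitionSpec): Unit = println(s"ADD PARTITION $spec")
  def dropPartition(spec: PartitionSpec): Unit = println(s"DROP PARTITION $spec")
  def loadPartition(spec: PartitionSpec): Unit = println(s"LOAD PARTITION $spec")

  /** Re-creates the partition before loading and drops it again if the load fails. */
  def loadWithRollback(spec: PartitionSpec, recreatedPartition: Boolean): Unit = {
    if (recreatedPartition) addPartition(spec)
    try {
      loadPartition(spec)
    } catch {
      case e: Throwable =>
        // Undo the metastore change so a failed load does not leave behind a
        // partition entry that did not exist before this command ran.
        if (recreatedPartition) dropPartition(spec)
        throw e
    }
  }
}
```

For example, `RollbackSketch.loadWithRollback(RollbackSketch.PartitionSpec(Map("p" -> "1")), recreatedPartition = true)` re-adds the partition, attempts the load, and drops the partition again if the load throws.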