Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22707#discussion_r225759293
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
          // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
          // version and we may not want to catch up new Hive version every time. We delete the
          // Hive partition first and then load data file into the Hive partition.
-         if (oldPart.nonEmpty && overwrite) {
-           oldPart.get.storage.locationUri.foreach { uri =>
-             val partitionPath = new Path(uri)
-             val fs = partitionPath.getFileSystem(hadoopConf)
-             if (fs.exists(partitionPath)) {
-               if (!fs.delete(partitionPath, true)) {
-                 throw new RuntimeException(
-                   "Cannot remove partition directory '" + partitionPath.toString)
-               }
-               // Don't let Hive do overwrite operation since it is slower.
-               doHiveOverwrite = false
+         if (overwrite) {
+           val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+             .getOrElse {
+               ExternalCatalogUtils.generatePartitionPath(
+                 partitionSpec,
+                 partitionColumnNames,
+                 HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --
Looks correct, as I see we assign `CatalogTable.storage.locationUri` to the
Hive table's data location.
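
For context, a minimal, self-contained sketch of the delete-then-load fallback being discussed (this is illustrative only, not the PR's code): `generatePartitionPath`, `resolvePartitionPath`, and `deletePartitionDir` below are hypothetical stand-ins for Spark's internal helpers, and partition path-name escaping is omitted for brevity.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PartitionOverwriteSketch {
  // Hypothetical stand-in mimicking ExternalCatalogUtils.generatePartitionPath:
  // builds <tableLocation>/<col1=val1>/<col2=val2>/... from the partition spec.
  // (Real Spark also escapes column names and values.)
  def generatePartitionPath(
      spec: Map[String, String],
      partitionColumnNames: Seq[String],
      tablePath: Path): Path = {
    partitionColumnNames.foldLeft(tablePath) { (path, col) =>
      new Path(path, s"$col=${spec(col)}")
    }
  }

  // Prefer the partition location recorded in the catalog; otherwise fall back
  // to the default location derived from the table's data location.
  def resolvePartitionPath(
      oldPartLocation: Option[java.net.URI],
      spec: Map[String, String],
      partitionColumnNames: Seq[String],
      tableDataLocation: Path): Path = {
    oldPartLocation.map(new Path(_)).getOrElse(
      generatePartitionPath(spec, partitionColumnNames, tableDataLocation))
  }

  // Delete the partition directory ourselves so Hive's slower overwrite path
  // can be skipped; returns true if a directory was actually removed.
  def deletePartitionDir(partitionPath: Path, hadoopConf: Configuration): Boolean = {
    val fs = partitionPath.getFileSystem(hadoopConf)
    if (fs.exists(partitionPath)) {
      if (!fs.delete(partitionPath, true)) {
        throw new RuntimeException(
          s"Cannot remove partition directory '$partitionPath'")
      }
      true
    } else {
      false
    }
  }
}
```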
---