cloud-fan commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r426618965
##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
##########
@@ -281,11 +283,26 @@ case class InsertIntoHiveTable(
oldPart.flatMap(_.storage.locationUri.map(uri => new Path(uri)))
}
- // SPARK-18107: Insert overwrite runs much slower than hive-client.
+ val hiveVersion = externalCatalog.asInstanceOf[ExternalCatalogWithListener]
+ .unwrapped.asInstanceOf[HiveExternalCatalog]
+ .client
+ .version
+ // SPARK-31684:
+ // For Hive 2.0.0 and onwards, as https://issues.apache.org/jira/browse/HIVE-11940
+ // has been fixed, and there is no performance issue anymore. We should leave the
+ // overwrite logic to hive to avoid failure in `FileSystem#checkPath` when the table
+ // and partition locations do not belong to the same `FileSystem`
+ // TODO(SPARK-31675): For Hive 2.2.0 and earlier, if the table and partition locations
+ // do not belong together, we will still get the same error thrown by hive encryption
+ // check. see https://issues.apache.org/jira/browse/HIVE-14380.
+ // So we still disable for Hive overwrite for Hive 1.x for better performance because
+ // the partition and table are on the same cluster in most cases.
+ // SPARK-18107:
+ // Insert overwrite runs much slower than hive-client.
Review comment:
nit: this should be on the same line as `SPARK-18107:`
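A minimal Scala sketch of the version gate this diff introduces, under a simplified model. `HiveVer`, `OverwriteGate`, `sparkDeletesPartition` and `sameFileSystem` are hypothetical names for illustration, not Spark or Hadoop APIs. The sketch assumes Spark keeps its own delete-then-write fast path (SPARK-18107) only when overwriting an existing partition with a Hive client older than 2.0.0, and otherwise leaves the overwrite to Hive. `sameFileSystem` is a simplified approximation of the scheme/authority comparison behind `FileSystem#checkPath`.

```scala
import java.net.URI

// HYPOTHETICAL: a stand-in for a Hive client version, compared as (major, minor, patch).
final case class HiveVer(major: Int, minor: Int, patch: Int) extends Ordered[HiveVer] {
  override def compare(that: HiveVer): Int =
    Ordering[(Int, Int, Int)].compare((major, minor, patch), (that.major, that.minor, that.patch))
}

object OverwriteGate {
  // Hive 2.0.0: per the SPARK-31684 comment, HIVE-11940 is fixed from here on.
  val V2_0: HiveVer = HiveVer(2, 0, 0)

  // HYPOTHETICAL, SIMPLIFIED: Hadoop's FileSystem#checkPath rejects a path whose
  // scheme or authority differs from the file system's own URI. This sketch
  // ignores the default-port and missing-authority cases the real method handles.
  def sameFileSystem(a: URI, b: URI): Boolean = {
    def norm(s: String): Option[String] = Option(s).map(_.toLowerCase)
    norm(a.getScheme) == norm(b.getScheme) && norm(a.getAuthority) == norm(b.getAuthority)
  }

  // HYPOTHETICAL: true when Spark deletes the old partition directory itself
  // (the SPARK-18107 fast path); false when the overwrite is left to Hive.
  def sparkDeletesPartition(
      overwrite: Boolean,
      hasOldPartitionPath: Boolean,
      hiveVersion: HiveVer): Boolean =
    overwrite && hasOldPartitionPath && hiveVersion < V2_0
}
```

For example, a Hive 1.2.1 client overwriting an existing partition takes the Spark fast path, while a 2.3.7 client does not. `sameFileSystem` returns false for an `hdfs://` table location paired with an `s3a://` partition location, which is the kind of mismatch the SPARK-31684 comment describes.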