yaooqinn commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r425888062
##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
##########
@@ -281,11 +284,27 @@ case class InsertIntoHiveTable(
oldPart.flatMap(_.storage.locationUri.map(uri => new Path(uri)))
}
+ val hiveVersion = externalCatalog.asInstanceOf[ExternalCatalogWithListener]
+ .unwrapped.asInstanceOf[HiveExternalCatalog]
+ .client
+ .version
+ // https://issues.apache.org/jira/browse/SPARK-31684,
+ // For Hive 2.0.0 and onwards, as https://issues.apache.org/jira/browse/HIVE-11940
+ // has been fixed, and there is no performance issue anymore.
Review comment:
I have just added a benchmark for InsertIntoHiveTable and updated the
results.
It uses "INSERT INTO" as the control group and "INSERT OVERWRITE" as
the experimental group.
With the built-in Hive 2.3.7, the results of the two groups are close.
With the built-in Hive 1.2.1.xxx, the experimental group shows a huge
performance degradation when a dynamic partition column exists (the
reason is that #15726 is not merged).
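
For readers who want a picture of the comparison, here is a rough sketch of how the two groups could be laid out. It is not the benchmark code from this PR: the object name, the table and column names, the three partition layouts, and the `timeMs` helper are all assumptions made for illustration. It only builds the SQL strings and times a placeholder action, so it runs without Spark.

```scala
// Hypothetical sketch: names and partition layouts are illustrative,
// not taken from the benchmark added in the pull request.
object InsertBenchmarkSketch {
  // Control group uses INSERT INTO; experimental group uses INSERT OVERWRITE.
  val modes: Seq[String] = Seq("INTO", "OVERWRITE")

  // Assumed partition layouts: fully dynamic, mixed static/dynamic, fully static.
  val partitionSpecs: Seq[(String, String)] = Seq(
    "dynamic" -> "PARTITION(a, b) SELECT id, 1 AS a, 2 AS b FROM src",
    "hybrid"  -> "PARTITION(a=1, b) SELECT id, 2 AS b FROM src",
    "static"  -> "PARTITION(a=1, b=2) SELECT id FROM src"
  )

  // Build one (case name, SQL statement) pair per mode and partition layout.
  def statements(table: String): Seq[(String, String)] =
    for {
      mode <- modes
      (label, spec) <- partitionSpecs
    } yield (s"INSERT $mode $label", s"INSERT $mode TABLE $table $spec")

  // Wall-clock time of an arbitrary action, in milliseconds.
  def timeMs(action: => Unit): Long = {
    val start = System.nanoTime()
    action
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    statements("t").foreach { case (name, sql) =>
      // With a SparkSession available, the timed action would run the statement;
      // here it is a no-op so the sketch stays self-contained.
      val elapsed = timeMs(())
      println(s"$name: $sql ($elapsed ms)")
    }
  }
}
```

Running each pair against Hive 2.3.7 and against Hive 1.2.1 would then show whether the OVERWRITE cases with a dynamic partition column diverge from their INTO counterparts.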