vrozov commented on code in PR #52099:
URL: https://github.com/apache/spark/pull/52099#discussion_r2528923210


##########
dev/test-jars.txt:
##########
@@ -10,7 +10,6 @@ sql/connect/common/src/test/resources/artifact-tests/smallJar.jar
 sql/core/src/test/resources/SPARK-33084.jar
 sql/core/src/test/resources/artifact-tests/udf_noA.jar
 sql/hive-thriftserver/src/test/resources/TestUDTF.jar
-sql/hive/src/test/noclasspath/hive-test-udfs.jar

Review Comment:
   Please see https://github.com/apache/spark/pull/52099#issuecomment-3534437270


##########
repl/src/test/scala/org/apache/spark/repl/SparkShellSuite.scala:
##########
@@ -95,6 +95,25 @@ class SparkShellSuite extends SparkFunSuite {
     }
   }
 
+  def handleException(cause: Throwable): Unit = lock.synchronized {
+    val message =

Review Comment:
   Moved to #53074


##########
project/SparkBuild.scala:
##########
@@ -1750,6 +1760,7 @@ object TestSettings {
     }.getOrElse(Nil): _*),
     // Show full stack trace and duration in test cases.
     (Test / testOptions) += Tests.Argument("-oDF"),
+    (Test / testOptions) += Tests.Argument(TestFrameworks.ScalaTest, "-fG", "scalatest.txt"),

Review Comment:
   Moved to #53053


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala:
##########
@@ -508,7 +508,14 @@ abstract class OrcSuite
     withAllNativeOrcReaders {
       checkAnswer(
         spark.read.orc(path),
-        Seq(Row(Date.valueOf("1001-01-01")), Row(Date.valueOf("1582-10-15"))))
+        Seq(Row(Date.valueOf("1001-01-01")),
+          if (spark.isInstanceOf[TestSparkSession]) {
+            // Spark rebases 1582-10-05 through 1582-10-15 to 1582-10-15
+            Row(Date.valueOf("1582-10-15"))
+          } else {
+            // Hive rebases 1582-10-05 through 1582-10-15 by adding 10 days
+            Row(Date.valueOf("1582-10-20"))

Review Comment:
   The behavior for those dates is undefined. Hive 4.x handles them differently from Spark and Hive 2.3. All we can do is note the change in behavior, and that is done in the PR description.
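   For context, a minimal sketch (plain JDK APIs, not code from this PR) of why those dates are ambiguous in the first place: the hybrid Julian/Gregorian calendar behind `java.sql.Date` has no 1582-10-05 through 1582-10-14, because the Gregorian cutover jumps from 1582-10-04 straight to 1582-10-15, so such strings get resolved leniently by rolling forward, while `java.time` treats them as valid proleptic Gregorian dates. The printed values in the comments are assumptions about lenient resolution on a JVM with default settings, not asserted output.

```scala
import java.sql.Date
import java.time.LocalDate

// Hybrid Julian/Gregorian calendar (java.sql.Date / java.util.GregorianCalendar):
// 1582-10-05..1582-10-14 do not exist, so lenient resolution rolls the value
// forward past the cutover (assumption: roughly "+10 days", e.g. 1582-10-20 here).
val hybrid = Date.valueOf("1582-10-10")
println(hybrid)

// Proleptic Gregorian calendar (java.time): the same string is a valid date
// and is kept as-is.
val proleptic = LocalDate.parse("1582-10-10")
println(proleptic)  // 1582-10-10
```

   This gap is why a single fixed expected `Row` cannot match both the Spark and the Hive write paths, and why the test has to branch on the session type.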
##########
pom.xml:
##########
@@ -336,6 +337,7 @@
     -Djdk.reflect.useDirectMethodHandle=false
     -Dio.netty.tryReflectionSetAccessible=true
     --enable-native-access=ALL-UNNAMED
+    -Dmvn.executable=${maven.multiModuleProjectDirectory}/build/mvn

Review Comment:
   ditto


##########
pom.xml:
##########
@@ -206,7 +207,7 @@
     <jsr305.version>3.0.0</jsr305.version>
     <jaxb.version>2.2.11</jaxb.version>
     <libthrift.version>0.16.0</libthrift.version>
-    <antlr4.version>4.13.1</antlr4.version>
+    <antlr4.version>4.9.3</antlr4.version>

Review Comment:
   Please see https://github.com/apache/spark/pull/52099#issuecomment-3534437270


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -4722,8 +4722,7 @@ class AstBuilder extends DataTypeAstBuilder
         entry("field.delim", ctx.fieldsTerminatedBy) ++
         entry("serialization.format", ctx.fieldsTerminatedBy) ++
         entry("escape.delim", ctx.escapedBy) ++
-        // The following typo is inherited from Hive...
-        entry("colelction.delim", ctx.collectionItemsTerminatedBy) ++
+        entry("collection.delim", ctx.collectionItemsTerminatedBy) ++

Review Comment:
   The change is necessary because the corresponding fix was made on the Hive side, and without it the Hive tests fail.
   Please see [HIVE-16922](https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076135)
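   To make it concrete, here is a sketch (hypothetical table name, assuming a Spark session created with `enableHiveSupport()` so Hive row formats are accepted) of the DDL path that goes through this `entry(...)`: the `COLLECTION ITEMS TERMINATED BY` clause is what ends up stored under this serde property key, so after the change the table metadata carries `collection.delim` instead of the historical misspelling `colelction.delim`.

```scala
// Sketch only: "delim_demo" is a made-up table name, and this assumes a
// Hive-enabled SparkSession named `spark`.
spark.sql(
  """
    |CREATE TABLE delim_demo (id INT, tags ARRAY<STRING>)
    |ROW FORMAT DELIMITED
    |  FIELDS TERMINATED BY ','
    |  COLLECTION ITEMS TERMINATED BY '|'
    |STORED AS TEXTFILE
  """.stripMargin)

// The storage properties of the created table should now include the
// corrected key ("collection.delim" -> "|"); the exact rendering depends
// on the Spark version.
spark.sql("DESCRIBE TABLE EXTENDED delim_demo").show(truncate = false)
```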
##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala:
##########
@@ -22,6 +22,6 @@ private[client] trait HiveClientVersions {
   protected val versions = if (testVersions.nonEmpty) {
     testVersions.get.split(",").map(_.trim).filter(_.nonEmpty).toIndexedSeq
   } else {
-    IndexedSeq("2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1")

Review Comment:
   Dropping support for Hive 2.x and 3.x is proposed in the SPIP and on [the discussion thread](https://lists.apache.org/thread/psfyt2094xsvlp231kqyryqlk345vk3r)


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:
##########
@@ -854,7 +851,10 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     // Get the original table properties as defined by the user.
     table.copy(
       createVersion = version,
-      properties = table.properties.filterNot { case (key, _) => key.startsWith(SPARK_SQL_PREFIX) })
+      properties = table.properties.filterNot { case (key, value) =>
+        key.startsWith(SPARK_SQL_PREFIX) ||
+        key == "bucketing_version" && value == "2" ||

Review Comment:
   Hive adds the property by default, and multiple tests break if it is not filtered out.


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala:
##########
@@ -211,8 +211,6 @@ private[client] sealed abstract class Shim {
   def listFunctions(hive: Hive, db: String, pattern: String): Seq[String]
 
-  def dropIndex(hive: Hive, dbName: String, tableName: String, indexName: String): Unit

Review Comment:
   Hive actually never used table indices, and I don't see Spark relying on indices either. @sarutak removing the API call does not break compatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
