vrozov commented on code in PR #52099:
URL: https://github.com/apache/spark/pull/52099#discussion_r2528923210


##########
dev/test-jars.txt:
##########
@@ -10,7 +10,6 @@ sql/connect/common/src/test/resources/artifact-tests/smallJar.jar
 sql/core/src/test/resources/SPARK-33084.jar
 sql/core/src/test/resources/artifact-tests/udf_noA.jar
 sql/hive-thriftserver/src/test/resources/TestUDTF.jar
-sql/hive/src/test/noclasspath/hive-test-udfs.jar

Review Comment:
   Please see https://github.com/apache/spark/pull/52099#issuecomment-3534437270


##########
repl/src/test/scala/org/apache/spark/repl/SparkShellSuite.scala:
##########
@@ -95,6 +95,25 @@ class SparkShellSuite extends SparkFunSuite {
     }
   }
 
+  def handleException(cause: Throwable): Unit = lock.synchronized {
+    val message =

Review Comment:
   Moved to #53074


##########
project/SparkBuild.scala:
##########
@@ -1750,6 +1760,7 @@ object TestSettings {
     }.getOrElse(Nil): _*),
     // Show full stack trace and duration in test cases.
     (Test / testOptions) += Tests.Argument("-oDF"),
+    (Test / testOptions) += Tests.Argument(TestFrameworks.ScalaTest, "-fG", "scalatest.txt"),

Review Comment:
   Moved to #53053


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala:
##########
@@ -508,7 +508,14 @@ abstract class OrcSuite
     withAllNativeOrcReaders {
       checkAnswer(
         spark.read.orc(path),
-        Seq(Row(Date.valueOf("1001-01-01")), Row(Date.valueOf("1582-10-15"))))
+        Seq(Row(Date.valueOf("1001-01-01")),
+          if (spark.isInstanceOf[TestSparkSession]) {
+            // Spark rebases 1582-10-05 through 1582-10-15 to 1582-10-15
+            Row(Date.valueOf("1582-10-15"))
+          } else {
+            // Hive rebases 1582-10-05 through 1582-10-15 by adding 10 days
+            Row(Date.valueOf("1582-10-20"))

Review Comment:
   The behavior for those dates is undefined. Hive 4.x handles them differently from Spark and Hive 2.3. All we can do is note the change in behavior, and that is done in the PR description.
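   For context, a minimal sketch (plain JDK APIs, not code from this PR) of why those dates are ambiguous in the first place: the hybrid Julian/Gregorian calendar behind `java.sql.Date` has no 1582-10-05 through 1582-10-14, because the Gregorian cutover jumps from 1582-10-04 straight to 1582-10-15, so such strings get resolved leniently by rolling forward, while `java.time` treats them as valid proleptic Gregorian dates. The printed values in the comments are assumptions about lenient resolution on a JVM with default settings, not asserted output.

```scala
import java.sql.Date
import java.time.LocalDate

// Hybrid Julian/Gregorian calendar (java.sql.Date / java.util.GregorianCalendar):
// 1582-10-05..1582-10-14 do not exist, so lenient resolution rolls the value
// forward past the cutover (assumption: roughly "+10 days", e.g. 1582-10-20 here).
val hybrid = Date.valueOf("1582-10-10")
println(hybrid)

// Proleptic Gregorian calendar (java.time): the same string is a valid date
// and is kept as-is.
val proleptic = LocalDate.parse("1582-10-10")
println(proleptic)  // 1582-10-10
```

   This gap is why a single fixed expected `Row` cannot match both the Spark and the Hive write paths, and why the test has to branch on the session type.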
##########
pom.xml:
##########
@@ -336,6 +337,7 @@
     -Djdk.reflect.useDirectMethodHandle=false
     -Dio.netty.tryReflectionSetAccessible=true
     --enable-native-access=ALL-UNNAMED
+    -Dmvn.executable=${maven.multiModuleProjectDirectory}/build/mvn

Review Comment:
   ditto


##########
pom.xml:
##########
@@ -206,7 +207,7 @@
     <jsr305.version>3.0.0</jsr305.version>
     <jaxb.version>2.2.11</jaxb.version>
     <libthrift.version>0.16.0</libthrift.version>
-    <antlr4.version>4.13.1</antlr4.version>
+    <antlr4.version>4.9.3</antlr4.version>

Review Comment:
   Please see https://github.com/apache/spark/pull/52099#issuecomment-3534437270


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -4722,8 +4722,7 @@ class AstBuilder extends DataTypeAstBuilder
         entry("field.delim", ctx.fieldsTerminatedBy) ++
         entry("serialization.format", ctx.fieldsTerminatedBy) ++
         entry("escape.delim", ctx.escapedBy) ++
-        // The following typo is inherited from Hive...
-        entry("colelction.delim", ctx.collectionItemsTerminatedBy) ++
+        entry("collection.delim", ctx.collectionItemsTerminatedBy) ++

Review Comment:
   The change is necessary because the corresponding fix was made on the Hive side, and without it the Hive tests fail.
   Please see [HIVE-16922](https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076135)
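   To make it concrete, here is a sketch (hypothetical table name, assuming a Spark session created with `enableHiveSupport()` so Hive row formats are accepted) of the DDL path that goes through this `entry(...)`: the `COLLECTION ITEMS TERMINATED BY` clause is what ends up stored under this serde property key, so after the change the table metadata carries `collection.delim` instead of the historical misspelling `colelction.delim`.

```scala
// Sketch only: "delim_demo" is a made-up table name, and this assumes a
// Hive-enabled SparkSession named `spark`.
spark.sql(
  """
    |CREATE TABLE delim_demo (id INT, tags ARRAY<STRING>)
    |ROW FORMAT DELIMITED
    |  FIELDS TERMINATED BY ','
    |  COLLECTION ITEMS TERMINATED BY '|'
    |STORED AS TEXTFILE
  """.stripMargin)

// The storage properties of the created table should now include the
// corrected key ("collection.delim" -> "|"); the exact rendering depends
// on the Spark version.
spark.sql("DESCRIBE TABLE EXTENDED delim_demo").show(truncate = false)
```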
##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala:
##########
@@ -22,6 +22,6 @@ private[client] trait HiveClientVersions {
   protected val versions = if (testVersions.nonEmpty) {
     testVersions.get.split(",").map(_.trim).filter(_.nonEmpty).toIndexedSeq
   } else {
-    IndexedSeq("2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1")

Review Comment:
   Dropping support for Hive 2.x and 3.x is proposed in the SPIP and on [the discussion thread](https://lists.apache.org/thread/psfyt2094xsvlp231kqyryqlk345vk3r)


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:
##########
@@ -854,7 +851,10 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     // Get the original table properties as defined by the user.
     table.copy(
       createVersion = version,
-      properties = table.properties.filterNot { case (key, _) => key.startsWith(SPARK_SQL_PREFIX) })
+      properties = table.properties.filterNot { case (key, value) =>
+        key.startsWith(SPARK_SQL_PREFIX) ||
+        key == "bucketing_version" && value == "2" ||

Review Comment:
   Hive adds the property by default, and multiple tests break if it is not filtered out.


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala:
##########
@@ -211,8 +211,6 @@ private[client] sealed abstract class Shim {
   def listFunctions(hive: Hive, db: String, pattern: String): Seq[String]
 
-  def dropIndex(hive: Hive, dbName: String, tableName: String, indexName: String): Unit

Review Comment:
   Hive actually never used table indices, and I don't see Spark relying on indices either. @sarutak removing the API call does not break compatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
