cloud-fan commented on code in PR #45039:
URL: https://github.com/apache/spark/pull/45039#discussion_r1857542583
##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##########
@@ -1056,11 +1056,22 @@ private[hive] object HiveClientImpl extends Logging {
/** Get the Spark SQL native DataType from Hive's FieldSchema. */
private def getSparkSQLDataType(hc: FieldSchema): DataType = {
Review Comment:
This is also called by `verifyColumnDataType`, which is in turn called by
`createTable`, `alterTable`, etc. of `HiveClientImpl`.
This introduces an unintentional behavior change: we can now save some Spark
data source tables as Hive-compatible tables. That seems good in itself, but it
causes trouble if a user creates a table with Spark 4.0 and then tries to read
it with older Spark versions.
It's common for users to upgrade Spark for a few workloads first and then
expand the rollout, rather than upgrading everything at once. Can we add a
flag to this function to disable this fix, and set the flag to false when
calling the function in `verifyColumnDataType`? Then we don't change how we
create tables in Spark 4.0.
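The suggestion above (gate the new parsing path behind a default-on flag, and turn it off at the verification call site) could be sketched roughly as follows. This is a hypothetical, self-contained illustration of the pattern only: the names `applyFix`, `parseDataType`, and `verifyColumn`, and the toy type set, are all assumptions and not taken from the actual PR diff.

```scala
// Standalone sketch (hypothetical names) of gating a behavior fix behind a flag.
object DataTypeParserSketch {
  // Toy stand-in for type parsing: the legacy path only accepts a fixed set of
  // type strings, while the fixed path is more permissive.
  def parseDataType(typeString: String, applyFix: Boolean = true): Option[String] = {
    val legacySupported = Set("int", "string", "double")
    if (legacySupported.contains(typeString)) Some(typeString)
    else if (applyFix) Some(typeString.toLowerCase) // new, more permissive path
    else None                                       // legacy behavior preserved
  }

  // A verifyColumnDataType-style caller opts out of the fix, so table
  // creation keeps rejecting exactly what it rejected before.
  def verifyColumn(typeString: String): Boolean =
    parseDataType(typeString, applyFix = false).isDefined
}
```

With a default of `applyFix = true`, read paths pick up the fix automatically, while the one call site in `verifyColumnDataType` can pass `false` to keep the old table-creation behavior.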
##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##########
@@ -1056,11 +1056,22 @@ private[hive] object HiveClientImpl extends Logging {
/** Get the Spark SQL native DataType from Hive's FieldSchema. */
private def getSparkSQLDataType(hc: FieldSchema): DataType = {
Review Comment:
cc @yaooqinn @dongjoon-hyun
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]