wangyum commented on a change in pull request #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#discussion_r282013417
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
 ##########
 @@ -358,12 +358,17 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             "Spark SQL specific format, which is NOT compatible with Hive."
         (None, message)
 
-      // our bucketing is un-compatible with hive(different hash function)
-      case _ if table.bucketSpec.nonEmpty =>
+      // our bucketing is un-compatible with hive(different hash function).
+      // but downstream(Hive/Presto) still can read it as not bucketed table.
+      // We set the SerDe correctly and bucketing_version to spark.
+      // The downstream decides how to read it by themselves, a similar implementation:
 
 Review comment:
   Sorry, it's not a bucketed table on the Hive side. Related code:
   
https://github.com/apache/spark/blob/f9776e389215255dc61efaa2eddd92a1fa754b48/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L444-L459
   
https://github.com/apache/spark/blob/33f3c48cac087e079b9c7e342c2e58b16eaaa681/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L976-L990
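   To illustrate the "different hash function" point in the diff comment, here is a minimal Scala sketch that computes the bucket ID for a single INT bucket column under both schemes. It assumes Spark assigns buckets as `pmod(murmur3(col, seed = 42), numBuckets)` and that Hive (bucketing_version=1) uses `(hashCode & Integer.MAX_VALUE) % numBuckets`, where an INT's hash code is the value itself. The Murmur3 routine is re-implemented by hand for illustration rather than imported from Spark, and `BucketIdComparison` and its methods are hypothetical names.

```scala
// Hypothetical sketch: compare Spark-style and Hive v1-style bucket IDs
// for a single INT bucket column. Assumptions are stated in the comments.
object BucketIdComparison {
  private val C1 = 0xcc9e2d51
  private val C2 = 0x1b873593

  // Murmur3 x86_32 over one 4-byte int, re-implemented here to mirror
  // the structure of Spark's hashInt (assumption: not Spark's own code).
  def murmur3HashInt(input: Int, seed: Int): Int = {
    // mix the key block
    var k1 = input * C1
    k1 = Integer.rotateLeft(k1, 15)
    k1 *= C2

    // mix into the running hash, starting from the seed
    var h1 = seed ^ k1
    h1 = Integer.rotateLeft(h1, 13)
    h1 = h1 * 5 + 0xe6546b64

    // finalization mix with length = 4 bytes
    h1 ^= 4
    h1 ^= h1 >>> 16
    h1 *= 0x85ebca6b
    h1 ^= h1 >>> 13
    h1 *= 0xc2b2ae35
    h1 ^= h1 >>> 16
    h1
  }

  // Assumed Spark scheme: non-negative modulo of murmur3(value, seed = 42)
  def sparkBucketId(value: Int, numBuckets: Int): Int = {
    val m = murmur3HashInt(value, 42) % numBuckets
    if (m < 0) m + numBuckets else m
  }

  // Assumed Hive bucketing_version=1 scheme for a single INT column
  def hiveV1BucketId(value: Int, numBuckets: Int): Int =
    (value & Integer.MAX_VALUE) % numBuckets

  def main(args: Array[String]): Unit = {
    val numBuckets = 8
    for (v <- 0 until 5) {
      val s = sparkBucketId(v, numBuckets)
      val h = hiveV1BucketId(v, numBuckets)
      println(s"value=$v spark=$s hive=$h")
    }
  }
}
```

   For most values the two IDs land in different buckets, so a Hive reader that trusted Spark's file layout as Hive bucketing would likely prune or join against the wrong files. Treating the table as non-bucketed on the Hive side sidesteps that mismatch.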
   
