viirya commented on a change in pull request #28686:
URL: https://github.com/apache/spark/pull/28686#discussion_r434808554
##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala
##########
@@ -82,8 +82,8 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session
override val postHocResolutionRules: Seq[Rule[LogicalPlan]] =
new DetectAmbiguousSelfJoin(conf) +:
- new DetermineTableStats(session) +:
RelationConversions(conf, catalog) +:
+ new DetermineTableStats(session) +:
Review comment:
`DetermineTableStats` updates the statistics of a `HiveTableRelation` in some
cases. The updated statistics are then propagated into the converted
`HadoopFsRelation`, so its `sizeInBytes` depends on them.
Because this change moves `DetermineTableStats` after `RelationConversions`,
it could change the `sizeInBytes` of the converted `HadoopFsRelation`.
In particular, if a user relies on `fallBackToHdfsForStatsEnabled` to
calculate the table size, this change means it will no longer work as before.
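To make the ordering concern concrete, here is a toy Scala model. It is not Spark code: `HiveRel`, `FsRel`, `determineTableStats`, and `relationConversions` are hypothetical stand-ins for `HiveTableRelation`, `HadoopFsRelation`, and the two rules. Each rule is applied once, in sequence. The 1024-byte "HDFS size" and the `Long.MaxValue` default are illustrative assumptions. The model shows one thing. If stats are filled in first, the converted relation inherits them. If conversion runs first, the stats rule no longer matches anything, and the default size sticks.

```scala
// Hypothetical, simplified model of the rule-ordering issue; none of these are Spark classes.
object RuleOrderSketch {
  sealed trait Plan
  // Stand-in for HiveTableRelation: stats may be missing until a rule fills them in.
  final case class HiveRel(sizeInBytes: Option[Long]) extends Plan
  // Stand-in for the converted HadoopFsRelation: its size is fixed at conversion time.
  final case class FsRel(sizeInBytes: Long) extends Plan

  // Assumed fallback size when no statistics are known.
  val DefaultSize: Long = Long.MaxValue

  // Mimics DetermineTableStats: only touches a Hive relation that lacks stats,
  // filling in the size a fallback-to-HDFS lookup would have returned.
  def determineTableStats(hdfsSize: Long): Plan => Plan = {
    case HiveRel(None) => HiveRel(Some(hdfsSize))
    case other         => other
  }

  // Mimics RelationConversions: copies whatever stats exist at conversion time.
  val relationConversions: Plan => Plan = {
    case HiveRel(size) => FsRel(size.getOrElse(DefaultSize))
    case other         => other
  }

  // Applies each rule exactly once, in the given order.
  def run(rules: Seq[Plan => Plan], plan: Plan): Plan =
    rules.foldLeft(plan)((acc, rule) => rule(acc))

  def main(args: Array[String]): Unit = {
    val stats = determineTableStats(1024L)
    val statsFirst = run(Seq(stats, relationConversions), HiveRel(None))
    val conversionFirst = run(Seq(relationConversions, stats), HiveRel(None))
    println(statsFirst)      // FsRel(1024)
    println(conversionFirst) // FsRel(9223372036854775807)
  }
}
```

In this model, the order swap silently replaces the HDFS-derived size with the default. That corresponds to the behavior change described above for users who depend on the fallback.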