Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17630

> are we storing UTF8Strings directly in the catalog for statistics? That doesn't make sense ... if we are not, then we are not using internal types.

@rxin By "in the catalog for statistics", do you mean the statistics in the metastore? We still use external types for statistics in the metastore. What this PR changes is the type of min/max in `ColumnStat`, so we don't have this problem here.

> My concern is that the internal types are specific to the physical execution path and stats/CBO are independent of that. We can in the future change the internal data types without changing CBO.

Literal values are internal, so stats/CBO need to be consistent with them to do estimation. That makes it hard for CBO to be independent of the internal types. If the internal types change in the future, we can update the conversion contract defined in `ColumnStat` to match the new internal types.
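
To illustrate the idea of a conversion contract, here is a minimal Python sketch. It is not code from the PR: the names `to_internal`, `to_external`, and `fraction_below` are hypothetical, and only two types are handled. It assumes dates are held internally as days since 1970-01-01 (as Spark does for `DateType`), that min/max are kept as external strings in the metastore, and that values are uniformly distributed for the estimate.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_internal(external: str, data_type: str):
    """Convert a metastore string to an internal value (hypothetical contract)."""
    if data_type == "int":
        return int(external)
    if data_type == "date":
        # Assumption: dates are stored internally as days since the epoch.
        return (date.fromisoformat(external) - EPOCH).days
    raise ValueError(f"unsupported type: {data_type}")


def to_external(internal, data_type: str) -> str:
    """Convert an internal value back to its metastore string form."""
    if data_type == "int":
        return str(internal)
    if data_type == "date":
        return (EPOCH + timedelta(days=internal)).isoformat()
    raise ValueError(f"unsupported type: {data_type}")


def fraction_below(literal, col_min, col_max) -> float:
    """Estimate the fraction of rows with value < literal, assuming a
    uniform distribution. All three arguments are internal values, so
    they can be compared and subtracted directly."""
    if literal <= col_min:
        return 0.0
    if literal > col_max:
        return 1.0
    return (literal - col_min) / (col_max - col_min)


# Statistics arrive from the metastore as external strings ...
col_min = to_internal("2017-01-01", "date")
col_max = to_internal("2017-01-11", "date")
# ... while the literal in the predicate is already an internal value.
literal = to_internal("2017-01-06", "date")
print(fraction_below(literal, col_min, col_max))  # 0.5
```

In this sketch, a change to the internal representation would only require touching the two conversion functions; the estimation logic keeps comparing internal values with internal values.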