GitHub user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17630
  
    > are we storing UTF8Strings directly in the catalog for statistics? That 
doesn't make sense ... if we are not, then we are not using internal types.
    
    @rxin By "in the catalog for statistics", do you mean the statistics in the 
metastore? We still use external types for statistics in the metastore. What 
this PR changed is the type of min/max in `ColumnStat`, so we don't have 
this problem here.
    
    > My concern is that the internal types are specific to the physical 
execution path and stats/CBO are independent of that. We can in the future 
change the internal data types without changing CBO.
    
    Since literal values are internal, stats/CBO need to be consistent with 
them to do estimation, so it's hard for CBO to be independent of the internal 
types. If the internal types change in the future, we can update the 
conversion contract defined in `ColumnStat` to match the new internal 
types.
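
    To illustrate the point about consistency, here is a minimal sketch in 
Python, not Spark's actual code. The helper names (`to_internal`, 
`from_internal`, `less_than_selectivity`) are hypothetical, and it assumes 
dates are held internally as days since the Unix epoch and strings as UTF-8 
bytes. Min/max are converted once at the boundary, so estimation can compare 
them directly with an internal literal.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def to_internal(value):
    """Convert an external value to an assumed internal representation."""
    # datetime is a subclass of date; this sketch only handles plain dates.
    if isinstance(value, date) and not isinstance(value, datetime):
        return (value - EPOCH).days      # date -> int days since epoch
    if isinstance(value, str):
        return value.encode("utf-8")     # str -> UTF-8 bytes
    return value                         # numeric values pass through


def from_internal(value, external_type):
    """Inverse conversion, e.g. when writing stats back in external form."""
    if external_type is date:
        return date.fromordinal(EPOCH.toordinal() + value)
    if external_type is str:
        return value.decode("utf-8")
    return value


def less_than_selectivity(col_min, col_max, literal):
    """Estimate the fraction of rows with col < literal.

    Assumes a uniform distribution between col_min and col_max, with all
    three arguments already in the same internal numeric form.
    """
    if literal <= col_min:
        return 0.0
    if literal > col_max:
        return 1.0
    return (literal - col_min) / (col_max - col_min)


# Stats and the literal are all converted through the same contract.
lo = to_internal(date(2017, 1, 1))
hi = to_internal(date(2017, 1, 11))
lit = to_internal(date(2017, 1, 6))
selectivity = less_than_selectivity(lo, hi, lit)
```

    If the internal representation changed, only the two conversion helpers 
in this sketch would need to change; the estimation function would stay 
the same.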

