singhpk234 commented on PR #37083:
URL: https://github.com/apache/spark/pull/37083#issuecomment-1174817038

   > Could you enable spark.sql.cbo.enabled to estimate row count?
   
   Thanks @wangyum, I am aware of the alternate visitor we use with CBO.
   
   I raised this PR considering:
   1. CBO is turned off by default.
   2. We already have rowCount propagated via LeafNodes (DSv2Relation), which
      is used for estimating output size in SizeInBytesOnlyStatsPlanVisitor
   
https://github.com/apache/spark/blob/161c596cafea9c235b5c918d8999c085401d73a9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L93-L100
   3. ANALYZE is not supported for v2 tables, so apart from row count, IMHO we
      can't have NDV etc. I am referring to this JIRA:
https://issues.apache.org/jira/browse/SPARK-39420
   4. As per my understanding, v1 tables can only pass in sizeInBytes unless
      they have some stats in the catalog, whereas v2 tables already provide
      both sizeInBytes and rowCount from the relation itself. Hence I thought
      the row count was going unaccounted for with v2 tables.
   
https://github.com/apache/spark/blob/161c596cafea9c235b5c918d8999c085401d73a9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala#L43-L45
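
   To make points 2 and 4 concrete, here is a toy sketch of the difference I have in mind. It is not Spark's code: `Stats`, `StatsSketch`, `sizeOnly` and `withRowCount` are hypothetical stand-ins I made up for illustration, and the row widths are arbitrary numbers. It only assumes that a leaf may or may not report a row count alongside its size.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Statistics or visitor classes.
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object StatsSketch {
  // Size-only estimate for a row-preserving unary node (e.g. a projection):
  // scale the child's size by the ratio of row widths and drop the row count.
  def sizeOnly(child: Stats, childRowSize: Int, outputRowSize: Int): Stats = {
    val scaled = child.sizeInBytes * outputRowSize / childRowSize
    // Never report a size of zero.
    Stats(sizeInBytes = scaled.max(BigInt(1)))
  }

  // Variant that uses the child's row count when it is available, and
  // otherwise falls back to the size-only estimate above.
  def withRowCount(child: Stats, childRowSize: Int, outputRowSize: Int): Stats =
    child.rowCount match {
      case Some(rows) =>
        Stats((rows * outputRowSize).max(BigInt(1)), Some(rows))
      case None =>
        sizeOnly(child, childRowSize, outputRowSize)
    }
}
```

   With a v2-like leaf `Stats(BigInt(1000), Some(BigInt(10)))` and row widths 200 and 50, `sizeOnly` gives 250 bytes with no row count, while `withRowCount` gives 500 bytes and keeps the 10 rows. A v1-like leaf `Stats(BigInt(1000))` gets 250 bytes from both, since there is no row count to use.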
   
   
   Are you suggesting this is expected behavior / by design?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


