fuwhu opened a new pull request #27148: [SPARK-30469] Partition columns should not be involved when calculating sizeInBytes of Project logical plan
URL: https://github.com/apache/spark/pull/27148
 
 
   ### What changes were proposed in this pull request?
   Make SizeInBytesOnlyStatsPlanVisitor.visitProject exclude partition columns when calculating sizeInBytes.
   
   ### Why are the changes needed?
   When computing the statistics of a Project logical plan with CBO disabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to estimate the size in bytes. This method scales the child plan's sizeInBytes by the ratio of the Project's row size to its child's row size.
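As a rough illustration of the estimate described above, the Python sketch below models the row-size ratio outside Spark. It is not Spark's code: the per-type sizes in `DEFAULT_SIZE`, the 8-byte `ROW_OVERHEAD`, and the helper names `size_per_row` and `estimate_unary_size` are hypothetical, chosen only to mirror the idea of a default size per data type plus a fixed per-row overhead.

```python
# Hypothetical per-type sizes in bytes (assumed values for illustration only).
DEFAULT_SIZE = {"int": 4, "long": 8, "double": 8, "string": 20}

# Assumed fixed overhead per row object; it also keeps row sizes non-zero.
ROW_OVERHEAD = 8


def size_per_row(columns):
    """Estimate the size of one row from a list of (name, type) pairs."""
    return ROW_OVERHEAD + sum(DEFAULT_SIZE[dtype] for _, dtype in columns)


def estimate_unary_size(child_size_in_bytes, child_output, output):
    """Scale the child's sizeInBytes by the output/child row-size ratio,
    assuming the unary node emits as many rows as its child."""
    size = child_size_in_bytes * size_per_row(output) // size_per_row(child_output)
    # Never report zero bytes.
    return max(size, 1)


child = [("a", "int"), ("b", "string"), ("p", "string")]
project = [("a", "int"), ("p", "string")]
print(estimate_unary_size(1000, child, project))  # 1000 * 32 // 52 = 615
```

Under these assumed sizes, projecting (a, p) from a 1000-byte child whose output is (a, b, p) is estimated at 615 bytes, even though p may contribute nothing to the child's size.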
   The row size is computed from the output attributes (columns). Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode also includes the partition columns of a Hive table. This is not reasonable, because partition columns do not contribute to the table's sizeInBytes, so including them can make the estimated sizeInBytes inaccurate. This PR updates SizeInBytesOnlyStatsPlanVisitor.visitProject to exclude partition columns.
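The following hypothetical Python model illustrates the effect of the change. It assumes that partition columns are dropped from both the Project's and the child's row sizes (the exact way the patch applies the exclusion is not shown here). It also assumes the child's sizeInBytes reflects only data files: Hive partition values are stored in directory paths, not in the files themselves. `DEFAULT_SIZE`, `ROW_OVERHEAD`, and `estimate_project_size` are illustrative names and values, not Spark APIs.

```python
# Hypothetical per-type sizes in bytes and per-row overhead (assumed values).
DEFAULT_SIZE = {"int": 4, "long": 8, "double": 8, "string": 20}
ROW_OVERHEAD = 8


def size_per_row(columns):
    """Estimate one row's size from (name, type) pairs."""
    return ROW_OVERHEAD + sum(DEFAULT_SIZE[dtype] for _, dtype in columns)


def estimate_project_size(child_size_in_bytes, child_output, output, partition_cols):
    """Scale the child's sizeInBytes by the row-size ratio, ignoring partition
    columns on both sides (an assumed reading of the proposed change)."""
    child_data = [col for col in child_output if col[0] not in partition_cols]
    out_data = [col for col in output if col[0] not in partition_cols]
    size = child_size_in_bytes * size_per_row(out_data) // size_per_row(child_data)
    # Keep the estimate positive.
    return max(size, 1)


child = [("a", "int"), ("b", "string"), ("p", "string")]
project = [("a", "int"), ("p", "string")]

# Counting partition column p on both sides: 1000 * 32 // 52
print(estimate_project_size(1000, child, project, partition_cols=set()))  # 615
# Excluding p: 1000 * 12 // 32
print(estimate_project_size(1000, child, project, partition_cols={"p"}))  # 375
```

In this model, counting p inflates both row sizes by the same amount. That pulls the ratio toward 1 and overstates the Project's size (615 vs. 375 bytes).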
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   Existing unit tests.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
