attilapiros commented on issue #26016: [SPARK-24914][SQL] New statistic to 
improve data size estimate for columnar storage formats
URL: https://github.com/apache/spark/pull/26016#issuecomment-565411662
 
 
   @maropu the `deserFactor` has some advantages:
   
   - It does not have to be kept up to date all the time. A factor is more general 
than an absolute size: when new partitions are added with a similar distribution (or 
even a different one, as long as it compresses worse), the factor can be safely 
reused, so ANALYZE TABLE has to be run less often.
   
   - By setting it manually, the user can switch off broadcast joins table by 
table. This is the opposite of the broadcast hint, and it is configured once per 
table rather than at every occurrence of the table name in every query that uses it.
   
   - For ORC we have direct support for this factor, so calculating it is 
expected to be very efficient.
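
To make the first two points concrete, here is a minimal sketch of how such a factor could feed a broadcast decision. It is an illustration under assumptions, not the code in this PR: `estimated_size` and `can_broadcast` are hypothetical names, the estimate is assumed to be simply the on-disk size multiplied by the factor, and the threshold mirrors the 10 MiB default of `spark.sql.autoBroadcastJoinThreshold`.

```python
# Hypothetical sketch; names and formula are assumptions, not Spark's implementation.
MIB = 1024 * 1024

# Assumed to mirror the 10 MiB default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD = 10 * MIB


def estimated_size(on_disk_bytes: int, deser_factor: int = 1) -> int:
    """Estimate the in-memory size by scaling the on-disk size by the factor."""
    return on_disk_bytes * deser_factor


def can_broadcast(on_disk_bytes: int, deser_factor: int = 1,
                  threshold: int = BROADCAST_THRESHOLD) -> bool:
    """A table is a broadcast candidate only if its estimate fits the threshold."""
    return estimated_size(on_disk_bytes, deser_factor) <= threshold


if __name__ == "__main__":
    # Without a factor, a 4 MiB columnar table looks small enough to broadcast.
    print(can_broadcast(4 * MIB))
    # With a factor of 5 the estimate becomes 20 MiB, so broadcast is skipped.
    print(can_broadcast(4 * MIB, deser_factor=5))
    # The factor is reused as the on-disk size changes; only the file size is re-read.
    print(can_broadcast(2 * MIB, deser_factor=5))
    # A manually set, very large factor switches broadcast off for this table.
    print(can_broadcast(1024, deser_factor=1_000_000))
```

Because the factor multiplies whatever the current on-disk size is, the estimate follows newly added partitions without re-running ANALYZE TABLE, and an artificially large factor keeps a table above the threshold no matter how small its files are.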

