attilapiros commented on a change in pull request #26016: [SPARK-24914][SQL] New statistic to improve data size estimate for columnar storage formats
URL: https://github.com/apache/spark/pull/26016#discussion_r354938937
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1331,6 +1331,29 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
+  val DESERIALIZATION_FACTOR_CALC_ENABLED =
+    buildConf("spark.sql.statistics.deserFactor.calc.enabled")
+      .doc("Enables the calculation of the deserialization factor as a table statistic. " +
+        "This factor is intended to be calculated for columnar storage formats as a ratio of " +
+        "actual data size to raw file size but currently Spark calculates this only for the ORC " +
+        "format. Spark uses this ratio to scale up the estimated size, which leads to " +
+        "better estimate of in-memory data size and improves the query optimization (i.e., join " +
+        "strategy). In case of partitioned table the maximum of these factors is taken. " +
+        "Spark stores this factor in the meta store and reuses it so the table " +
+        "can grow without having to recompute this statistic. " +
+        "The stored factor can be removed only by a TRUNCATE or a DROP table so even a " +
+        "subsequent ANALYZE TABLE where the calculation is disabled keeps the old value.")
+      .booleanConf
+      .createWithDefault(false)
+
+  val DESERIALIZATION_FACTOR_EXTRA_DISTORTION =
+    buildConf("spark.sql.statistics.deserFactor.distortion")
Review comment:
Resolved with 3ee9be61b504887972a8cc09ca7ce7419e196fa2
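
For readers following the doc string in the hunk above, the arithmetic it describes could be sketched roughly as follows. This is only an illustration, not code from the PR: the object and method names are hypothetical, and rounding the ratio up to an integer of at least 1 is an assumption made here for simplicity.

```scala
// Hypothetical sketch of the arithmetic described in the doc string.
// None of these names come from the pull request.
object DeserFactorSketch {

  // Ratio of actual (deserialized) data size to raw file size.
  // Assumption: the ratio is rounded up and never drops below 1,
  // so the size estimate is only ever scaled up.
  def deserFactor(rawFileSize: Long, deserializedSize: Long): Int = {
    require(rawFileSize > 0, "raw file size must be positive")
    math.max(1, math.ceil(deserializedSize.toDouble / rawFileSize).toInt)
  }

  // For a partitioned table the doc string says the maximum of the
  // per-partition factors is taken. An empty table falls back to 1 here.
  def tableFactor(partitionFactors: Seq[Int]): Int =
    if (partitionFactors.isEmpty) 1 else partitionFactors.max

  // Scale the on-disk size up by the stored factor to estimate in-memory size.
  def estimatedSize(rawFileSize: Long, factor: Int): Long =
    rawFileSize * factor.toLong
}
```

Because the stored factor is reused, `estimatedSize` can be applied to a table that has grown since the factor was computed, which matches the doc string's point about not having to recompute the statistic.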