[GitHub] [spark] attilapiros commented on a change in pull request #26016: [SPARK-24914][SQL] New statistic to improve data size estimate for columnar storage formats

GitBox Thu, 24 Oct 2019 01:56:34 -0700

attilapiros commented on a change in pull request #26016: [SPARK-24914][SQL] New statistic to improve data size estimate for columnar storage formats
URL: https://github.com/apache/spark/pull/26016#discussion_r338456413


 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
 ##########
 @@ -1330,6 +1334,7 @@ object HiveExternalCatalog {
 
   val STATISTICS_PREFIX = SPARK_SQL_PREFIX + "statistics."
   val STATISTICS_TOTAL_SIZE = STATISTICS_PREFIX + "totalSize"
+  val STATISTICS_DESER_FACTOR = STATISTICS_PREFIX + "deserFactor"
 
 Review comment:
   Yes, I thought about making this `property` writable. I had even checked
how to make this one statistic writable (while keeping the `spark.sql.` prefix),
but in the end I threw that approach away because it was a bit hacky (although
it was just a few lines). On the other hand, using another regular property
(without the `spark.sql.` prefix) in the middle of the statistics calculation
and saving it into the metastore won't be straightforward either. But I will
check that again.
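   A minimal sketch of the "keep the `spark.sql.` prefix but make one statistic
writable" idea, assuming (as in HiveExternalCatalog's table-property
verification) that user-supplied keys starting with `spark.sql.` are normally
rejected. The object name, the allow-list and both helper functions are
hypothetical illustrations, not Spark's code or this PR's implementation:

```scala
// Hypothetical sketch only: not Spark's actual HiveExternalCatalog code.
object WritableStatSketch {
  // Key construction mirrors the diff: SPARK_SQL_PREFIX + "statistics." + name.
  val SparkSqlPrefix = "spark.sql."
  val StatisticsPrefix: String = SparkSqlPrefix + "statistics."
  val StatisticsDeserFactor: String = StatisticsPrefix + "deserFactor"

  // Hypothetical allow-list: the only reserved key a user may set directly.
  val userWritableReservedKeys: Set[String] = Set(StatisticsDeserFactor)

  /** User-supplied keys that would still be rejected, sorted for stable output. */
  def invalidUserKeys(props: Map[String, String]): Seq[String] =
    props.keys
      .filter(k => k.startsWith(SparkSqlPrefix) && !userWritableReservedKeys.contains(k))
      .toSeq
      .sorted

  /** Throws IllegalArgumentException if any reserved key other than the allow-listed one is set. */
  def verifyUserProperties(props: Map[String, String]): Unit = {
    val invalid = invalidUserKeys(props)
    require(invalid.isEmpty,
      s"table property keys may not start with '$SparkSqlPrefix': " +
        invalid.mkString("[", ", ", "]"))
  }
}
```

   With this, `Map("spark.sql.statistics.deserFactor" -> "2.5")` passes the
check while `Map("spark.sql.statistics.totalSize" -> "1")` is still rejected.
The extra special case inside the verification is the kind of small but hacky
exception described above.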
   
   I think it is unnecessary to let the user choose the property name via a new
config. A fixed property name is enough (documented once it becomes directly
writable).
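   To illustrate why a fixed key is sufficient, here is a minimal sketch of
reading the factor back from table properties and applying it to the stored
total size. It assumes the factor is a multiplier on `totalSize` (an assumption
about how the estimate is used, not taken from this diff); the object and
function names are hypothetical:

```scala
// Hypothetical sketch: a fixed, documented key is read; no extra config needed.
object DeserFactorEstimateSketch {
  val StatisticsTotalSize = "spark.sql.statistics.totalSize"
  val StatisticsDeserFactor = "spark.sql.statistics.deserFactor"

  /** Parses the factor; malformed, non-positive, NaN or infinite values are ignored. */
  def deserFactor(props: Map[String, String]): Option[Double] =
    props.get(StatisticsDeserFactor)
      .flatMap(v => scala.util.Try(v.trim.toDouble).toOption)
      .filter(f => f > 0 && !f.isNaN && !f.isInfinite)

  /** Total size scaled by the factor (rounded up), or the raw size when no valid factor exists. */
  def estimatedSizeInBytes(props: Map[String, String]): Option[BigInt] =
    props.get(StatisticsTotalSize)
      .flatMap(v => scala.util.Try(BigInt(v.trim)).toOption)
      .map { total =>
        deserFactor(props) match {
          case Some(f) =>
            (BigDecimal(total) * BigDecimal(f))
              .setScale(0, BigDecimal.RoundingMode.CEILING)
              .toBigInt
          case None => total
        }
      }
}
```

   For example, a `totalSize` of `1000` with a `deserFactor` of `2.5` yields an
estimate of `2500`, while a missing or malformed factor falls back to `1000`.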
