bersprockets commented on a change in pull request #26016: [SPARK-24914][SQL]
New statistic to improve data size estimate for columnar storage formats
URL: https://github.com/apache/spark/pull/26016#discussion_r332277776
##########
File path:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -1330,6 +1334,7 @@ object HiveExternalCatalog {
val STATISTICS_PREFIX = SPARK_SQL_PREFIX + "statistics."
val STATISTICS_TOTAL_SIZE = STATISTICS_PREFIX + "totalSize"
+ val STATISTICS_DESER_FACTOR = STATISTICS_PREFIX + "deserFactor"
Review comment:
Since the initial implementation supports only ORC, and even non-columnar
file types can be compressed, it might make sense to let the user add this
property manually via `ALTER TABLE ... SET TBLPROPERTIES`. However, as long as
the property name starts with "spark.sql.", the user cannot set it.
Conversely, the user might want to remove the property without having to
drop or truncate the table. Again, as long as the property name starts with
"spark.sql.", the user cannot unset it.
Of course, the user can do this from Hive itself, but they may not have
access to Hive.
I guess I am saying it might make sense to make the property settable (and
maybe let the user choose the property name via config).
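
To make the concern concrete, here is a minimal, self-contained sketch of the kind of prefix check that blocks user-set keys, plus one way a configurable property name could sidestep it. This is not the actual `HiveExternalCatalog` code: `ReservedPrefixSketch`, `reservedKeys`, `validate`, and `deserFactorKey` are hypothetical names used only for illustration, and only the constants are taken from the diff above.

```scala
object ReservedPrefixSketch {
  // Constants mirroring the ones in the diff above.
  val SPARK_SQL_PREFIX = "spark.sql."
  val STATISTICS_PREFIX = SPARK_SQL_PREFIX + "statistics."
  val STATISTICS_DESER_FACTOR = STATISTICS_PREFIX + "deserFactor"

  // Hypothetical helper: user-supplied keys that fall in the reserved namespace.
  def reservedKeys(props: Map[String, String]): Seq[String] =
    props.keys.filter(_.startsWith(SPARK_SQL_PREFIX)).toSeq.sorted

  // Hypothetical validation: reject the whole property map if any key is reserved.
  def validate(props: Map[String, String]): Either[String, Map[String, String]] = {
    val bad = reservedKeys(props)
    if (bad.isEmpty) Right(props)
    else Left(s"property keys may not start with '${SPARK_SQL_PREFIX}': ${bad.mkString(", ")}")
  }

  // Hypothetical: resolve the property name from an optional user setting,
  // falling back to the name introduced in the diff.
  def deserFactorKey(configured: Option[String]): String =
    configured.filter(_.nonEmpty).getOrElse(STATISTICS_DESER_FACTOR)

  def main(args: Array[String]): Unit = {
    // With the default name, a user-supplied value is rejected.
    println(validate(Map(deserFactorKey(None) -> "3")).isLeft)
    // With a user-chosen name outside the reserved namespace, it is accepted.
    println(validate(Map(deserFactorKey(Some("my.deserFactor")) -> "3")).isRight)
  }
}
```

Under these assumptions, the default key can only be written by Spark itself, while a name chosen via config would let the user set or unset the value with ordinary table-property commands.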