Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/20072#discussion_r159106981
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -261,6 +261,17 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val HADOOPFSRELATION_SIZE_FACTOR = buildConf(
+ "org.apache.spark.sql.execution.datasources.sizeFactor")
+ .internal()
+ .doc("The result of multiplying this factor with the size of data source files is propagated" +
+ " to serve as the stats to choose the best execution plan. In the case where the " +
+ " the in-disk and in-memory size of data is significantly different, users can adjust this" +
--- End diff --
Actually, this line:
https://github.com/apache/spark/pull/20072/files/ec275a841a7bb4c23b277f915debeed54e6cf7ea#diff-9a6b543db706f1a90f790783d6930a13R250
is missing a space at the end. Could you fix that as well?
---