HyukjinKwon commented on a change in pull request #26200:
[SPARK-29542][SQL][DOC] Make the descriptions of spark.sql.files.* be clearly
URL: https://github.com/apache/spark/pull/26200#discussion_r337833143
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -989,25 +991,31 @@ object SQLConf {
.doc("The estimated cost to open a file, measured by the number of bytes could be scanned in" +
" the same time. This is used when putting multiple files into a partition. It's better to" +
" over estimated, then the partitions with small files will be faster than partitions with" +
- " bigger files (which is scheduled first).")
+ " bigger files (which is scheduled first). This configuration is effective only when using" +
+ " file-based sources such as Parquet, JSON and ORC.")
.longConf
.createWithDefault(4 * 1024 * 1024)
val IGNORE_CORRUPT_FILES = buildConf("spark.sql.files.ignoreCorruptFiles")
.doc("Whether to ignore corrupt files. If true, the Spark jobs will continue to run when " +
- "encountering corrupted files and the contents that have been read will still be returned.")
+ "encountering corrupted files and the contents that have been read will still be returned. " +
+ "This configuration is effective only when using file-based sources such as Parquet, JSON " +
+ "and ORC.")
.booleanConf
.createWithDefault(false)
val IGNORE_MISSING_FILES = buildConf("spark.sql.files.ignoreMissingFiles")
.doc("Whether to ignore missing files. If true, the Spark jobs will continue to run when " +
- "encountering missing files and the contents that have been read will still be returned.")
+ "encountering missing files and the contents that have been read will still be returned. " +
+ "This configuration is effective only when using file-based sources such as Parquet, JSON " +
+ "and ORC.")
.booleanConf
.createWithDefault(false)
val MAX_RECORDS_PER_FILE = buildConf("spark.sql.files.maxRecordsPerFile")
Review comment:
I think this also applies when we write to a Hive table. Can you double-check?