dtenedor commented on code in PR #44939:
URL: https://github.com/apache/spark/pull/44939#discussion_r1475177031


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala:
##########
@@ -105,8 +105,6 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
       sparkSession.sessionState.conf.csvColumnPruning,
       sparkSession.sessionState.conf.sessionLocalTimeZone,
       sparkSession.sessionState.conf.columnNameOfCorruptRecord)
-    val isColumnPruningEnabled = parsedOptions.isColumnPruningEnabled

Review Comment:
   👍 I restored this to compute a separate `val` here, like before.
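   For context, here is a minimal plain-Scala sketch of what computing the flag once into a local `val` buys when a per-file reader closure is built from it. The closure captures the precomputed `Boolean` rather than the options object, and the options method runs once instead of on every reader call. `Options`, `HoistDemo.buildReader`, `hasExistenceDefaults`, and `requiredIndices` are hypothetical stand-ins, not Spark APIs.

```scala
// Hypothetical, simplified stand-in for CSVOptions (not Spark's real class).
final class Options(multiLine: Boolean, columnPruning: Boolean) {
  // Counts calls so the sketch can show how often the flag is computed.
  var evaluations = 0

  def isColumnPruningEnabled(hasExistenceDefaults: Boolean): Boolean = {
    evaluations += 1
    !multiLine && columnPruning && !hasExistenceDefaults
  }
}

object HoistDemo {
  // Builds a per-line reader. The flag is computed once, up front, and the
  // returned closure captures that Boolean (plus requiredIndices), not `opts`.
  def buildReader(
      opts: Options,
      hasExistenceDefaults: Boolean,
      requiredIndices: Seq[Int]): String => Seq[String] = {
    val isColumnPruningEnabled = opts.isColumnPruningEnabled(hasExistenceDefaults)
    (line: String) => {
      val tokens = line.split(",", -1).toSeq
      // With pruning, keep only the required columns; otherwise keep them all.
      if (isColumnPruningEnabled) requiredIndices.map(i => tokens(i)) else tokens
    }
  }
}
```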



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala:
##########
@@ -278,13 +280,28 @@ class CSVOptions(
     .getOrElse(UNESCAPED_QUOTE_HANDLING, "STOP_AT_DELIMITER").toUpperCase(Locale.ROOT))
 
   /**
+   * Returns true if column pruning is enabled and there are no existence column default values in
+   * the [[schema]].
+   *
    * The column pruning feature can be enabled either via the CSV option `columnPruning` or
    * in non-multiline mode via initialization of CSV options by the SQL config:
    * `spark.sql.csv.parser.columnPruning.enabled`.
    * The feature is disabled in the `multiLine` mode because of the issue:
    * https://github.com/uniVocity/univocity-parsers/issues/529
+   *
+   * We disable column pruning when there are any column defaults, instead preferring to read in
+   * each row and then post-process it to substitute the default values after.
    */
-  val isColumnPruningEnabled: Boolean = getBool(COLUMN_PRUNING, !multiLine && columnPruning)
+  def isColumnPruningEnabled(schema: StructType): Boolean = {
+    var result = !multiLine && columnPruning
+    if (parameters != null) {

Review Comment:
   Good point. I converted this parameter lookup back to a `val` that is stored in the class and serialized to executors, so we no longer try to look up the CSV options on executors.
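   A minimal plain-Scala sketch of the approach described in this reply, assuming a simplified options class: the `columnPruning` option is resolved once into a `val` when the object is constructed on the driver, so after Java serialization the executor-side copy consults only that `Boolean` and the schema. `SimpleCsvOptions`, `Field`, `columnPruningOption`, and `RoundTrip` are hypothetical names, and marking `parameters` as `@transient` is an assumption made here to show that the executor copy never needs the raw option map.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical schema field; `existenceDefault` models a column that has an
// existence default value.
final case class Field(name: String, existenceDefault: Option[String] = None)

// Hypothetical, simplified stand-in for CSVOptions (not Spark's real class).
class SimpleCsvOptions(
    @transient private val parameters: Map[String, String],
    multiLine: Boolean,
    columnPruning: Boolean) extends Serializable {

  // Resolved once at construction time (on the driver) and serialized with
  // the object, so the deserialized copy never reads `parameters`.
  val columnPruningOption: Boolean =
    parameters.get("columnPruning").map(_.toBoolean).getOrElse(!multiLine && columnPruning)

  // Pruning is disabled whenever any column carries an existence default.
  def isColumnPruningEnabled(schema: Seq[Field]): Boolean =
    columnPruningOption && !schema.exists(_.existenceDefault.isDefined)
}

object RoundTrip {
  // Java-serializes and deserializes a value, mimicking shipping it to an executor.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      // Resolve classes with this code's class loader (helps in REPL-like runners).
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, getClass.getClassLoader)
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

   Because `parameters` is transient in this sketch, it is `null` on the deserialized copy, yet `isColumnPruningEnabled` still works there since it touches only precomputed state.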



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.