MaxGekk commented on code in PR #44872:
URL: https://github.com/apache/spark/pull/44872#discussion_r1467408202
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVPartitionReaderFactory.scala:
##########
@@ -58,7 +58,7 @@ case class CSVPartitionReaderFactory(
actualReadDataSchema,
options,
filters)
- val schema = if (options.columnPruning) actualReadDataSchema else actualDataSchema
+ val schema = if (options.isColumnPruningEnabled) actualReadDataSchema else actualDataSchema
Review Comment:
The `schema` is used only by `CSVHeaderChecker`, which is supposed to check
the column names in the CSV header against the provided schema fields. In my
view, it shouldn't depend on the column pruning feature at all.
```scala
private def checkHeaderColumnNames(columnNames: Array[String]): Unit = {
...
if (headerLen == schemaSize) {
...
} else {
errorMessage = Some(
        s"""|Number of column in CSV header is not equal to number of fields in the schema:
            | Header length: $headerLen, schema size: $schemaSize
            |$source""".stripMargin)
}
```
`schemaSize` must be the size of the **full data schema** of the CSV file,
not the required schema.
Let me re-think it, and avoid the dependency on column pruning in
`CSVHeaderChecker`.
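
To illustrate the concern, here is a minimal, self-contained sketch (with hypothetical names, not the actual Spark code) of the header-length check described above. When the pruned (required) schema is passed in instead of the full data schema, a perfectly valid CSV header can spuriously fail the check:

```scala
// Hypothetical simplification of the CSVHeaderChecker length check.
// Assumption: the checker only compares header length to schema size here.
object HeaderCheckSketch {
  def checkHeader(headerColumns: Array[String],
                  schemaFields: Seq[String]): Option[String] = {
    val headerLen = headerColumns.length
    val schemaSize = schemaFields.length
    if (headerLen == schemaSize) {
      None // header matches the schema width
    } else {
      Some(s"Number of columns in CSV header ($headerLen) " +
           s"is not equal to schema size ($schemaSize)")
    }
  }

  def main(args: Array[String]): Unit = {
    val fullSchema   = Seq("id", "name", "age")
    val prunedSchema = Seq("name") // column pruning keeps only required fields
    val header       = Array("id", "name", "age")

    // Against the full data schema, the valid header passes:
    assert(checkHeader(header, fullSchema).isEmpty)
    // Against the pruned schema, the same valid header would fail:
    assert(checkHeader(header, prunedSchema).isDefined)
    println("ok")
  }
}
```

This is why the check should always see the full data schema, regardless of whether column pruning is enabled.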
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]