GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22367#discussion_r216196505
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala ---
@@ -91,9 +91,10 @@ abstract class CSVDataSource extends Serializable {
}
row.zipWithIndex.map { case (value, index) =>
- if (value == null || value.isEmpty || value == options.nullValue) {
- // When there are empty strings or the values set in `nullValue`, put the
- // index as the suffix.
+ if (value == null || value.isEmpty || value == options.nullValue ||
+ value == options.emptyValueInRead) {
--- End diff --
@MaxGekk, can we drop this change (and the matching one in
`CSVInferSchema.scala`) for now, since we are targeting 2.4? IIRC (I need to
double check), this behaviour of `makeSafeHeader` comes from R's `read.csv`.
We should check whether treating `emptyValueInRead` the same way is
consistent with that.
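
For reference, a minimal standalone sketch of the header handling under discussion, before this change. It is a simplification and not the actual Spark code: `SafeHeaderSketch` is a hypothetical name, the plain `nullValue` parameter stands in for `options.nullValue`, and case-sensitivity handling is left out.

```scala
// Hypothetical standalone sketch; not the actual Spark implementation.
object SafeHeaderSketch {
  // `nullValue` stands in for `options.nullValue` (an assumption made for
  // illustration, so that the sketch has no Spark dependency).
  def makeSafeHeader(row: Array[String], nullValue: String): Array[String] = {
    // Header names that occur more than once.
    val names = row.filter(_ != null)
    val duplicates = names.diff(names.distinct).distinct

    row.zipWithIndex.map { case (value, index) =>
      if (value == null || value.isEmpty || value == nullValue) {
        // Missing, empty, or null-valued header: fall back to a positional name.
        s"_c$index"
      } else if (duplicates.contains(value)) {
        // Duplicated header: append the index as a suffix to disambiguate.
        s"$value$index"
      } else {
        value
      }
    }
  }
}
```

With this sketch, `SafeHeaderSketch.makeSafeHeader(Array("a", "", "a", null, "NA"), "NA")` yields `a0, _c1, a2, _c3, _c4`. The change in this diff would add `options.emptyValueInRead` to the first condition.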