izchen commented on a change in pull request #29862:
URL: https://github.com/apache/spark/pull/29862#discussion_r494817447



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala
##########
@@ -93,6 +93,12 @@ object CSVUtils {
           value
         }
       }
+      if (header.sameElements(row)) {
+        header
+      } else {
+        // Ensure that the newly generated and existing headers are not duplicated.
+        makeSafeHeader(header, caseSensitive, options)
+      }

Review comment:
       Current behavior of Spark and R for CSV headers:
   
   | CSV             | SPARK             | R               |
   | --------------- | ----------------- | --------------- |
   | `a,a,a,a`       | `a0,a1,a2,a3`     | `a,a.1,a.2,a.3` |
   | `a,,,`          | `a,_c1,_c2,_c3`   | `a,X,X.1,X.2`   |
   | *header: false* | `_c0,_c1,_c2,_c3` | `V1,V2,V3,V4`   |
   
   If we follow R's behavior, we introduce a user-facing change: column names that users already rely on (for example `a0` or `_c1`) would change, which may break their existing code.
   Maybe we should keep Spark's current behavior.
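
For reference, the Spark column of the table can be reproduced with a small standalone sketch. This is a simplified approximation, not the actual `CSVUtils.makeSafeHeader`: the name `safeHeader` and its parameters are hypothetical, and it ignores `CSVOptions` details such as `nullValue`.

```scala
// Simplified, standalone sketch of the Spark naming rules shown in the table.
// `HeaderNamingSketch` and `safeHeader` are hypothetical names used only for
// illustration; the real logic lives in CSVUtils.makeSafeHeader.
object HeaderNamingSketch {
  def safeHeader(
      row: Seq[String],
      hasHeader: Boolean,
      caseSensitive: Boolean = false): Seq[String] = {
    if (!hasHeader) {
      // header: false -> every column gets the default name "_c<index>".
      row.indices.map(i => s"_c$i")
    } else {
      def norm(s: String): String = if (caseSensitive) s else s.toLowerCase
      // Non-empty names that occur more than once (after normalization).
      val names = row.filter(s => s != null && s.nonEmpty).map(norm)
      val duplicates = names.diff(names.distinct).toSet
      row.zipWithIndex.map { case (value, i) =>
        if (value == null || value.isEmpty) s"_c$i" // empty name -> "_c<index>"
        else if (duplicates.contains(norm(value))) s"$value$i" // duplicate -> "<name><index>"
        else value // unique name is kept as-is
      }
    }
  }
}
```

Under these assumptions, `safeHeader(Seq("a", "a", "a", "a"), hasHeader = true)` gives `a0,a1,a2,a3`, and `safeHeader(Seq("a", "", "", ""), hasHeader = true)` gives `a,_c1,_c2,_c3`, matching the rows above.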






