izchen opened a new pull request #29862: URL: https://github.com/apache/spark/pull/29862
### What changes were proposed in this pull request?

[SPARK-16896](https://issues.apache.org/jira/browse/SPARK-16896) made the CSV data source generate new column headers to replace duplicate or empty-string column headers. With this PR, when a newly generated column header duplicates an existing column header, a new header is generated again using the method from SPARK-16896.

### Why are the changes needed?

When a CSV data source has duplicate column headers, Spark generates new column headers from the original ones, using the column index as a suffix. When a newly generated header collides with an existing header, Spark throws an exception whose message is difficult for users to understand. For example, with the CSV column header `a,a,a,a1`:

> AnalysisException: Found duplicate column(s) in the data schema: a1

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test case
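The renaming behavior described above can be illustrated with a small standalone sketch. This is not Spark's implementation. It is a plain-Python model that assumes a 0-based column index as the suffix, case-sensitive comparison, and `_c<index>` as the name for empty headers. The helper names `rename_once` and `make_unique_header`, and the `max_passes` guard, are hypothetical and exist only for this illustration.

```python
from collections import Counter


def rename_once(header):
    """One renaming pass (a model of the SPARK-16896 idea, not Spark's code).

    Empty names become `_c<index>`; names that occur more than once get
    their 0-based column index appended as a suffix.
    """
    counts = Counter(name for name in header if name)
    result = []
    for index, name in enumerate(header):
        if not name:
            result.append(f"_c{index}")
        elif counts[name] > 1:
            result.append(f"{name}{index}")
        else:
            result.append(name)
    return result


def make_unique_header(header, max_passes=100):
    """Repeat the renaming pass until no two column names are equal.

    `max_passes` is a hypothetical safety bound for this sketch only.
    """
    current = list(header)
    for _ in range(max_passes):
        current = rename_once(current)
        if len(set(current)) == len(current):
            return current
    raise ValueError("could not produce unique column names")


if __name__ == "__main__":
    # A single pass reproduces the collision from the PR description.
    print(rename_once(["a", "a", "a", "a1"]))         # ['a0', 'a1', 'a2', 'a1']
    # Repeating the pass removes it.
    print(make_unique_header(["a", "a", "a", "a1"]))  # ['a0', 'a11', 'a2', 'a13']
```

In this model, a single pass over `a,a,a,a1` leaves two columns named `a1`, which is the situation that triggers the `AnalysisException` quoted above. Running the pass again renames only the colliding columns. The exact names Spark produces after this PR may differ from the ones this sketch prints.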
