Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20894#discussion_r189103522
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
           StructType(schema.filterNot(_.name == parsedOptions.columnNameOfCorruptRecord))
    
         val linesWithoutHeader: RDD[String] = maybeFirstLine.map { firstLine =>
    +      if (!parsedOptions.enforceSchema) {
    +        CSVDataSource.checkHeader(firstLine, new CsvParser(parsedOptions.asParserSettings),
    --- End diff --
    
    Thank you for explaining the use case. I didn't realize that the input dataset can contain multiple headers, so potentially the headers could list the column names in a different order or even contain incorrect column names. It seems pretty tough to handle that case.
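    
    For illustration, here is a minimal sketch (not from the PR itself) of the scenario being discussed: a directory-based CSV dataset where each part file carries its own header, read with the `enforceSchema` option this PR introduces set to `false` so that headers are checked against the user-specified schema. The directory path and part-file names are hypothetical.
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._
    
    object MultiHeaderExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("csv-header-check")
          .getOrCreate()
    
        // User-specified schema the headers should match.
        val schema = StructType(Seq(
          StructField("id", IntegerType),
          StructField("name", StringType)))
    
        // Suppose part-0001.csv starts with "id,name" while part-0002.csv
        // starts with "name,id". With enforceSchema=false, the header check
        // proposed in this PR would flag the mismatching header instead of
        // silently assigning values to the wrong columns.
        val df = spark.read
          .option("header", "true")
          .option("enforceSchema", "false") // option added by this PR
          .schema(schema)
          .csv("/path/to/csv/dir") // hypothetical directory with multiple part files
    
        df.show()
        spark.stop()
      }
    }
    ```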


---
