Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13300
  
    Actually, this feature might not be urgent, as said above, but IMO I like 
this feature, to be honest. I guess the reason it was put on hold is that it 
does not look like a clean fix. 
    
    I recently refactored this code path and have one remaining PR, 
https://github.com/apache/spark/pull/16680. After that is hopefully merged, there can 
be an easy, clean fix, consistent with the JSON one, within roughly ten lines of additions, for 
example, something like the one below in `Dataset`...
    
    ```scala
    def csv(csv: Dataset[String]): DataFrame = {
      val parsedOptions: CSVOptions = new CSVOptions(extraOptions.toMap)
      val caseSensitive = sparkSession.sessionState.conf.caseSensitive
      // Use the user-specified schema if given; otherwise infer it from the input lines.
      val schema = userSpecifiedSchema.getOrElse {
        InferSchema.infer(csv, caseSensitive, parsedOptions)
      }
    
      // Parse each partition with its own UnivocityParser instance.
      val parsed = csv.mapPartitions { iter =>
        val parser = new UnivocityParser(schema, caseSensitive, parsedOptions)
        iter.flatMap(parser.parse)
      }
    
      Dataset.ofRows(
        sparkSession,
        LogicalRDD(schema.toAttributes, parsed)(sparkSession))
    }
    ```
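    
    For what it's worth, a rough usage sketch of how this could look from the 
caller's side, mirroring the existing `json(Dataset[String])` API. This is just 
an illustration under that assumption, not the final API; the `csvLines` value 
and the `header` option handling here are hypothetical:
    
    ```scala
    import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
    
    val spark: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._
    
    // Hypothetical input: CSV lines already available as a Dataset[String],
    // e.g. after some upstream per-line preprocessing.
    val csvLines: Dataset[String] = Seq("name,age", "alice,30", "bob,25").toDS()
    
    // Parse the lines directly, analogous to spark.read.json(jsonDataset).
    val df: DataFrame = spark.read.option("header", "true").csv(csvLines)
    ```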
    
    I remember there have been quite a few questions about this feature in 
spark-csv as a third-party package (and in spark-xml too).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
