Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/13300
Actually, this feature might not be urgent, as said above, but IMO I like
this feature, to be honest. I guess the reason it was put on hold is that, IMHO,
it did not look like a clean fix.
I recently refactored this code path and have one remaining PR,
https://github.com/apache/spark/pull/16680. After that is hopefully merged, there
can be an easy, clean fix, consistent with the JSON one, within 10-ish lines of
additions, for example, something like the one below in `Dataset`...
```scala
def csv(csv: Dataset[String]): DataFrame = {
  val parsedOptions: CSVOptions = new CSVOptions(extraOptions.toMap)
  val caseSensitive = sparkSession.sessionState.conf.caseSensitive
  val schema = userSpecifiedSchema.getOrElse {
    InferSchema.infer(csv, caseSensitive, parsedOptions)
  }
  val parsed = csv.mapPartitions { iter =>
    val parser = new UnivocityParser(schema, caseSensitive, parsedOptions)
    iter.flatMap(parser.parse)
  }
  Dataset.ofRows(
    sparkSession,
    LogicalRDD(schema.toAttributes, parsed)(sparkSession))
}
```
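For context, a hedged sketch of how such an API might be used, assuming the overload above were exposed on `DataFrameReader` analogously to the existing `json(Dataset[String])` (the exposure point and option handling are assumptions, not part of this comment's code):

```scala
// Hypothetical usage sketch: parse an in-memory Dataset[String] of CSV lines
// directly, without writing them to a file first. Assumes a csv(Dataset[String])
// overload on DataFrameReader, mirroring json(Dataset[String]).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Each element of the Dataset is one CSV line; the first line is the header.
val csvLines = Seq("name,age", "alice,30", "bob,25").toDS()

val df = spark.read
  .option("header", "true")
  .csv(csvLines)  // assumed overload taking Dataset[String]

df.show()
```

This mirrors how the JSON reader already accepts a `Dataset[String]`, which is the consistency argument made above.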
I remember there have been quite a few questions about this feature in
spark-csv as a third-party package (and also in spark-xml).