Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r219495737

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -212,6 +212,7 @@ class CSVOptions(
     settings.setEmptyValue(emptyValueInRead)
     settings.setMaxCharsPerColumn(maxCharsPerColumn)
     settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+    settings.setLineSeparatorDetectionEnabled(true)
--- End diff --

The auto-detection mechanism is enabled in both multi-line and per-line modes. I guess detecting line separators adds some overhead, and that work is not needed in per-line mode, where the input is already split into lines before it reaches the parser. I would benchmark it in both modes (see `CSVBenchmarks`), and if the overhead in per-line mode is significant, I would not enable the option when `multiLine` is set to `false`.
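
If the benchmark does show a significant overhead, the conditional could look roughly like the sketch below. This is only an illustration: `LineSeparatorSketch`, `ParserSettingsStub` and `configure` are hypothetical names, and the stub models nothing but the one setter from the diff so that the snippet runs without the parser library. It also assumes the `multiLine` flag is available at the point where the settings are built.

```scala
object LineSeparatorSketch {

  // Hypothetical stand-in for the parser settings object in the diff;
  // only the line-separator detection flag is modelled here.
  final class ParserSettingsStub {
    private var detectionEnabled = false

    def setLineSeparatorDetectionEnabled(enabled: Boolean): Unit =
      detectionEnabled = enabled

    def isLineSeparatorDetectionEnabled: Boolean = detectionEnabled
  }

  // Turn auto-detection on only in multi-line mode; in per-line mode the
  // flag stays off, so the parser does no separator detection work.
  def configure(settings: ParserSettingsStub, multiLine: Boolean): ParserSettingsStub = {
    settings.setLineSeparatorDetectionEnabled(multiLine)
    settings
  }
}
```

In `CSVOptions` itself this would reduce to passing `multiLine` instead of `true` in the added line, but only if the numbers from both modes justify it.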