MaxGekk commented on code in PR #56130:
URL: https://github.com/apache/spark/pull/56130#discussion_r3387778334
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##########
@@ -662,8 +666,9 @@ private[sql] object UnivocityParser {
val filteredLines: Iterator[String] =
CSVExprUtils.filterCommentAndEmpty(lines, options)
+ val lineSep = options.lineSeparator.getOrElse("\n")
Review Comment:
To address @shrirangmhalgi's question: the appended separator is correct —
this uses `options.lineSeparator.getOrElse("\n")`, not a hardcoded `"\n"`, so
Univocity always receives the separator it was configured with (e.g. `|` or
`\r\n`) and no mismatch with `format.setLineSeparator` occurs.
The one gap is test coverage for the `parseLine` error path under a custom
`lineSep`. The `case Some(sep)` branch in `parseLine` (which strips the
appended separator from `badRecord`) has no end-to-end test: the existing
SPARK-49444 integration test in `CSVSuite` covers only the default-separator
case, which goes through `case _ => line.stripLineEnd`. Extending that test
with a custom `lineSep` that also triggers a malformed-record error — e.g.
`.option("lineSep", "|")` on a `|`-separated file that exceeds `maxColumns` —
would cover both the appended-separator invariant and the stripping in a single
shot.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]