Re: [PR] [SPARK-51579][SQL] Avoid EOFException in CSV Parsing by Appending Line Terminator [spark]

via GitHub Wed, 10 Jun 2026 04:15:53 -0700


MaxGekk commented on code in PR #56130:
URL: https://github.com/apache/spark/pull/56130#discussion_r3387778334



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##########
@@ -662,8 +666,9 @@ private[sql] object UnivocityParser {
 
     val filteredLines: Iterator[String] = 
CSVExprUtils.filterCommentAndEmpty(lines, options)
 
+    val lineSep = options.lineSeparator.getOrElse("\n")

Review Comment:
   To address @shrirangmhalgi's question: the appended separator is correct — 
this uses `options.lineSeparator.getOrElse("\n")`, not a hardcoded `"\n"`, so 
Univocity always receives the separator it was configured with (e.g. `|` or 
`\r\n`) and no mismatch with `format.setLineSeparator` occurs.
   
   The one gap is test coverage for the `parseLine` error path under a custom 
`lineSep`. The `case Some(sep)` branch in `parseLine` (which strips the 
appended separator from `badRecord`) has no end-to-end test: the existing 
SPARK-49444 integration test in `CSVSuite` covers only the default-separator 
case, which goes through `case _ => line.stripLineEnd`. Extending that test 
with a custom `lineSep` that also triggers a malformed-record error — e.g. 
`.option("lineSep", "|")` on a `|`-separated file that exceeds `maxColumns` — 
would cover both the appended-separator invariant and the stripping in a single 
shot.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-51579][SQL] Avoid EOFException in CSV Parsing by Appending Line Terminator [spark]

Reply via email to