Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20937#discussion_r180009312
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected automatically.
    --- End diff --
    
    I don't think `Encoding was detected automatically` is quite correct. 
It might not help users solve the issue, and it gives them less accurate 
information. They could think the encoding is detected correctly regardless of 
the `multiline` option.
    
    Think about this scenario: a user somehow gets this exception and reads 
`Failed to parse a character. Encoding was detected automatically.`. What would 
they think? I would think that the file somehow failed to be read, but that the 
encoding in the file was detected correctly and automatically, regardless of 
other options.
    
    In my experience, it's annoying to debug encoding-related issues. It would be 
nicer if we gave as much correct information as we can.


