Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180016246 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -361,6 +361,15 @@ class JacksonParser( // For such records, all fields other than the field configured by // `columnNameOfCorruptRecord` are set to `null`. throw BadRecordException(() => recordLiteral(record), () => None, e) + case e: CharConversionException if options.encoding.isEmpty => + val msg = + """Failed to parse a character. Encoding was detected automatically. --- End diff -- > I don't think `Encoding was detected automatically` is not quite correct. It is absolutely correct. If `encoding` is not set, it is detected automatically by jackson. Look at the condition `if options.encoding.isEmpty =>`. > It might not help user solve the issue but it gives less correct information. It gives absolutely correct information. > They could thought it detects encoding correctly regardless of multiline option. The message DOESN'T say that `encoding` detected correctly. > Think about this scenario: users somehow get this exception and read Failed to parse a character. Encoding was detected automatically.. What would they think? They will look at the proposed solution `You might want to set it explicitly via the encoding option like` and will set `encoding` > I would think somehow the file is somehow failed to read It could be true even `encoding` is set correctly > but it looks detecting the encoding in the file correctly automatically I don't know why you decided that. I see nothing about `encoding` correctness in the message. > It's annoying to debug encoding related stuff in my experience. It would be nicer if we give the correct information as much as we can. What is your suggestion for the error message? > I am saying let's document the automatic encoding detection feature only for multiLine officially, which is true. I agree let's document that thought it is not related to this PR. This PR doesn't change behavior of encoding auto detection. And it must not change the behavior from my point of view. If you want to restrict the encoding auto-detection mechanism somehow, please, create separate PR. We will discuss separately what kind of customer's apps it will break.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org