Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20937#discussion_r180016246
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
---
@@ -361,6 +361,15 @@ class JacksonParser(
// For such records, all fields other than the field configured by
// `columnNameOfCorruptRecord` are set to `null`.
throw BadRecordException(() => recordLiteral(record), () => None, e)
+ case e: CharConversionException if options.encoding.isEmpty =>
+ val msg =
+ """Failed to parse a character. Encoding was detected automatically.
--- End diff --
> I don't think `Encoding was detected automatically` is quite correct.
It is absolutely correct. If `encoding` is not set, the encoding is detected
automatically by Jackson. Look at the condition
`if options.encoding.isEmpty =>`.
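
To make the role of that guard concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `JsonOptionsSketch`, `EncodingHintSketch` and `parseWithHint` are hypothetical names, and the hint text only paraphrases the message under discussion. It just shows that the hint is attached only when no `encoding` was configured, and that the original exception passes through untouched otherwise.

```scala
import java.io.CharConversionException

// Hypothetical stand-in for the JSON options; only the field used here.
final case class JsonOptionsSketch(encoding: Option[String])

object EncodingHintSketch {
  // Runs `parse`. If a character fails to decode while no encoding was
  // configured, rethrows with a hint to set the encoding explicitly.
  // If an encoding was configured, the guard does not match and the
  // original exception propagates unchanged.
  def parseWithHint[T](options: JsonOptionsSketch)(parse: => T): T = {
    try {
      parse
    } catch {
      case e: CharConversionException if options.encoding.isEmpty =>
        // Paraphrased message; the exact wording in the PR may differ.
        val msg = "Failed to parse a character. Encoding was detected automatically. " +
          "You might want to set it explicitly via the encoding option."
        val wrapped = new CharConversionException(msg)
        wrapped.initCause(e)
        throw wrapped
    }
  }
}
```

Note that the message only states how the encoding was chosen, not whether the choice was right, which is the distinction argued above.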
> It might not help user solve the issue but it gives less correct
information.
It gives absolutely correct information.
> They could think it detects the encoding correctly regardless of the
multiline option.
The message DOESN'T say that the encoding was detected correctly.
> Think about this scenario: users somehow get this exception and read
Failed to parse a character. Encoding was detected automatically.. What would
they think?
They will look at the proposed solution `You might want to set it
explicitly via the encoding option like` and will set `encoding`.
> I would think somehow the file is somehow failed to read
That could be true even if `encoding` is set correctly.
> but it looks detecting the encoding in the file correctly automatically
I don't know why you decided that. I see nothing about `encoding`
correctness in the message.
> It's annoying to debug encoding related stuff in my experience. It would
be nicer if we give the correct information as much as we can.
What is your suggestion for the error message?
> I am saying let's document the automatic encoding detection feature only
for multiLine officially, which is true.
I agree, let's document that, though it is not related to this PR. This PR
doesn't change the behavior of encoding auto-detection, and from my point of
view it must not change it. If you want to restrict the encoding
auto-detection mechanism somehow, please create a separate PR. We will discuss
separately which customer apps it would break.
---