[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

MaxGekk Mon, 09 Apr 2018 01:12:13 -0700

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20937#discussion_r180016246
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, 
e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected 
automatically.
    --- End diff --
    
    > I don't think `Encoding was detected automatically` is not quite correct.
    
    It is absolutely correct. If `encoding` is not set, it is detected 
automatically by jackson.  Look at the condition `if options.encoding.isEmpty 
=>`. 
    
    > It might not help user solve the issue but it gives less correct 
information.
    
    It gives absolutely correct information.
    
    > They could thought it detects encoding correctly regardless of multiline 
option.
    
    The message DOESN'T say that `encoding` detected correctly.
    
    > Think about this scenario: users somehow get this exception and read 
Failed to parse a character. Encoding was detected automatically.. What would 
they think?
    
    They will look at the proposed solution `You might want to set it 
explicitly via the encoding option like` and will set `encoding`
    
    > I would think somehow the file is somehow failed to read
    
    It could be true even `encoding` is set correctly
    
    > but it looks detecting the encoding in the file correctly automatically 
    
    I don't know why you decided that. I see nothing about `encoding` 
correctness in the message.
    
    > It's annoying to debug encoding related stuff in my experience. It would 
be nicer if we give the correct information as much as we can.
    
    What is your suggestion for the error message?
    
    > I am saying let's document the automatic encoding detection feature only 
for multiLine officially, which is true.
    
    I agree let's document that thought it is not related to this PR. This PR 
doesn't change behavior of encoding auto detection. And it must not change the 
behavior from my point of view. If you want to restrict the encoding 
auto-detection mechanism somehow, please, create separate PR. We will discuss 
separately what kind of customer's apps it will break.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

Reply via email to