ulysses-you commented on issue #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
URL: https://github.com/apache/spark/pull/26831#issuecomment-564565979

The problem is on the write path. When a column type is string, Spark reads the bytes into a UTF8String. That step does not actually validate the UTF-8 encoding; it just copies the bytes. Then, during the write, the UTF8String is converted with toString, which decodes every byte as UTF-8. As a result, bytes that are not valid UTF-8 come out wrong. So the right approach is to pass the bytes through directly, without calling toString.
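
To make the failure mode concrete, here is a minimal sketch in plain Python rather than Spark or Hive code. It assumes the string conversion behaves like a lenient UTF-8 decoder that substitutes U+FFFD for malformed input; Python's `errors="replace"` stands in for that here and may not match the JVM decoder byte for byte. The helper names `via_string` and `pass_through` are hypothetical and exist only for this illustration.

```python
# Illustration only: plain Python standing in for the behaviour described above.
# Assumption: the string conversion replaces malformed UTF-8 with U+FFFD.

# "cafe" with an accented e, encoded as Latin-1. The trailing 0xE9 byte is not
# valid UTF-8 on its own, so this is a string column holding non-UTF-8 bytes.
raw = b"caf\xe9"


def via_string(data: bytes) -> bytes:
    """Decode as UTF-8 (malformed bytes become U+FFFD), then re-encode."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


def pass_through(data: bytes) -> bytes:
    """Hand the bytes on untouched, with no string conversion in between."""
    return bytes(data)


round_tripped = via_string(raw)   # the 0xE9 byte is lost in the round trip
untouched = pass_through(raw)     # identical to the input

print(raw, round_tripped, untouched)
```

Valid UTF-8 input survives both paths unchanged, which is why the problem only shows up for data written in another encoding.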
