srowen commented on issue #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT URL: https://github.com/apache/spark/pull/26831#issuecomment-564568748

Hm, I'm not sure about this. Converting bytes to `Text` would always use UTF-8, yes. Converting bytes to `String` would need an encoding, which I assume defaults somewhere to the platform encoding, right? Or else how is that defined?

But back to my question: how do you get non-UTF-8 data here in the first place? This might be a dumb question; I don't know how other writers might work. Is the case a table whose serde properties set the encoding to something other than UTF-8? Because here you are just writing bytes directly, which is why I ask how this arises in practice. But in that case, doesn't this get handled properly somewhere else by Hive or Spark? I guess I'm saying I don't know whether this demonstrates a real-world issue, versus what happens if you hack in some bytes.
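For illustration of why the `Text`-vs-`String` distinction matters, here is a minimal JDK-only sketch (no Hadoop dependency; the two GBK-style bytes are a made-up example of non-UTF-8 input, not taken from the PR): decoding bytes that are not valid UTF-8 as UTF-8 substitutes replacement characters, so a round trip through a UTF-8-only representation corrupts the data rather than merely relabeling it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {
    public static void main(String[] args) {
        // Two bytes that form a valid GBK character but are NOT a valid
        // UTF-8 sequence (0xD6 is a 2-byte UTF-8 lead, 0xD0 is not a
        // continuation byte).
        byte[] gbkBytes = new byte[] { (byte) 0xD6, (byte) 0xD0 };

        // Decoding as UTF-8 replaces each bad byte with U+FFFD, so
        // re-encoding cannot recover the original bytes.
        String asUtf8 = new String(gbkBytes, StandardCharsets.UTF_8);
        byte[] roundTripped = asUtf8.getBytes(StandardCharsets.UTF_8);

        System.out.println(asUtf8);                                  // "\uFFFD\uFFFD"
        System.out.println(Arrays.equals(gbkBytes, roundTripped));   // false
    }
}
```

If a UTF-8-assuming copy option sits on the write path, this is the kind of lossy conversion such bytes would go through; the open question in the comment above is whether real tables ever feed such bytes in here, or whether Hive/Spark re-encodes them earlier.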
