srowen commented on issue #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT URL: https://github.com/apache/spark/pull/26831#issuecomment-564568748

Hm, I'm not sure about this. Converting bytes to `Text` would always use UTF-8, yes. Converting bytes to `String` would need an encoding, which I assume defaults somewhere to the platform encoding, right? Or else how is that defined?

But back to my question: how do you get non-UTF-8 data here in the first place? This might be a dumb question; I don't know how other writers might work. Is the case a table whose serde properties set the encoding to something other than UTF-8? Because here you are just writing bytes directly, which is why I ask how this arises in practice. But in that case, doesn't this get handled properly somewhere else by Hive or Spark? I guess I'm saying I don't know whether this demonstrates a real-world issue, versus what happens if you hack in some bytes.
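For illustration of why the `Text`-vs-`String` distinction matters, here is a minimal JDK-only sketch (no Hadoop dependency; the two GBK-style bytes are a made-up example of non-UTF-8 input, not taken from the PR): decoding bytes that are not valid UTF-8 as UTF-8 substitutes replacement characters, so a round trip through a UTF-8-only representation corrupts the data rather than merely relabeling it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {
    public static void main(String[] args) {
        // Two bytes that form a valid GBK character but are NOT a valid
        // UTF-8 sequence (0xD6 is a 2-byte UTF-8 lead, 0xD0 is not a
        // continuation byte).
        byte[] gbkBytes = new byte[] { (byte) 0xD6, (byte) 0xD0 };

        // Decoding as UTF-8 replaces each bad byte with U+FFFD, so
        // re-encoding cannot recover the original bytes.
        String asUtf8 = new String(gbkBytes, StandardCharsets.UTF_8);
        byte[] roundTripped = asUtf8.getBytes(StandardCharsets.UTF_8);

        System.out.println(asUtf8);                                  // "\uFFFD\uFFFD"
        System.out.println(Arrays.equals(gbkBytes, roundTripped));   // false
    }
}
```

If a UTF-8-assuming copy option sits on the write path, this is the kind of lossy conversion such bytes would go through; the open question in the comment above is whether real tables ever feed such bytes in here, or whether Hive/Spark re-encodes them earlier.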
