AngersZhuuuu commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-658222144


   > What's the behavior of Hive if the script transformation doesn't specify a 
serde? Does Hive pick a default serde, or does it have well-defined behavior 
for the no-serde case?
   
   In the current code, when a transform is written without a serde, the parser 
falls back to LazySimpleSerDe:
   
https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L717-L723
   
   This means ScriptTransformationExec only runs the no-serde code below when 
you specify a wrong serde and it can't find the corresponding serde class:
   
   
https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala#L236-L238
   
   
https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala#L175-L187
   
   As the comment at https://github.com/apache/spark/pull/29085#issuecomment-658057950 
shows, without a serde we can't handle the input data strings correctly. For 
the same reason we can't handle the output data either, so I added a `wrapper` 
method to convert each string to the corresponding data type.
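
   To illustrate the idea only (this is not the code in the PR), here is a 
minimal standalone sketch of such a conversion. Everything in it is 
hypothetical: the `ColType` hierarchy, the `NoSerdeOutput` object and 
`parseLine` are made-up names that stand in for Spark's real data types and 
row handling, and turning unparsable values into null is an assumption.

```scala
import java.util.regex.Pattern
import scala.util.Try

// Hypothetical stand-ins for the declared output column types.
sealed trait ColType
case object IntCol extends ColType
case object DoubleCol extends ColType
case object BooleanCol extends ColType
case object StringCol extends ColType

object NoSerdeOutput {
  // Pick a String => value converter for one column type.
  // Assumption: a field that cannot be parsed becomes null.
  def wrapper(t: ColType): String => Any = t match {
    case IntCol     => s => Try(s.trim.toInt).getOrElse(null)
    case DoubleCol  => s => Try(s.trim.toDouble).getOrElse(null)
    case BooleanCol => s => Try(s.trim.toBoolean).getOrElse(null)
    case StringCol  => s => s
  }

  // Split one line of script output on the field delimiter and convert
  // every field with the converter for its column.
  def parseLine(line: String, delim: String, schema: Seq[ColType]): Seq[Any] = {
    // Quote the delimiter because String.split expects a regex;
    // the -1 limit keeps trailing empty fields.
    val fields = line.split(Pattern.quote(delim), -1)
    fields.toSeq.zip(schema).map { case (raw, t) => wrapper(t)(raw) }
  }
}
```

   For example, `NoSerdeOutput.parseLine("1\t2.5\ttrue\tabc", "\t", Seq(IntCol, 
DoubleCol, BooleanCol, StringCol))` yields typed values instead of four 
strings, which is the kind of mismatch described here.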
   
   The Jenkins result in https://github.com/apache/spark/pull/29085#issuecomment-658213516 
shows the output data type problem.
   


