AngersZhuuuu commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-658222144
> What's the behavior of hive if the script transformation doesn't specify a serde? Does Hive pick a default serde, or it well defines the behavior of non-serde?

In the current code, when a transform is written without a serde, the parser falls back to LazySimpleSerDe:
https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L717-L723

This means the no-serde code below only runs when you specify a wrong serde and ScriptTransformationExec can't find the corresponding serde class:
https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala#L236-L238
https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala#L175-L187

As discussed in this comment (https://github.com/apache/spark/pull/29085#issuecomment-658057950), without a serde we can't handle the input data strings correctly. For the same reason we can't handle the output data either, so I added a `wrapper` method to convert each string to the corresponding data type. The Jenkins result in https://github.com/apache/spark/pull/29085#issuecomment-658213516 shows the output data type problem.
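To illustrate the idea behind the `wrapper` conversion, here is a minimal, language-neutral sketch in Python. It is not the code in this PR: the names `CONVERTERS` and `wrap_row`, the tab delimiter, and the `\N` null token are all assumptions made for the example. It only shows why script output, which arrives as delimited strings, needs a per-column conversion to the declared output types when no serde does that work.

```python
from datetime import date
from decimal import Decimal

# Hypothetical converters keyed by a type name; NOT Spark's actual wrapper.
CONVERTERS = {
    "string": str,
    "int": int,
    "double": float,
    "decimal": Decimal,
    "boolean": lambda s: s.lower() == "true",
    "date": date.fromisoformat,
}


def wrap_row(line, schema, delimiter="\t", null_token="\\N"):
    """Convert one delimited output line into typed values.

    `schema` is a list of type names, one per output column.
    The delimiter and null token are assumptions for this sketch.
    """
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for raw, type_name in zip(fields, schema):
        if raw == null_token:
            # The null token maps to None regardless of the column type.
            row.append(None)
        else:
            row.append(CONVERTERS[type_name](raw))
    return row
```

For example, `wrap_row("1\t2.5\ttrue\n", ["int", "double", "boolean"])` yields typed values instead of three strings; without this step every output column would stay a string and fail to match a non-string output schema.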
