cloud-fan commented on a change in pull request #35139:
URL: https://github.com/apache/spark/pull/35139#discussion_r782198958
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
##########
@@ -110,23 +110,28 @@ object ExpressionEncoder {
}
val newSerializer = CreateStruct(serializers)
+ def nullSafe(input: Expression, result: Expression): Expression = {
+ If(IsNull(input), Literal.create(null, result.dataType), result)
+ }
+
val newDeserializerInput = GetColumnByOrdinal(0, newSerializer.dataType)
val deserializers = encoders.zipWithIndex.map { case (enc, index) =>
val getColExprs = enc.objDeserializer.collect { case c: GetColumnByOrdinal => c }.distinct
assert(getColExprs.size == 1, "object deserializer should have only one " +
  s"`GetColumnByOrdinal`, but there are ${getColExprs.size}")
val input = GetStructField(newDeserializerInput, index)
- enc.objDeserializer.transformUp {
+ val newDeserializer = enc.objDeserializer.transformUp {
Review comment:
This bug only occurs for `RowEncoder`, as
`Dataset[T].joinWith(Dataset[U])` works fine:
https://github.com/apache/spark/pull/13425/files#diff-b98c99535d2b28cb47774860d500030e732c244c55b1ac05aead5d1cf1e7a602R772
Can you look into it and figure out the difference? This may help us to
understand the bug better and guide us to the proper fix.
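For context, the `nullSafe` helper added in the diff wraps a deserializer so that a null input struct short-circuits to null instead of being deserialized. A rough standalone sketch of that pattern (not Spark code; every name below is invented for illustration, loosely mirroring Catalyst's `If`/`IsNull` expressions):

```scala
// Simplified expression tree modeling the null-safe wrapping from the diff.
sealed trait Expr { def eval(row: Seq[Any]): Any }

// Reads one field out of the input row, like GetStructField on the join struct.
case class InputRef(index: Int) extends Expr {
  def eval(row: Seq[Any]): Any = row(index)
}

// A stand-in deserializer that would fail on a null input value.
case class Deserialize(child: Expr) extends Expr {
  def eval(row: Seq[Any]): Any = child.eval(row).toString.toUpperCase
}

// Mirrors `If(IsNull(input), Literal.create(null, result.dataType), result)`:
// when the underlying struct is null, return null rather than running
// the deserializer on it.
case class NullSafe(input: Expr, result: Expr) extends Expr {
  def eval(row: Seq[Any]): Any =
    if (input.eval(row) == null) null else result.eval(row)
}

object NullSafeDemo extends App {
  val field = InputRef(0)
  val deser = NullSafe(field, Deserialize(field))
  println(deser.eval(Seq("abc", 1)))  // ABC
  println(deser.eval(Seq(null, 1)))   // null, instead of a NullPointerException
}
```

This is only meant to show why the wrapping matters for `RowEncoder` in an outer join: without `NullSafe`, evaluating the deserializer on the null side throws.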
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]