cloud-fan commented on a change in pull request #35139:
URL: https://github.com/apache/spark/pull/35139#discussion_r782176547



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
##########
@@ -110,23 +110,28 @@ object ExpressionEncoder {
     }
     val newSerializer = CreateStruct(serializers)
 
+    def nullSafe(input: Expression, result: Expression): Expression = {
+      If(IsNull(input), Literal.create(null, result.dataType), result)
+    }
+
     val newDeserializerInput = GetColumnByOrdinal(0, newSerializer.dataType)
     val deserializers = encoders.zipWithIndex.map { case (enc, index) =>
      val getColExprs = enc.objDeserializer.collect { case c: GetColumnByOrdinal => c }.distinct
      assert(getColExprs.size == 1, "object deserializer should have only one " +
        s"`GetColumnByOrdinal`, but there are ${getColExprs.size}")
 
       val input = GetStructField(newDeserializerInput, index)
-      enc.objDeserializer.transformUp {
+      val newDeserializer = enc.objDeserializer.transformUp {

Review comment:
       > having a dedicated tuple ExpressionEncoder for Dataset.joinWith
   
   I like this idea. Ideally, for an `ExpressionEncoder[T]`, the deserializer expression assumes the input row is not null, since the input rows come from a `DataFrame` and Spark doesn't allow null top-level rows. We can see from `ScalaReflection.deserializerForType` that we use `AssertNotNull` to guarantee this. This null check seems to be missing from `RowEncoder.deserializerFor`.
   
   In `ExpressionEncoder.tuple`, we inherit this assumption and assume that each element of the tuple can't be null. That assumption is broken in `joinWith`, because a tuple element can be null for outer joins.
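   
   To illustrate the point outside of Spark internals, here is a minimal sketch (hypothetical names, plain Scala functions instead of Catalyst expressions): `strictDeserialize` plays the role of a per-element deserializer guarded by `AssertNotNull`, and `nullSafe` models the `If(IsNull(input), ...)` wrapper from the diff, which short-circuits to null instead of running the strict deserializer.
   
   ```scala
   // Hypothetical illustration, not Spark code: per-element deserializers
   // assume non-null input (like AssertNotNull), and a null-safe wrapper
   // restores correctness when outer joins produce null tuple elements.
   object NullSafeSketch {
     // A strict element deserializer: it rejects null input outright,
     // mirroring the "top-level rows can't be null" assumption.
     def strictDeserialize(input: String): Int = {
       require(input != null, "top-level row must not be null")
       input.toInt
     }
   
     // The null-safe wrapper modeled as a plain function: if the input is
     // null, produce null rather than invoking the strict deserializer.
     def nullSafe[A >: Null, B >: Null](f: A => B)(input: A): B =
       if (input == null) null else f(input)
   
     def main(args: Array[String]): Unit = {
       val f: String => Integer = s => Integer.valueOf(strictDeserialize(s))
   
       // Inner join: both sides present, the strict deserializer is fine.
       println(nullSafe(f)("42"))   // 42
   
       // Outer join: one side is missing. The strict deserializer would
       // throw, but the wrapper yields null, matching outer-join semantics.
       println(nullSafe(f)(null))   // null
     }
   }
   ```
   
   This is only a sketch of the control flow; in the actual patch the same guard is expressed as a Catalyst `If(IsNull(input), Literal.create(null, result.dataType), result)` around each element's deserializer.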




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


