[GitHub] spark pull request #21687: [SPARK-24165][SQL] Fixing the output data type of...

mn-mikke Mon, 02 Jul 2018 01:53:04 -0700

Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21687#discussion_r199425774
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -129,7 +129,7 @@ case class CaseWhen(
         case Seq(dt1, dt2) => dt1.sameType(dt2)
       }
     
    -  override def dataType: DataType = branches.head._2.dataType
    +  override def dataType: DataType = valueTypes.reduce(TypeCoercion.findTightestCommonType(_, _).get)
    --- End diff --
    
    Thanks for your suggestion, but is it the best solution?
    1. It seems that this solution would also fail on the example in the
    description. The root data types in both branches are non-nullable, so
    ```branches.head._2.dataType``` gets called again.
    ```
     |-- named_struct(val1, x AS `val1`, val2, 10 AS `val2`): struct (nullable = false)
     |    |-- val1: string (nullable = false)
     |    |-- val2: integer (nullable = false)
    ```
    and
    ```
     |-- s: struct (nullable = false)
     |    |-- val1: string (nullable = true)
     |    |-- val2: integer (nullable = false)
    ```
    2. Let's assume that there is no if statement and we just call
    ```.asNullable```. This would make all the struct fields nullable and set
    ```containsNull``` to ```true``` for every ```ArrayType``` (and
    ```valueContainsNull``` for every ```MapType```) within the data type
    structure. Is that what we want?
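    
    To make the concern in point 2 concrete, here is a minimal sketch on a toy
    type model. It is not Spark's code: ```DType```, ```Field```, ```StructT```,
    ```ArrayT``` and ```forceNullable``` are hypothetical names that only mimic
    the blanket widening described above.
    
    ```scala
    object AsNullableSketch {
      // Toy stand-ins for nested data types (hypothetical, not Spark classes).
      sealed trait DType
      case object StringT extends DType
      case object IntT extends DType
      final case class Field(name: String, dtype: DType, nullable: Boolean)
      final case class StructT(fields: List[Field]) extends DType
      final case class ArrayT(element: DType, containsNull: Boolean) extends DType

      // Forces every nested nullability flag to true, regardless of its old value.
      def forceNullable(t: DType): DType = t match {
        case StructT(fs) =>
          StructT(fs.map(f => f.copy(dtype = forceNullable(f.dtype), nullable = true)))
        case ArrayT(e, _) => ArrayT(forceNullable(e), containsNull = true)
        case other => other
      }

      // The second schema from the example: only val1 is nullable.
      val s: DType = StructT(List(
        Field("val1", StringT, nullable = true),
        Field("val2", IntT, nullable = false)))

      val widened: DType = forceNullable(s)

      def main(args: Array[String]): Unit = println(widened)
    }
    ```
    
    In this sketch ```val2``` comes out nullable even though neither branch can
    ever produce a null there, so information is lost.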
    
    Personally, I think we need something that goes recursively through
    non-primitive types and merges the ```nullable``` and ```containsNull```
    flags. That's exactly what ```TypeCoercion.findTightestCommonType``` does
    when it is given two data types that are equal according to ```sameType```.
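    
    A rough sketch of the recursive merge I have in mind, written against a toy
    type model rather than Spark's classes. All names (```DType```, ```Field```,
    ```StructT```, ```ArrayT```, ```MapT```, ```merge```) are hypothetical, and
    this is not the actual ```findTightestCommonType``` implementation. It only
    illustrates OR-ing the flags level by level.
    
    ```scala
    object NullabilityMergeSketch {
      // Toy stand-ins for nested data types (hypothetical, not Spark classes).
      sealed trait DType
      case object StringT extends DType
      case object IntT extends DType
      final case class Field(name: String, dtype: DType, nullable: Boolean)
      final case class StructT(fields: List[Field]) extends DType
      final case class ArrayT(element: DType, containsNull: Boolean) extends DType
      final case class MapT(key: DType, value: DType, valueContainsNull: Boolean) extends DType

      // Merges two types that differ at most in nullability flags.
      // Returns None if they differ in anything else.
      def merge(a: DType, b: DType): Option[DType] = (a, b) match {
        case (StructT(fa), StructT(fb)) if fa.length == fb.length =>
          val merged = fa.zip(fb).map {
            case (x, y) if x.name == y.name =>
              // A field is nullable if it is nullable on either side.
              merge(x.dtype, y.dtype).map(t => Field(x.name, t, x.nullable || y.nullable))
            case _ => None
          }
          if (merged.forall(_.isDefined)) Some(StructT(merged.flatten)) else None
        case (ArrayT(ea, na), ArrayT(eb, nb)) =>
          merge(ea, eb).map(e => ArrayT(e, na || nb))
        case (MapT(ka, va, na), MapT(kb, vb, nb)) =>
          for { k <- merge(ka, kb); v <- merge(va, vb) } yield MapT(k, v, na || nb)
        case _ if a == b => Some(a)
        case _ => None
      }

      // The two branch types from the example above.
      val branch1: DType = StructT(List(
        Field("val1", StringT, nullable = false),
        Field("val2", IntT, nullable = false)))
      val branch2: DType = StructT(List(
        Field("val1", StringT, nullable = true),
        Field("val2", IntT, nullable = false)))

      val result: Option[DType] = merge(branch1, branch2)

      def main(args: Array[String]): Unit = println(result)
    }
    ```
    
    Here ```val1``` becomes nullable because one branch allows nulls, while
    ```val2``` stays non-nullable because neither branch does.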


