GitHub user mn-mikke opened a pull request: https://github.com/apache/spark/pull/21687
[SPARK-24165][SQL] Fixing the output data type of CaseWhen expression

## What changes were proposed in this pull request?
This PR proposes a fix for the output data type of the ```CaseWhen``` expression. The current implementation ignores the nullability of nested types in the different execution branches and returns the type of the first branch. This can lead to an unexpected ```NullPointerException``` in other expressions that depend on a ```CaseWhen``` expression.

Example:
```
// Assumes a SparkSession named `spark` is in scope (e.g. in spark-shell).
import java.util
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{lit, struct, when}
import org.apache.spark.sql.types._
import spark.implicits._ // enables the 'symbol column syntax

// Two rows; the second one carries a null in the nested field "val1".
val rows = new util.ArrayList[Row]()
rows.add(Row(true, ("a", 1)))
rows.add(Row(false, (null, 2)))

// "val1" is declared nullable inside the non-nullable struct "s".
val schema = StructType(Seq(
  StructField("cond", BooleanType, false),
  StructField("s", StructType(Seq(
    StructField("val1", StringType, true),
    StructField("val2", IntegerType, false)
  )), false)
))

val df = spark.createDataFrame(rows, schema)

// The first branch builds a struct from literals, so its "val1" is non-nullable;
// the otherwise branch returns "s", whose "val1" may be null.
df
  .select(when('cond, struct(lit("x").as("val1"), lit(10).as("val2"))).otherwise('s) as "res")
  .select('res.getField("val1"))
  .show()
```

Exception:
```
Exception in thread "main" java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.LocalTableScanExec$$anonfun$unsafeRows$1.apply(LocalTableScanExec.scala:44)
	at org.apache.spark.sql.execution.LocalTableScanExec$$anonfun$unsafeRows$1.apply(LocalTableScanExec.scala:44)
	...
```

Output schema:
```
root
 |-- res.val1: string (nullable = false)
```

## How was this patch tested?
New test cases added into
- DataFrameSuite.scala
- conditionalExpressions.scala

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mn-mikke/spark SPARK-24165

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21687

----
commit 71040635723a4dc3bc55b4415261d5a7abf4ed50
Author: Marek Novotny <mn.mikke@...>
Date:   2018-07-01T13:36:24Z

    [SPARK-24165][SQL] Fixing the output data type of CaseWhen expression when resolving nullability of nested types

----
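
The PR summary says the output type of ```CaseWhen``` should not simply be the type of the first branch, because branches may differ in the nullability of nested fields. One way to compute a type that accounts for all branches is sketched below. It is a minimal illustration on a toy type model, not the code of the patch: `DType`, `Field`, `StructT`, `ArrayT`, `NullabilityMerge`, and `Demo` are hypothetical names, and none of Spark's own classes are used.

```scala
// Toy stand-ins for data types (hypothetical; these are NOT Spark's classes).
sealed trait DType
case object StringT extends DType
case object IntT extends DType
final case class Field(name: String, dtype: DType, nullable: Boolean)
final case class StructT(fields: List[Field]) extends DType
final case class ArrayT(element: DType, containsNull: Boolean) extends DType

object NullabilityMerge {
  // Merges two types that differ at most in nullability flags.
  // Returns None when the types differ structurally.
  def merge(a: DType, b: DType): Option[DType] = (a, b) match {
    case (StructT(fa), StructT(fb)) if fa.length == fb.length =>
      val merged = fa.zip(fb).map {
        case (x, y) if x.name == y.name =>
          // A nested field is nullable if it is nullable in either branch.
          merge(x.dtype, y.dtype).map(t => Field(x.name, t, x.nullable || y.nullable))
        case _ => None
      }
      if (merged.forall(_.isDefined)) Some(StructT(merged.flatten)) else None
    case (ArrayT(ea, na), ArrayT(eb, nb)) =>
      merge(ea, eb).map(t => ArrayT(t, na || nb))
    case (x, y) if x == y => Some(x)
    case _ => None
  }

  // Folds `merge` over the types of all branches of a CASE WHEN-like expression.
  def mergeAll(branches: List[DType]): Option[DType] = branches match {
    case Nil => None
    case head :: tail =>
      tail.foldLeft(Option(head)) { (acc, t) => acc.flatMap(merge(_, t)) }
  }
}

object Demo extends App {
  // Mirrors the example in the PR description: the literal struct has a
  // non-nullable "val1", while column "s" declares "val1" as nullable.
  val whenBranch = StructT(List(
    Field("val1", StringT, nullable = false),
    Field("val2", IntT, nullable = false)))
  val otherwiseBranch = StructT(List(
    Field("val1", StringT, nullable = true),
    Field("val2", IntT, nullable = false)))

  println(NullabilityMerge.mergeAll(List(whenBranch, otherwiseBranch)))
}
```

In this sketch the merged struct reports `val1` as nullable because one of the two branches can produce a null there, whereas taking only the first branch's type would report it as non-nullable, which matches the incorrect output schema shown in the PR description.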