Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173007679
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
@@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
       .analyze
     comparePlans(Optimizer execute rel, expected)
   }
+
+  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
+    val structRel = relation
+      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
+      .groupBy($"foo")("1").analyze
+    val structExpected = relation
+      .select('nullable_id as "foo")
+      .groupBy($"foo")("1").analyze
+    comparePlans(Optimizer execute structRel, structExpected)
+
+    // If nullable attributes aren't used in the 'expected' plans, the array and map test
+    // cases fail because array and map indexing can return null so the output attribute
--- End diff ---
It's a good question! I'm not too familiar with how nullability is marked
and unmarked during planning. My rough understanding is that the analyzer
resolves all of the plan's expressions and, in doing so, marks each attribute
as nullable or not. After that, it's not clear that the optimizer ever
revisits those nullability decisions. Is there an optimizer pass that should
make the nullability marking more precise?
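
As an aside, here is a minimal sketch of the behaviour the test comment
above alludes to. This is illustrative only, not part of the patch, and it
assumes the catalyst extractors GetArrayItem and GetMapValue report
nullable = true unconditionally:

    import org.apache.spark.sql.catalyst.expressions._
    import org.apache.spark.sql.types._

    // Extraction over arrays and maps is nullable even when the child's
    // elements/values are declared non-nullable, because the index may be
    // out of bounds or the key absent at runtime.
    val arr = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false))
    assert(GetArrayItem(arr, Literal(5)).nullable)  // out-of-bounds index yields null

    val map = Literal.create(Map("a" -> 1),
      MapType(StringType, IntegerType, valueContainsNull = false))
    assert(GetMapValue(map, Literal("missing")).nullable)  // absent key yields null

So the 'expected' plans for the array and map cases have to carry nullable
output attributes, or comparePlans sees a nullability mismatch.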
---