Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173323846
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
@@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
.analyze
comparePlans(Optimizer execute rel, expected)
}
+
+  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
+    val structRel = relation
+      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
+      .groupBy($"foo")("1").analyze
+    val structExpected = relation
+      .select('nullable_id as "foo")
+      .groupBy($"foo")("1").analyze
+    comparePlans(Optimizer execute structRel, structExpected)
+
+    // If nullable attributes aren't used in the 'expected' plans, the array and map test
+    // cases fail because array and map indexing can return null so the output attribute
--- End diff ---
Done, thanks. I filed SPARK-23634 to fix this. Out of interest, why does
`AttributeReference` cache the nullability of its referent? Is it because
comparison is too expensive to do if you have to follow a level of indirection
to get to the original attribute?
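
For readers following along, the rewrite the quoted test exercises is: reading field `i` of a struct that was just built with `CreateNamedStruct` can be collapsed to the `i`-th value expression, so the surrounding plan no longer materializes the struct. Below is a minimal, hypothetical sketch of that idea in Python, not Spark's actual Catalyst rule; the class and function names (`CreateNamedStruct`, `GetStructField`, `simplify`) merely mirror the Catalyst operators named in the diff.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attr:
    """Stand-in for a Catalyst AttributeReference."""
    name: str
    nullable: bool = True

@dataclass(frozen=True)
class CreateNamedStruct:
    """Struct constructor: a sequence of (field_name, value_expr) pairs."""
    pairs: tuple

@dataclass(frozen=True)
class GetStructField:
    """Field access on a struct expression by ordinal."""
    child: object
    ordinal: int

def simplify(expr):
    """Collapse GetStructField(CreateNamedStruct(...), i) to the i-th value expr."""
    if isinstance(expr, GetStructField) and isinstance(expr.child, CreateNamedStruct):
        return expr.child.pairs[expr.ordinal][1]
    return expr

# Mirrors the test case: field 0 of struct("att1" -> nullable_id) is nullable_id.
nullable_id = Attr("nullable_id")
expr = GetStructField(CreateNamedStruct((("att1", nullable_id),)), 0)
assert simplify(expr) is nullable_id
```

The same rewrite applies regardless of where the expression sits in the plan, which is what the "aren't at the plan root" test title is checking.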
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]