maropu commented on a change in pull request #23390:
[SPARK-21351][SQL][followup] reuse the FixNullability rule
URL: https://github.com/apache/spark/pull/23390#discussion_r244256851
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -2846,3 +2812,43 @@ object UpdateOuterReferences extends Rule[LogicalPlan] {
}
}
}
+
+/**
+ * Updates nullability of Attributes in a resolved LogicalPlan by using the nullability of
+ * corresponding Attributes of its children output Attributes. This step is needed because
+ * users can use a resolved AttributeReference in the Dataset API and outer joins
+ * can change the nullability of an AttributeReference. Without this rule, a nullable column's
+ * nullable field can be actually set as non-nullable, which causes illegal optimization
+ * (e.g., NULL propagation) and wrong answers.
+ * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
+ *
+ * This rule should be executed again at the end of optimization phase, as optimizer may change
+ * some expressions and their nullabilities as well. See SPARK-21351 for more details.
+ */
+object UpdateNullability extends Rule[LogicalPlan] {
Review comment:
How about `UpdateNullability` -> `UpdateAttributeNullability`?
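
For context on why the attribute-focused name might fit, here is a minimal sketch of the behavior the Scaladoc above describes, using a toy plan model rather than Catalyst. Everything in it (`NullabilitySketch`, `Attr`, `Plan`, `Relation`, `LeftOuterJoin`, `Project`, `updateNullability`) is hypothetical and only mirrors the idea; it is not the rule's actual implementation.

```scala
object NullabilitySketch {
  // Toy stand-in for an attribute reference: identity is the exprId.
  final case class Attr(name: String, exprId: Long, nullable: Boolean)

  sealed trait Plan {
    def children: Seq[Plan]
    def output: Seq[Attr]
  }

  // Leaf node that produces its attributes as declared.
  final case class Relation(output: Seq[Attr]) extends Plan {
    val children: Seq[Plan] = Nil
  }

  // Left outer join: every attribute from the right side becomes nullable.
  final case class LeftOuterJoin(left: Plan, right: Plan) extends Plan {
    val children: Seq[Plan] = Seq(left, right)
    val output: Seq[Attr] = left.output ++ right.output.map(_.copy(nullable = true))
  }

  // Projection whose attribute references may carry stale nullability.
  final case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
    val output: Seq[Attr] = projectList
  }

  // Bottom-up pass: refresh each referenced attribute's nullable flag
  // from the output of the (already refreshed) child.
  def updateNullability(plan: Plan): Plan = plan match {
    case r: Relation => r
    case LeftOuterJoin(l, r) =>
      LeftOuterJoin(updateNullability(l), updateNullability(r))
    case Project(list, child) =>
      val newChild = updateNullability(child)
      val nullabilities = newChild.output.map(a => a.exprId -> a.nullable).toMap
      val refreshed =
        list.map(a => a.copy(nullable = nullabilities.getOrElse(a.exprId, a.nullable)))
      Project(refreshed, newChild)
  }

  // `b` was resolved as non-nullable before the join was applied.
  val a: Attr = Attr("a", 1L, nullable = false)
  val b: Attr = Attr("b", 2L, nullable = false)
  val stale: Plan = Project(Seq(b), LeftOuterJoin(Relation(Seq(a)), Relation(Seq(b))))
  val fixed: Plan = updateNullability(stale)
}
```

In this sketch, `stale.output` still reports `b` as non-nullable even though the join below it can produce NULLs for `b`; `fixed.output` reports it as nullable. Only attribute flags are touched, which is the point the suggested rename would make explicit.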