[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates
[ https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891584#comment-16891584 ] Takeshi Yamamuro commented on SPARK-24079: -- Yea, I think SPARK-24080 is the same with SPARK-27915, so how about closing SPARK-24080 as duplicate, then moving discussions to the newer one? > Update the nullability of Join output based on inferred predicates > -- > > Key: SPARK-24079 > URL: https://issues.apache.org/jira/browse/SPARK-24079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Takeshi Yamamuro >Priority: Minor > > In the master, a logical `Join` node does not respect the nullability that > the optimizer rule `InferFiltersFromConstraints` > might change when inferred predicates have `IsNotNull`, e.g., > {code} > scala> val df1 = Seq((Some(1), Some(2))).toDF("k", "v0") > scala> val df2 = Seq((Some(1), Some(3))).toDF("k", "v1") > scala> val joinedDf = df1.join(df2, df1("k") === df2("k"), "inner") > scala> joinedDf.explain > == Physical Plan == > *(2) BroadcastHashJoin [k#83], [k#92], Inner, BuildRight > :- *(2) Project [_1#80 AS k#83, _2#81 AS v0#84] > : +- *(2) Filter isnotnull(_1#80) > : +- LocalTableScan [_1#80, _2#81] > +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > true] as bigint))) >+- *(1) Project [_1#89 AS k#92, _2#90 AS v1#93] > +- *(1) Filter isnotnull(_1#89) > +- LocalTableScan [_1#89, _2#90] > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(true, true, true, true) > {code} > But, these `nullable` values should be: > {code} > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(false, true, false, true) > {code} > This ticket comes from the previous discussion: > https://github.com/apache/spark/pull/18576#pullrequestreview-107585997 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates
[ https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890672#comment-16890672 ] Josh Rosen commented on SPARK-24079: Update: I just realized that SPARK-27915 is actually a closer duplicate of SPARK-24080, which is very closely related to this ticket. > Update the nullability of Join output based on inferred predicates > -- > > Key: SPARK-24079 > URL: https://issues.apache.org/jira/browse/SPARK-24079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Takeshi Yamamuro >Priority: Minor > > In the master, a logical `Join` node does not respect the nullability that > the optimizer rule `InferFiltersFromConstraints` > might change when inferred predicates have `IsNotNull`, e.g., > {code} > scala> val df1 = Seq((Some(1), Some(2))).toDF("k", "v0") > scala> val df2 = Seq((Some(1), Some(3))).toDF("k", "v1") > scala> val joinedDf = df1.join(df2, df1("k") === df2("k"), "inner") > scala> joinedDf.explain > == Physical Plan == > *(2) BroadcastHashJoin [k#83], [k#92], Inner, BuildRight > :- *(2) Project [_1#80 AS k#83, _2#81 AS v0#84] > : +- *(2) Filter isnotnull(_1#80) > : +- LocalTableScan [_1#80, _2#81] > +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > true] as bigint))) >+- *(1) Project [_1#89 AS k#92, _2#90 AS v1#93] > +- *(1) Filter isnotnull(_1#89) > +- LocalTableScan [_1#89, _2#90] > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(true, true, true, true) > {code} > But, these `nullable` values should be: > {code} > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(false, true, false, true) > {code} > This ticket comes from the previous discussion: > https://github.com/apache/spark/pull/18576#pullrequestreview-107585997 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates
[ https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884815#comment-16884815 ] Josh Rosen commented on SPARK-24079: I'm marking SPARK-27915 (a newer ticket filed by me) as a partial duplicate of this ticket. Both issues are concerned with updating nullability based on inferred constraints. There's some detailed analysis of the `IsNotNull` handling in my PR (which I think is valuable in its own right, since it helps to clarify some existing optimizations in codegen). I'm currently blocked on some attribute reference issues, but maybe the Fix FixNullability rule is what I needed). > Update the nullability of Join output based on inferred predicates > -- > > Key: SPARK-24079 > URL: https://issues.apache.org/jira/browse/SPARK-24079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Takeshi Yamamuro >Priority: Minor > > In the master, a logical `Join` node does not respect the nullability that > the optimizer rule `InferFiltersFromConstraints` > might change when inferred predicates have `IsNotNull`, e.g., > {code} > scala> val df1 = Seq((Some(1), Some(2))).toDF("k", "v0") > scala> val df2 = Seq((Some(1), Some(3))).toDF("k", "v1") > scala> val joinedDf = df1.join(df2, df1("k") === df2("k"), "inner") > scala> joinedDf.explain > == Physical Plan == > *(2) BroadcastHashJoin [k#83], [k#92], Inner, BuildRight > :- *(2) Project [_1#80 AS k#83, _2#81 AS v0#84] > : +- *(2) Filter isnotnull(_1#80) > : +- LocalTableScan [_1#80, _2#81] > +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > true] as bigint))) >+- *(1) Project [_1#89 AS k#92, _2#90 AS v1#93] > +- *(1) Filter isnotnull(_1#89) > +- LocalTableScan [_1#89, _2#90] > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(true, true, true, true) > {code} > But, these `nullable` values should be: > {code} > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(false, true, false, true) > {code} > This ticket comes from the previous discussion: > https://github.com/apache/spark/pull/18576#pullrequestreview-107585997 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates
[ https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451650#comment-16451650 ] Apache Spark commented on SPARK-24079: -- User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/21148 > Update the nullability of Join output based on inferred predicates > -- > > Key: SPARK-24079 > URL: https://issues.apache.org/jira/browse/SPARK-24079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Takeshi Yamamuro >Priority: Minor > > In the master, a logical `Join` node does not respect the nullability that > the optimizer rule `InferFiltersFromConstraints` > might change when inferred predicates have `IsNotNull`, e.g., > {code} > scala> val df1 = Seq((Some(1), Some(2))).toDF("k", "v0") > scala> val df2 = Seq((Some(1), Some(3))).toDF("k", "v1") > scala> val joinedDf = df1.join(df2, df1("k") === df2("k"), "inner") > scala> joinedDf.explain > == Physical Plan == > *(2) BroadcastHashJoin [k#83], [k#92], Inner, BuildRight > :- *(2) Project [_1#80 AS k#83, _2#81 AS v0#84] > : +- *(2) Filter isnotnull(_1#80) > : +- LocalTableScan [_1#80, _2#81] > +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > true] as bigint))) >+- *(1) Project [_1#89 AS k#92, _2#90 AS v1#93] > +- *(1) Filter isnotnull(_1#89) > +- LocalTableScan [_1#89, _2#90] > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(true, true, true, true) > {code} > But, these `nullable` values should be: > {code} > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(false, true, false, true) > {code} > This ticket comes from the previous discussion: > https://github.com/apache/spark/pull/18576#pullrequestreview-107585997 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org