[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates

2019-07-23 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891584#comment-16891584
 ] 

Takeshi Yamamuro commented on SPARK-24079:
--

Yeah, I think SPARK-24080 is the same as SPARK-27915, so how about closing 
SPARK-24080 as a duplicate and moving the discussion to the newer one?

> Update the nullability of Join output based on inferred predicates
> --
>
> Key: SPARK-24079
> URL: https://issues.apache.org/jira/browse/SPARK-24079
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Takeshi Yamamuro
> Priority: Minor
>
> On master, a logical `Join` node does not reflect the nullability changes 
> implied by the optimizer rule `InferFiltersFromConstraints` when the 
> inferred predicates include `IsNotNull`, e.g.,
> {code}
> scala> val df1 = Seq((Some(1), Some(2))).toDF("k", "v0")
> scala> val df2 = Seq((Some(1), Some(3))).toDF("k", "v1")
> scala> val joinedDf = df1.join(df2, df1("k") === df2("k"), "inner")
> scala> joinedDf.explain
> == Physical Plan ==
> *(2) BroadcastHashJoin [k#83], [k#92], Inner, BuildRight
> :- *(2) Project [_1#80 AS k#83, _2#81 AS v0#84]
> :  +- *(2) Filter isnotnull(_1#80)
> :     +- LocalTableScan [_1#80, _2#81]
> +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
>    +- *(1) Project [_1#89 AS k#92, _2#90 AS v1#93]
>       +- *(1) Filter isnotnull(_1#89)
>          +- LocalTableScan [_1#89, _2#90]
> scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable)
> res15: Seq[Boolean] = List(true, true, true, true)
> {code}
> However, these `nullable` values should be:
> {code}
> scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable)
> res15: Seq[Boolean] = List(false, true, false, true)
> {code}
> This ticket comes from the previous discussion: 
> https://github.com/apache/spark/pull/18576#pullrequestreview-107585997
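
As a rough illustration of the behaviour the description asks for, here is a minimal, Spark-free sketch in Scala. `Attr`, `IsNotNull`, `EqualTo`, `inferNotNull`, and `innerJoinOutput` are hypothetical toy names introduced only for this example; they are not Catalyst's classes and this is not how the actual fix is implemented. The sketch assumes an inner join with a null-rejecting equality condition, so both join keys can be treated as non-null in the output.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
final case class Attr(name: String, nullable: Boolean)

sealed trait Predicate
final case class IsNotNull(attr: String) extends Predicate
final case class EqualTo(left: String, right: String) extends Predicate

object NullabilitySketch {
  // `a = b` is null-rejecting: a row passes only if both sides are non-null,
  // so each side yields an inferred "is not null" fact.
  def inferNotNull(condition: Seq[Predicate]): Set[String] =
    condition.flatMap {
      case EqualTo(l, r) => Seq(l, r)
      case IsNotNull(a)  => Seq(a)
    }.toSet

  // Inner-join output: concatenate both sides, then mark every attribute
  // covered by an inferred not-null fact as non-nullable.
  def innerJoinOutput(
      left: Seq[Attr],
      right: Seq[Attr],
      condition: Seq[Predicate]): Seq[Attr] = {
    val notNull = inferNotNull(condition)
    (left ++ right).map(a => if (notNull(a.name)) a.copy(nullable = false) else a)
  }

  def main(args: Array[String]): Unit = {
    val left  = Seq(Attr("df1.k", nullable = true), Attr("v0", nullable = true))
    val right = Seq(Attr("df2.k", nullable = true), Attr("v1", nullable = true))
    val out   = innerJoinOutput(left, right, Seq(EqualTo("df1.k", "df2.k")))
    println(out.map(_.nullable)) // List(false, true, false, true)
  }
}
```

The printed flags match the expected `List(false, true, false, true)` from the description. The same shortcut would not hold for outer joins, where the null-supplying side has to stay nullable.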



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates

2019-07-22 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890672#comment-16890672
 ] 

Josh Rosen commented on SPARK-24079:


Update: I just realized that SPARK-27915 is actually a closer duplicate of 
SPARK-24080, which is very closely related to this ticket. 




[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates

2019-07-14 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884815#comment-16884815
 ] 

Josh Rosen commented on SPARK-24079:


I'm marking SPARK-27915 (a newer ticket filed by me) as a partial duplicate of 
this ticket. Both issues are concerned with updating nullability based on 
inferred constraints.

There's some detailed analysis of the `IsNotNull` handling in my PR (which I 
think is valuable in its own right, since it helps to clarify some existing 
optimizations in codegen). I'm currently blocked on some attribute reference 
issues, but maybe the `FixNullability` rule is what I need.




[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates

2018-04-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451650#comment-16451650
 ] 

Apache Spark commented on SPARK-24079:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/21148
