[jira] [Updated] (SPARK-20273) Disallow Non-deterministic Filter push-down into Join Conditions

2017-04-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20273:

Affects Version/s: 2.0.2

> Disallow Non-deterministic Filter push-down into Join Conditions
> 
>
> Key: SPARK-20273
> URL: https://issues.apache.org/jira/browse/SPARK-20273
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> {noformat}
> sql("SELECT t1.b, rand(0) as r FROM cachedData, cachedData t1 GROUP BY t1.b having r > 0.5").show()
> {noformat}
> This query fails with the following error:
> {noformat}
> Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 8, localhost, executor driver): java.lang.NullPointerException
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
>   at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
>   at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
> {noformat}
> Filters can be pushed down into join conditions by the optimizer rule 
> {{PushPredicateThroughJoin}}. However, the analyzer already blocks users from 
> adding non-deterministic join conditions (for details, see the PR 
> https://github.com/apache/spark/pull/7535). 
> We should therefore not push down non-deterministic conditions; otherwise, we 
> should allow users to do so by explicitly initializing the non-deterministic 
> expressions.
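The proposed rule change can be illustrated with a minimal sketch in plain Scala (no Spark dependency). The types `Expr`, `ColGt`, `RandGt`, `Plan` and the function `PushDownSketch.pushThroughJoin` are hypothetical stand-ins, not Catalyst classes, and this is not the actual {{PushPredicateThroughJoin}} implementation. It models a conjunctive Filter sitting above an inner join: deterministic conjuncts are moved into the join condition, while a non-deterministic conjunct such as `rand(0) > 0.5` (and anything after it) stays in the Filter above the join.

```scala
// Hypothetical, simplified model of pushing Filter predicates into an inner
// join condition. None of these types are Catalyst classes.
sealed trait Expr { def deterministic: Boolean }

// A deterministic predicate such as `t1.b > 1.0` (hypothetical type).
final case class ColGt(col: String, v: Double) extends Expr {
  val deterministic: Boolean = true
}

// A non-deterministic predicate such as `rand(0) > 0.5` (hypothetical type).
final case class RandGt(seed: Long, v: Double) extends Expr {
  val deterministic: Boolean = false
}

// Result of the rewrite: conjuncts now in the join condition, and
// conjuncts that must remain in a Filter above the join.
final case class Plan(joinCondition: Seq[Expr], remainingFilter: Seq[Expr])

object PushDownSketch {
  // Only the leading run of deterministic conjuncts is pushed; everything
  // from the first non-deterministic conjunct onward stays above the join,
  // which also preserves the relative evaluation order of those conjuncts.
  def pushThroughJoin(filter: Seq[Expr], joinCond: Seq[Expr]): Plan = {
    val (pushable, kept) = filter.span(_.deterministic)
    Plan(joinCond ++ pushable, kept)
  }

  def main(args: Array[String]): Unit = {
    val filter = Seq(ColGt("b", 1.0), RandGt(0L, 0.5), ColGt("a", 2.0))
    println(pushThroughJoin(filter, Seq.empty))
  }
}
```

Using `partition` instead of `span` would also push deterministic conjuncts that follow a non-deterministic one; the sketch takes the more conservative `span` so that no conjunct is reordered relative to a non-deterministic expression.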



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20273) Disallow Non-deterministic Filter push-down into Join Conditions

2017-04-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20273:

Summary: Disallow Non-deterministic Filter push-down into Join Conditions  
(was: No non-deterministic Filter push-down into Join Conditions)

> Disallow Non-deterministic Filter push-down into Join Conditions
> 
>
> Key: SPARK-20273
> URL: https://issues.apache.org/jira/browse/SPARK-20273
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>