[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450426#comment-16450426
 ] 

Apache Spark commented on SPARK-24043:
--

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/21144

> InterpretedPredicate.eval fails if expression tree contains Nondeterministic 
> expressions
> 
>
>              Key: SPARK-24043
>              URL: https://issues.apache.org/jira/browse/SPARK-24043
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 2.3.0
>         Reporter: Bruce Robbins
>         Priority: Minor
>
> When whole-stage codegen and predicate codegen both fail, FilterExec falls 
> back to using InterpretedPredicate. If the predicate's expression contains 
> any non-deterministic expressions, the evaluation throws an error:
> {noformat}
> scala> val df = Seq((1)).toDF("a")
> df: org.apache.spark.sql.DataFrame = [a: int]
> scala> df.filter('a > 0).show // this works fine
> 2018-04-21 20:39:26 WARN  FilterExec:66 - Codegen disabled for this 
> expression:
>  (value#1 > 0)
> +---+
> |  a|
> +---+
> |  1|
> +---+
> scala> df.filter('a > rand(7)).show // this will throw an error
> 2018-04-21 20:39:40 WARN  FilterExec:66 - Codegen disabled for this 
> expression:
>  (cast(value#1 as double) > rand(7))
> 2018-04-21 20:39:40 ERROR Executor:91 - Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.Nondeterministic$class.eval(Expression.scala:326)
>   at 
> org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:34)
> {noformat}
> This is because no code initializes the Nondeterministic expressions before 
> eval is called on them.
> This is a low impact issue, since it would require both whole-stage codegen 
> and predicate codegen to fail before FilterExec would fall back to using 
> InterpretedPredicate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-24 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449428#comment-16449428
 ] 

Takeshi Yamamuro commented on SPARK-24043:
--

Aha, I see. Yes, we currently have no configuration for the expression tree.




[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-24 Thread Bruce Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449352#comment-16449352
 ] 

Bruce Robbins commented on SPARK-24043:
---

You're halfway there. When whole-stage codegen is off (and only then), 
FilterExec requests code generation and compilation for the predicate. See 
FilterExec.doExecute (which calls SparkPlan.newPredicate, which attempts to 
generate code for the predicate).

I couldn't find a configuration setting to turn off predicate codegen.
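
The fallback path described above can be sketched roughly as follows. This is a toy model in plain Scala, not Spark's code: `Row`, `compilePredicate`, `interpretPredicate`, and `newPredicateSketch` are hypothetical stand-ins, and the `failCodegen` flag exists only to simulate a compilation failure. The only thing taken from the thread is the shape: try to generate code for the predicate, and fall back to an interpreted predicate if that fails.

```scala
import scala.util.control.NonFatal

object PredicateFallbackSketch {
  // Hypothetical stand-in for a row: just a sequence of values.
  type Row = Seq[Any]

  // Stand-in for predicate codegen. The flag simulates a compilation failure.
  def compilePredicate(threshold: Int, failCodegen: Boolean): Row => Boolean = {
    if (failCodegen) throw new IllegalStateException("simulated codegen failure")
    row => row.head.asInstanceOf[Int] > threshold
  }

  // Stand-in for the interpreted path: same semantics, no compilation step.
  def interpretPredicate(threshold: Int): Row => Boolean =
    row => row.head.asInstanceOf[Int] > threshold

  // Try the compiled predicate first; fall back to interpretation on failure.
  // Returns the predicate together with the name of the path that was taken.
  def newPredicateSketch(threshold: Int, failCodegen: Boolean): (Row => Boolean, String) =
    try {
      (compilePredicate(threshold, failCodegen), "codegen")
    } catch {
      case NonFatal(_) => (interpretPredicate(threshold), "interpreted")
    }
}
```

In this sketch both paths give the same answer for a deterministic predicate, which is why a problem that exists only on the interpreted path can go unnoticed until codegen actually fails.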




[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-23 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449326#comment-16449326
 ] 

Takeshi Yamamuro commented on SPARK-24043:
--

I tried this with whole-stage codegen off:
{code:java}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/
 
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_31)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("SET spark.sql.codegen.wholeStage=false")
res0: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> Seq((1)).toDF("a").filter('a > rand(7)).show 
+---+
|  a|
+---+
|  1|
+---+
{code}




[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-23 Thread Bruce Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449281#comment-16449281
 ] 

Bruce Robbins commented on SPARK-24043:
---

[~maropu]

> Do I miss any precondition?

For this bug to materialize in spark-shell, Spark SQL needs to be in interpreted 
mode (whole-stage codegen and predicate codegen are both shut off).

I ran some of the DataFrame and Dataset test suites in interpreted mode and 
this bug popped out (during the run for the test "handle nondeterministic 
expressions correctly for set operations"). To put Spark SQL in interpreted 
mode, I manually shut off whole-stage codegen and predicate codegen. It was 
still off when I did the above spark-shell demo.

Outside of manually tweaking Spark, it's difficult to get predicate codegen to 
fail. (It's easy to get whole-stage codegen to fall back: just supply more than 
300 columns in your query. Predicate codegen is more resilient.) That's why 
this is a low impact bug. However, at some point we might want to test 
interpreted mode.

I will make a PR, but it's no emergency.

To see the bug in action with Spark as-is, add these test cases to 
PredicateSuite. The first should succeed (no Nondeterministic expressions). The 
second will fail with an exception ("Nondeterministic expression 
org.apache.spark.sql.catalyst.expressions.Rand should be initialized before 
eval"):
{code:java}
  test("Interpreted Predicate should work without nondeterministic expressions") {
    val interpreted = InterpretedPredicate.create(LessThan(Literal(0.2), Literal(1.0)))
    interpreted.initialize(0)
    assert(interpreted.eval(new UnsafeRow()))
  }

  test("Interpreted Predicate should initialize nondeterministic expressions") {
    val interpreted = InterpretedPredicate.create(LessThan(Rand(7), Literal(1.0)))
    interpreted.initialize(0)
    assert(interpreted.eval(new UnsafeRow()))
  }
{code}
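
To make the failure mode concrete without a Spark build, here is a minimal self-contained sketch of the pattern. Everything in it is a simplified stand-in, not Spark's real classes: `Expr`, `Lit`, `RandSketch`, and `LessThanPredicate` are hypothetical names, and the `initChildren` flag exists only to contrast the two behaviors. It shows one possible way to handle the problem (have the predicate's `initialize` walk the expression tree and initialize every nondeterministic node); it is not a claim about how the eventual fix is implemented.

```scala
import scala.util.Random

object NondeterministicInitSketch {
  // Simplified stand-ins for expression tree nodes; not Spark's real classes.
  sealed trait Expr {
    def children: Seq[Expr]
    def eval(): Double
    // Visit this node and all of its descendants.
    def foreach(f: Expr => Unit): Unit = {
      f(this)
      children.foreach(_.foreach(f))
    }
  }

  final case class Lit(value: Double) extends Expr {
    val children: Seq[Expr] = Nil
    def eval(): Double = value
  }

  // A nondeterministic node that refuses to evaluate until it is initialized.
  final class RandSketch(seed: Long) extends Expr {
    private var rng: Option[Random] = None
    val children: Seq[Expr] = Nil
    def initialize(partitionIndex: Int): Unit = {
      rng = Some(new Random(seed + partitionIndex))
    }
    def eval(): Double = {
      require(rng.isDefined, "Nondeterministic expression should be initialized before eval.")
      rng.get.nextDouble()
    }
  }

  // Interpreted "less than" predicate over two child expressions.
  // With initChildren = false, initialize does nothing (the behavior reported here);
  // with initChildren = true, it initializes every nondeterministic node first.
  final class LessThanPredicate(left: Expr, right: Expr, initChildren: Boolean) {
    def initialize(partitionIndex: Int): Unit = {
      if (initChildren) {
        for (root <- Seq(left, right)) root.foreach {
          case r: RandSketch => r.initialize(partitionIndex)
          case _ => ()
        }
      }
    }
    def eval(): Boolean = left.eval() < right.eval()
  }
}
```

With `initChildren = false`, calling `initialize(0)` and then `eval()` on a predicate containing `RandSketch` fails the `require` check, mirroring the second test case above; with `initChildren = true`, the same call sequence evaluates normally.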




[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-23 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447614#comment-16447614
 ] 

Takeshi Yamamuro commented on SPARK-24043:
--

I tried this on master and v2.3, but the issue didn't happen there. Do I 
miss any precondition?
