[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450426#comment-16450426 ]

Apache Spark commented on SPARK-24043:
--------------------------------------

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/21144

> InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-24043
>                 URL: https://issues.apache.org/jira/browse/SPARK-24043
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Bruce Robbins
>            Priority: Minor
>
> When whole-stage codegen and predicate codegen both fail, FilterExec falls
> back to using InterpretedPredicate. If the predicate's expression contains
> any non-deterministic expressions, the evaluation throws an error:
> {noformat}
> scala> val df = Seq((1)).toDF("a")
> df: org.apache.spark.sql.DataFrame = [a: int]
> scala> df.filter('a > 0).show // this works fine
> 2018-04-21 20:39:26 WARN FilterExec:66 - Codegen disabled for this expression:
>  (value#1 > 0)
> +---+
> |  a|
> +---+
> |  1|
> +---+
> scala> df.filter('a > rand(7)).show // this will throw an error
> 2018-04-21 20:39:40 WARN FilterExec:66 - Codegen disabled for this expression:
>  (cast(value#1 as double) > rand(7))
> 2018-04-21 20:39:40 ERROR Executor:91 - Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.
>   at scala.Predef$.require(Predef.scala:224)
>   at org.apache.spark.sql.catalyst.expressions.Nondeterministic$class.eval(Expression.scala:326)
>   at org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:34)
> {noformat}
> This is because no code initializes the Nondeterministic expressions before
> eval is called on them.
> This is a low impact issue, since it would require both whole-stage codegen
> and predicate codegen to fail before FilterExec would fall back to using
> InterpretedPredicate.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
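The initialize-before-eval contract behind the error in the description can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: every type below (Expr, NondeterministicExpr, ToyLit, ToyRand, ToyLessThan, ToyInterpretedPredicate) is a hypothetical stand-in, not one of Spark's classes, and the tree walk in initialize is one plausible shape for a fix, not necessarily what the pull request does.

```scala
// Toy model only: none of these types are Spark's; all names are hypothetical.
trait Expr {
  def children: Seq[Expr]
  def eval(row: Seq[Any]): Any
  // Visit this node and then every descendant.
  def foreach(f: Expr => Unit): Unit = { f(this); children.foreach(_.foreach(f)) }
}

// Mirrors the contract seen in the stack trace: eval is rejected until initialize has run.
trait NondeterministicExpr extends Expr {
  private var initialized = false
  protected def initInternal(partitionIndex: Int): Unit
  protected def evalInternal(row: Seq[Any]): Any
  final def initialize(partitionIndex: Int): Unit = {
    initInternal(partitionIndex)
    initialized = true
  }
  final def eval(row: Seq[Any]): Any = {
    require(initialized,
      s"Nondeterministic expression ${getClass.getName} should be initialized before eval.")
    evalInternal(row)
  }
}

final case class ToyLit(value: Double) extends Expr {
  val children: Seq[Expr] = Nil
  def eval(row: Seq[Any]): Any = value
}

final class ToyRand(seed: Long) extends NondeterministicExpr {
  private var rng: scala.util.Random = null
  val children: Seq[Expr] = Nil
  protected def initInternal(partitionIndex: Int): Unit = {
    rng = new scala.util.Random(seed + partitionIndex)
  }
  protected def evalInternal(row: Seq[Any]): Any = rng.nextDouble()
}

final case class ToyLessThan(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
  def eval(row: Seq[Any]): Any =
    left.eval(row).asInstanceOf[Double] < right.eval(row).asInstanceOf[Double]
}

// The wrapper walks the whole tree and initializes each nondeterministic node,
// so a later eval no longer trips the require above.
final class ToyInterpretedPredicate(expr: Expr) {
  def initialize(partitionIndex: Int): Unit = expr.foreach {
    case n: NondeterministicExpr => n.initialize(partitionIndex)
    case _ => // deterministic nodes need no setup
  }
  def eval(row: Seq[Any]): Boolean = expr.eval(row).asInstanceOf[Boolean]
}
```

In this toy, calling eval on a fresh ToyRand fails with the same "should be initialized before eval" message, while wrapping ToyLessThan(new ToyRand(7), ToyLit(1.0)) in ToyInterpretedPredicate and calling initialize(0) first lets eval succeed.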
[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449428#comment-16449428 ]

Takeshi Yamamuro commented on SPARK-24043:
------------------------------------------

Aha, I gotcha. ya, we currently have no configuration for the expression tree.
[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449352#comment-16449352 ]

Bruce Robbins commented on SPARK-24043:
---------------------------------------

You're half-way there. When whole-stage codegen is off (and only then), FilterExec requests code generation and compilation for the predicate. See FilterExec.doExecute (which calls SparkPlan.newPredicate, which attempts to generate code for the predicate).

I couldn't find a configuration setting to turn off predicate codegen.
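The fallback path described in this comment (try to compile the predicate, and use the interpreted form only if that fails) can be sketched as below. This is a simplified illustration, not Spark's code: PredicateFallbackSketch, its newPredicate, and the two factory parameters are hypothetical names, and catching every non-fatal error is a simplification that the real code may not share.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins for illustration; this is not Spark's API.
object PredicateFallbackSketch {
  type Row = Seq[Any]
  type Predicate = Row => Boolean

  // Try the compiled predicate first; fall back to the interpreted one
  // only when building the compiled version throws.
  def newPredicate(compile: () => Predicate, interpret: () => Predicate): Predicate =
    try compile()
    catch {
      case NonFatal(e) =>
        println(s"Codegen disabled for this expression: ${e.getMessage}")
        interpret()
    }
}
```

Because the interpreted branch is reached only after a failure in the first one, a bug on that branch stays hidden until predicate codegen is forced to fail, which matches the behaviour discussed in this thread.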
[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449326#comment-16449326 ]

Takeshi Yamamuro commented on SPARK-24043:
------------------------------------------

I tried this with codegen=off;
{code:java}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_31)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("SET spark.sql.codegen.wholeStage=false")
res0: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> Seq((1)).toDF("a").filter('a > rand(7)).show
+---+
|  a|
+---+
|  1|
+---+
{code}
[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449281#comment-16449281 ]

Bruce Robbins commented on SPARK-24043:
---------------------------------------

[~maropu]
> Do I miss any precondition?

For this bug to materialize in spark-shell, Spark SQL needs to be in interpreted mode (whole-stage codegen and predicate codegen are both shut off).

I ran some of the DataFrame and Dataset test suites in interpreted mode and this bug popped out (during the run for the test "handle nondeterministic expressions correctly for set operations").

To put Spark SQL in interpreted mode, I manually shut off whole-stage codegen and predicate codegen. It was still off when I did the above spark-shell demo.

Outside of manually tweaking Spark, it's difficult to get predicate codegen to fail (it's easy to get whole-stage codegen to fall back – just supply more than 300 columns in your query. Predicate codegen is more resilient). That's why this is a low impact bug.

However, at some point we might want to test interpreted mode. I will make a PR, but it's no emergency.

To see the bug in action with Spark as-is, add these test cases to PredicateSuite. The first should succeed (no Nondeterministic expressions). The second will fail with an exception ("Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval"):
{code:java}
// Deterministic predicate: passes today.
test("Interpreted Predicate should work without nondeterministic expressions") {
  val interpreted = InterpretedPredicate.create(LessThan(Literal(0.2), Literal(1.0)))
  interpreted.initialize(0)
  assert(interpreted.eval(new UnsafeRow()))
}

// Predicate containing Rand: fails today because Rand is never initialized.
test("Interpreted Predicate should initialize nondeterministic expressions") {
  val interpreted = InterpretedPredicate.create(LessThan(Rand(7), Literal(1.0)))
  interpreted.initialize(0)
  assert(interpreted.eval(new UnsafeRow()))
}
{code}
[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447614#comment-16447614 ]

Takeshi Yamamuro commented on SPARK-24043:
------------------------------------------

I tried this in the master and v2.3 though, the issue didn't happen there.
Do I miss any precondition?