[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 thanks for the review, merging to master/2.1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feat

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15807 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68436/ Test PASSed. ---

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15807 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature e

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15807 **[Test build #68436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68436/consoleFull)** for PR 15807 at commit [`3315d10`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 @cloud-fan OK. I am looking at that issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature e

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 As the subexpression elimination problem may be hard to fix, I only adds regression test in this PR, we can remove the `ReferenceToExpressions` later. --- If your project is set up for it, you ca

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 As the subexpression elimination problem may be hard to fix, I only adds regression test in this PR, we can remove the `ReferenceToExpressions` later. --- If your project is set up for it, you ca

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15807 **[Test build #68436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68436/consoleFull)** for PR 15807 at commit [`3315d10`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 @cloud-fan Looks good for now. I will take a look and give it a try tomorrow. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 +1 on @kiszk 's idea, the next problem is, the sub expr eval method may need local variables instead of input row `i`, but we can inline it at every place which need the result of subexpression, e

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15807 For wholestage codegen, I think that a life time of sub-expressions is within an iteration for a row. Thus, `isInitialized` and `subExpr1` should be initialized at the beginning of each iteration. For

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 even we modify it to hold the results of subexpressions in member variables, the above code example should not work under wholestage codegen. The above code example is similar to non wholesta

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 why whole stage codegen can't use member variables to keep the result of subexpression? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as w

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 > isn't the result of subexpression kept in member variables? For non-wholestage codegen, yes. For wholestage codegen, no. --- If your project is set up for it, you can reply to this email a

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 isn't the result of subexpression kept in member variables? What I am talking about is something like: ``` private boolean isInitialized = false; private Int subExpr1 = 0; priv

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 E.g., if (isNull(subexpr)) { ... } else { AssertNotNull(subexpr) // subexpr2, first used. SomeExpr(AssertNotNull(subexpr)) // SomeExpr

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 > we can't access the subexpression outside later I don't quite understand it, can you give an example? --- If your project is set up for it, you can reply to this email and have your rep

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 @cloud-fan Then once the first expression to use the subexpression is in a if/else branch, we can't access the subexpression outside later. Evaluate it again? --- If your project is set up for it,

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 can we just evaluate subexpression like a scala lazy val? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not h

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15807 @viirya @cloud-fan It looks reasonable to me that to skip subexpression elimination for the expressions wrapped in condition expressions such as `if`. This is because we have only a place at top level

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 @cloud-fan @kiszk I would propose to skip subexpression elimination for the expressions wrapped in condition expressions such as `If`. What do you think? --- If your project is set up for it, you ca

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 If we only evaluate the subexpressions before they are used. Wouldn't it cause more than once evaluation? E.g., if (isNull(subexpr)) { ... } else { As

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 `subexpr2` is a special case as it is an `AssertNotNull` which is a `NonSQLExpression` and can throw an exception at runtime. If I understand correctly, a SQL expression should not throw an exception

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15807 In general, in addtion to bad performance, it may lead to an incorrect result if `subexpr2` is always evaluated. When `subexpr` is null, to evaluate `subexpr2` may throw an exception. A compiler

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 @cloud-fan Yeah, looks like it is possibly the case. My first thought is seems not hard to solve this. I will look at this tomorrow. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 @viirya sorry I forgot one line code in the example. When `AsertNotNull(subexpr)` is also a subexpression, we will always evaluate it. --- If your project is set up for it, you can reply to this

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15807 Not sure if this example is a real one or not. But looks like `subexpr` is the subexpression, not `assertNotNull(subexpr)`. Why ``assertNotNull(subexpr)` will be always evaluated? --- If your proje

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 There is probably a bug of common subexpression elimination: we will evalute all subexpressions at the very beginning, no matter the results of subexpressions will be used or not. A counter exampl

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15807 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature e

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15807 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68334/ Test FAILed. ---

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15807 **[Test build #68334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68334/consoleFull)** for PR 15807 at commit [`0103bb4`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15807 Build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15807 **[Test build #68331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68331/consoleFull)** for PR 15807 at commit [`3917630`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15807 **[Test build #68334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68334/consoleFull)** for PR 15807 at commit [`0103bb4`](https://github.com/apache/spark/commit/0

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15807 Sorry I haven't noticed that https://github.com/apache/spark/pull/15693 was merged. Then this PR becomes a cleanup, not a bug fix. But I'd like to keep the regression test as it's from another JIR

[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15807 **[Test build #68331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68331/consoleFull)** for PR 15807 at commit [`3917630`](https://github.com/apache/spark/commit/3