[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 Thanks! Merging to master!

2016-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test PASSed.

2016-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69965/ Test PASSed.

2016-12-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69965 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69965/consoleFull)** for PR 16193 at commit

2016-12-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16193 LGTM

2016-12-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69965/consoleFull)** for PR 16193 at commit

2016-12-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16193 retest this please.

2016-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16193 LGTM

2016-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69961/ Test FAILed.

2016-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test FAILed.

2016-12-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69961/consoleFull)** for PR 16193 at commit

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 Pushing down predicates into data sources is also done in the planner, during optimization, so I think this is not the first rule that does optimization outside the Optimizer.

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 The reason we moved the PythonUDFEvaluator from the logical plan into the physical plan is that this one-off node broke many things: many rules needed to treat it specially.

2016-12-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16193 If we add a logical node for the Python evaluator, we'd push the Filter down through it, so the optimizer rule wouldn't combine the two Filters into one again?

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 @cloud-fan It's not trivial to do this in the optimizer. For example, we would need to split one Filter into two, which conflicts with another optimizer rule that combines two Filters into one.
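
The conflict davies describes can be sketched with a toy Python model. This is not Spark's Catalyst code: `Pred`, `Filter`, `split_filter` and `combine_filters` are hypothetical names chosen for illustration. `split_filter` looks only at the deterministic prefix of the conjuncts and moves the UDF-free ones into a new Filter underneath, while `combine_filters` mimics a rule that merges two adjacent Filters back into one.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pred:
    """Hypothetical toy conjunct: its text plus the two properties the split checks."""
    text: str
    deterministic: bool = True
    has_python_udf: bool = False


@dataclass(frozen=True)
class Filter:
    """Hypothetical toy Filter node: an AND of conjuncts over a child plan."""
    conjuncts: Tuple[Pred, ...]
    child: object


def split_filter(f: Filter) -> Filter:
    """Split one Filter into two, moving UDF-free conjuncts to the lower Filter.

    Only the deterministic prefix of the conjunct list is considered; everything
    from the first nondeterministic conjunct onward stays in the upper Filter.
    """
    prefix: List[Pred] = []
    for p in f.conjuncts:
        if not p.deterministic:
            break
        prefix.append(p)
    trailing = list(f.conjuncts[len(prefix):])
    push_down = [p for p in prefix if not p.has_python_udf]
    keep = [p for p in prefix if p.has_python_udf] + trailing
    if not push_down or not keep:
        return f  # nothing to gain from splitting
    return Filter(tuple(keep), Filter(tuple(push_down), f.child))


def combine_filters(f: Filter) -> Filter:
    """Toy stand-in for a rule that merges two adjacent Filters into one."""
    if isinstance(f.child, Filter):
        inner = f.child
        return Filter(inner.conjuncts + f.conjuncts, inner.child)
    return f


udf_pred = Pred("my_filter(key)", has_python_udf=True)
cheap = Pred('value < "2"')
original = Filter((udf_pred, cheap), "ExistingRDD")

split = split_filter(original)   # Filter[my_filter(key)] over Filter[value < "2"]
merged = combine_filters(split)  # back to a single Filter holding both conjuncts
print(split)
print(merged)
```

In this toy, running the combine step right after the split collapses the plan back into a single Filter, so two such rules in the same fixed-point batch would keep undoing each other.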

2016-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16193 It seems a little hacky to me that we do optimization in the planner. How hard would it be to introduce a logical node for the Python evaluator? We can define an interface in catalyst, e.g.

2016-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69935/ Test PASSed.

2016-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test PASSed.

2016-12-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69935/consoleFull)** for PR 16193 at commit

2016-12-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69935/consoleFull)** for PR 16193 at commit

2016-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 retest this please

2016-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test FAILed.

2016-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69931/ Test FAILed.

2016-12-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69931/consoleFull)** for PR 16193 at commit

2016-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 @viirya I think your idea tries to solve a different issue. It does not cover all the cases of PythonUDF pushdown.

2016-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 @cloud-fan If the functions in `dapply` and `gapply` can be regarded as UDFs in SparkR, our support for them is very limited. In the plan output, they are represented as `MapPartitionsInR` and

2016-12-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69931/consoleFull)** for PR 16193 at commit

2016-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 @davies Just updated the code comments, as you suggested. It does not affect the code logic. Sorry for the late update.

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 If there are no objections in the next two hours, I will merge this one into master.

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 @cloud-fan There is no R UDF at this point.

2016-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16193 Is this a general problem? Does it apply to R UDFs?

2016-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69891/ Test PASSed.

2016-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test PASSed.

2016-12-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69891/consoleFull)** for PR 16193 at commit

2016-12-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16193 @gatorsmile I do mean that we don't push down predicates into the `ExistingRDD` scan, as it doesn't help in general. But `BatchEvalPython` is a special case. It doesn't matter `ExistingRDD` is the

2016-12-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69891/consoleFull)** for PR 16193 at commit

2016-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test PASSed.

2016-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69878/ Test PASSed.

2016-12-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69878/consoleFull)** for PR 16193 at commit

2016-12-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69878/consoleFull)** for PR 16193 at commit

2016-12-08 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 `ExistingRDD` might not always be the child of `Filter`. For example,
```Python
>>> sel = df.select('key', 'value', rand()).filter((my_filter(col("key"))) & (df.value < "2"))
```
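
A toy plan rewrite shows why this matters: the UDF-free predicate can still be placed directly below `BatchEvalPython` when its child is a `Project` containing `rand()` rather than an `ExistingRDD` scan, because the rewrite never moves the filter past that child. `Node`, `push_through_python`, `explain`, and the `pythonUDF0` column name are hypothetical and only loosely mirror Spark's plan output; this is a sketch, not the PR's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical toy physical-plan node."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Filter nodes only: (predicate text, does it reference a Python UDF result?)
    conjuncts: List[Tuple[str, bool]] = field(default_factory=list)


def push_through_python(plan: Node) -> Node:
    """Rewrite Filter -> BatchEvalPython -> child so UDF-free conjuncts run first.

    The child of BatchEvalPython is left untouched, so the rewrite works the
    same whether that child is an ExistingRDD scan or a nondeterministic Project.
    """
    if plan.name != "Filter" or not plan.children:
        return plan
    python_eval = plan.children[0]
    if python_eval.name != "BatchEvalPython":
        return plan
    udf_free = [c for c in plan.conjuncts if not c[1]]
    udf_bound = [c for c in plan.conjuncts if c[1]]
    if not udf_free or not udf_bound:
        return plan
    lower = Node("Filter", python_eval.children, udf_free)
    return Node("Filter", [Node("BatchEvalPython", [lower])], udf_bound)


def explain(node: Node, depth: int = 0) -> List[str]:
    """Render the tree one node per line, indented by depth."""
    preds = " AND ".join(text for text, _ in node.conjuncts)
    line = "  " * depth + node.name + (f" [{preds}]" if preds else "")
    return [line] + [out for c in node.children for out in explain(c, depth + 1)]


plan = Node(
    "Filter",
    [Node("BatchEvalPython", [Node("Project(key, value, rand())", [Node("ExistingRDD")])])],
    [("pythonUDF0", True), ('value < "2"', False)],
)
print("\n".join(explain(push_through_python(plan))))
```

The printed tree shows `Filter [value < "2"]` sitting between `BatchEvalPython` and the `Project`, while the filter on `pythonUDF0` stays on top.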

2016-12-08 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 @viirya I did not get your point. Why would pushing down predicates through a Python UDF not have a significant benefit? Based on my understanding, it could greatly reduce the number of rows
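
The row-count argument can be illustrated by counting UDF invocations in plain Python. This is a toy sketch, not PySpark: the data is made up, and `my_filter` here is a hypothetical stand-in for an expensive Python UDF. Evaluating the cheap `value < "2"` predicate first means only the surviving rows reach the UDF, while the final result stays the same.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]

# Toy input: 1,000 (key, value) rows; value is a single digit stored as a string.
rows: List[Row] = [(i, str(i % 10)) for i in range(1000)]

udf_calls = 0


def my_filter(key: int) -> bool:
    """Hypothetical stand-in for an expensive Python UDF; counts its invocations."""
    global udf_calls
    udf_calls += 1
    return key % 2 == 0


def udf_first(data: List[Row]) -> List[Row]:
    # No pushdown: every row goes through the UDF before the cheap predicate.
    flagged = [(k, v, my_filter(k)) for k, v in data]
    return [(k, v) for k, v, ok in flagged if ok and v < "2"]


def cheap_first(data: List[Row]) -> List[Row]:
    # Pushdown: the cheap predicate runs first, so fewer rows reach the UDF.
    survivors = [(k, v) for k, v in data if v < "2"]
    return [(k, v) for k, v in survivors if my_filter(k)]


def run(strategy: Callable[[List[Row]], List[Row]]) -> Tuple[List[Row], int]:
    """Run one strategy and report (result, number of UDF calls)."""
    global udf_calls
    udf_calls = 0
    return strategy(rows), udf_calls


result_a, calls_a = run(udf_first)
result_b, calls_b = run(cheap_first)
print(calls_a, calls_b)  # 1000 200
```

The saving depends entirely on the cheap predicate's selectivity (it drops 80% of the rows here); the toy also ignores the cost of shipping rows to and from the Python worker, which fewer rows would reduce as well.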

2016-12-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16193 If we really want to do this, I'd suggest pushing down predicates to the RDD scan node during the query planning stage, so we don't need to push down predicates in the SparkPlan like this.

2016-12-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16193 As we can push down predicates into the data source scan, those predicates should already be pushed down if they are in the query plan above the data source scan node. This seems to only work on ExistingRDD

2016-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69850/ Test FAILed.

2016-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test FAILed.

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69850/consoleFull)** for PR 16193 at commit

2016-12-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16193 Retest this please

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69808/ Test FAILed.

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test FAILed.

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69808/consoleFull)** for PR 16193 at commit

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69808/consoleFull)** for PR 16193 at commit

2016-12-07 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 I also checked the plan on our 1.6.3 branch. The filter is not pushed down appropriately, even though we have the logical node `EvaluatePython`. ``` == Parsed Logical Plan == 'Filter

2016-12-07 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 https://github.com/apache/spark/pull/12127 dropped the node `EvaluatePython`. Based on the PR description, we removed the node for the following reasons: >Currently we extract Python

2016-12-07 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 @cloud-fan Let me do a history search and see why we dropped the logical plan node `EvaluatePython`

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69787/ Test PASSed.

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16193 Merged build finished. Test PASSed.

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69787/consoleFull)** for PR 16193 at commit

2016-12-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16193 Would it be easier if we created a logical node for the Python evaluator? We had one in Spark 1.6, but it was removed in 2.0; not sure why.

2016-12-07 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16193 cc @cloud-fan @davies @liancheng

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69787/consoleFull)** for PR 16193 at commit