Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
Thanks! Merging to master!
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69965/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69965 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69965/consoleFull)**
for PR 16193 at commit
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16193
LGTM
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69965 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69965/consoleFull)**
for PR 16193 at commit
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16193
retest this please.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16193
LGTM
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69961/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69961 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69961/consoleFull)**
for PR 16193 at commit
Github user davies commented on the issue:
https://github.com/apache/spark/pull/16193
Pushing down predicates into a data source also happens during planning, so I think this is not the first optimization done outside the Optimizer.
---
Github user davies commented on the issue:
https://github.com/apache/spark/pull/16193
The reason we moved the PythonUDFEvaluator from the logical plan into the physical plan is that this one-off node broke many things; many rules needed to treat it specially.
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16193
If we add a logical node for the Python evaluator, we'd push down the Filter through it, so the optimizer rule won't combine the two Filters into one again?
---
Github user davies commented on the issue:
https://github.com/apache/spark/pull/16193
@cloud-fan It's not trivial to do this in the optimizer. For example, we would have to split one Filter into two, which conflicts with another optimizer rule that combines two Filters into one.
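The rule conflict described above can be sketched in plain Python. This is a hypothetical model, not Spark's actual Catalyst API: a conjunctive filter is split into conjuncts that reference the Python UDF's output (which must stay above the evaluator) and conjuncts that only touch base columns (which could be pushed below it); a CombineFilters-style rule would then collapse the two Filters back into one.

```python
# Hypothetical sketch of splitting a conjunctive Filter around a Python
# UDF evaluator. Predicates are modeled as (expression, referenced
# columns) pairs; this is NOT Spark's Catalyst API, just an illustration.

def split_filter(conjuncts, udf_output_cols):
    """Partition predicates into those safe to push below the UDF
    evaluator and those that must remain above it."""
    pushable, remaining = [], []
    for expr, refs in conjuncts:
        if refs & udf_output_cols:
            remaining.append((expr, refs))  # depends on the UDF result
        else:
            pushable.append((expr, refs))   # only touches base columns
    return pushable, remaining

# A filter like: my_filter(key) AND value < "2", where the UDF result
# surfaces as a synthetic column "pythonUDF0".
conjuncts = [("my_filter(key)", {"pythonUDF0"}),
             ("value < '2'", {"value"})]
pushable, remaining = split_filter(conjuncts, {"pythonUDF0"})
```

The pushable half would become a new Filter below the evaluator, which is exactly the two-Filter shape that a combine-filters rule would merge again, hence the conflict.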
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16193
It looks a little hacky to me that we do optimization in the planner. How hard would it be to introduce a logical node for the Python evaluator? We could define an interface in catalyst, e.g.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69935/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69935 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69935/consoleFull)**
for PR 16193 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69935 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69935/consoleFull)**
for PR 16193 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69931/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69931 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69931/consoleFull)**
for PR 16193 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
@viirya I think your idea is trying to resolve a different issue. It does not apply to all the cases of PythonUDF pushdown.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
@cloud-fan The functions in `dapply` and `gapply` might be regarded as UDFs in SparkR, but we have very limited support for them. In the plan output, it is represented as `MapPartitionsInR` and
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69931 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69931/consoleFull)**
for PR 16193 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
@davies Just updated the code comments, as you suggested. It does not affect the code logic. Sorry for the late update.
---
Github user davies commented on the issue:
https://github.com/apache/spark/pull/16193
If no objection in next two hours, I will merge this one into master.
---
Github user davies commented on the issue:
https://github.com/apache/spark/pull/16193
@cloud-fan There is no R UDF at this point.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16193
Is this a general problem? Does it apply to R UDFs?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69891/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69891 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69891/consoleFull)**
for PR 16193 at commit
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16193
@gatorsmile I do mean that we don't push down predicates into an `ExistingRDD` scan, as it doesn't help generally. But `BatchEvalPython` is a special case. It doesn't matter whether `ExistingRDD` is the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69891 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69891/consoleFull)**
for PR 16193 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69878/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69878 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69878/consoleFull)**
for PR 16193 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69878 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69878/consoleFull)**
for PR 16193 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
ExistingRDD might not always be the child of `Filter`. For example,
```Python
>>> sel = df.select('key', 'value', rand()).filter((my_filter(col("key"))) & (df.value < "2"))
```
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
@viirya I did not get your point. Why does pushing down predicates through a Python UDF not have a significant benefit? Based on my understanding, it could greatly reduce the number of rows
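The benefit can be illustrated with a plain-Python sketch (no Spark involved; `my_filter` here is just a stand-in for an expensive Python UDF): evaluating the cheap, non-UDF predicate first cuts the number of rows that ever reach the UDF.

```python
# Plain-Python illustration (no Spark): pushing the cheap predicate
# below the UDF reduces how many rows the Python UDF must process.
calls = {"n": 0}

def my_filter(key):              # stand-in for an expensive Python UDF
    calls["n"] += 1
    return key % 2 == 0

rows = [(k, str(k % 5)) for k in range(1000)]   # (key, value) pairs

# Without pushdown: the UDF predicate is evaluated on every row.
calls["n"] = 0
_ = [r for r in rows if my_filter(r[0]) and r[1] < "2"]
no_pushdown_calls = calls["n"]    # 1000 UDF invocations

# With pushdown: the non-UDF predicate filters rows first.
calls["n"] = 0
_ = [r for r in rows if r[1] < "2" and my_filter(r[0])]
pushdown_calls = calls["n"]       # 400 UDF invocations
```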
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16193
If we really want to do this, I'd suggest pushing down predicates to the RDD scan node during the query planning stage, so we don't need to push down predicates into the SparkPlan like this.
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16193
Since we can push down predicates through a data source scan, those predicates should already be pushed down if they appear in the query plan above the data source scan node. This seems to only work on ExistingRDD
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69850/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69850 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69850/consoleFull)**
for PR 16193 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/16193
Retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69808/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69808 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69808/consoleFull)**
for PR 16193 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69808 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69808/consoleFull)**
for PR 16193 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
I also checked the plan of our 1.6.3 branch. The filter is not appropriately pushed down, even though we have the logical node `EvaluatePython`.
```
== Parsed Logical Plan ==
'Filter
```
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
https://github.com/apache/spark/pull/12127 dropped the node `EvaluatePython`. Based on the PR description, we removed the node for the following reasons:
>Currently we extract Python
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
@cloud-fan Let me do a history search and see why we dropped the logical
plan node `EvaluatePython`
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69787/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16193
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69787 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69787/consoleFull)**
for PR 16193 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16193
Would it be easier if we created a logical node for the Python evaluator? We did have one in Spark 1.6, but it was removed in 2.0; not sure why.
---
If your project is set up for it, you can reply to this
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16193
cc @cloud-fan @davies @liancheng
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16193
**[Test build #69787 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69787/consoleFull)**
for PR 16193 at commit