Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Thank you for reviewing and merging, @liancheng, @cloud-fan, @hvanhovell,
and @naliazheli!
---
If your project is set up for it, you can reply to this email and have your
reply appear on
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14044
Merged to master. Thanks!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/14044
LGTM
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14044
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14044
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61744/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14044
**[Test build #61744 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61744/consoleFull)**
for PR 14044 at commit
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14044
LGTM pending Jenkins.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
I have now updated the title and description of the PR/JIRA.
The only patch in this PR is the following one-word change.
```
-new Dataset[Row](sparkSession, logicalPlan,
```
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14044
**[Test build #61744 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61744/consoleFull)**
for PR 14044 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Hi, @cloud-fan, @hvanhovell, @liancheng.
Following @cloud-fan's advice, I made the change below, and it turns
out that the difference is not noticeable.
```
-new
```
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Thank you for the review, @liancheng.
I'm sure that the performance of the Analyzer needs to be improved. But in any
case, the cost of the analyzer cannot be zero.
We should skip the redundant
Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/14044
Agree with @hvanhovell. Analysis should never take this long for such
a simple query. We should avoid duplicated analysis work, but fixing
performance issue(s) within the analyzer seems to
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Yep. I agree.
Could you make a PR for that? I think we also have some optimization points
about that.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14044
`LogicalPlan.resolve(...)` uses a linear search to resolve a column. This
is pretty bad if you are trying to look up 4000 columns 4 times (filter,
project, aggregate, sort): 4000 * (4000 / 2) * 4
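The back-of-envelope figure in the comment above can be checked with a small sketch. This is not Spark code: `ResolveCostSketch`, its method names, and the column names `c0`, `c1`, ... are all hypothetical, and the sketch only assumes what the comment states (a linear scan per column lookup, 4000 columns, 4 operators). It compares the comment's estimate with an exact count of comparisons, and with the number of probes a prebuilt hash index would need.

```scala
// Illustrative only: a stand-in for linear column resolution, not Spark's implementation.
object ResolveCostSketch {
  // The estimate from the comment: each lookup scans about half the columns,
  // repeated for every column and every operator.
  def estimatedComparisons(numColumns: Long, numOperators: Long): Long =
    numColumns * (numColumns / 2) * numOperators

  // Exact count for a linear scan: finding the column at position i costs i + 1
  // comparisons. Every column is looked up once per operator.
  def countedComparisons(numColumns: Int, numOperators: Int): Long = {
    val columns = Vector.tabulate(numColumns)(i => s"c$i") // hypothetical names
    val perOperator = columns.map(name => (columns.indexOf(name) + 1).toLong).sum
    perOperator * numOperators
  }

  // For contrast: with a name -> position index built once, each lookup is a
  // single hash probe, so the total is one probe per column per operator.
  def hashedLookups(numColumns: Int, numOperators: Int): Long = {
    val index: Map[String, Int] =
      Vector.tabulate(numColumns)(i => s"c$i").zipWithIndex.toMap
    val perOperator = index.keysIterator.count(index.contains).toLong
    perOperator * numOperators
  }

  def main(args: Array[String]): Unit = {
    println(s"estimated: ${estimatedComparisons(4000L, 4L)}") // 32000000
    println(s"counted:   ${countedComparisons(4000, 4)}")     // 32008000
    println(s"hashed:    ${hashedLookups(4000, 4)}")          // 16000
  }
}
```

Under these assumptions the linear scan costs roughly 32 million comparisons against 16 thousand hash probes, which is the gap the comment points at. Whether a hash index is the right fix inside the analyzer is a separate question that this sketch does not address.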
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Interesting result. We definitely need to take a look at
`ResolveReferences`-related stuff.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Oh, thank you for the advice about `dumpTimeSpent`. I hadn't looked at it that way.
These days, I'm investigating how large queries behave.
This analysis is very helpful for me. Thank
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14044
@dongjoon-hyun my point is that analysis should not be taking 12 seconds at
all. You can see how much time is spent in a rule, if you add the following
lines of code to your example:
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Thank you for the review, @hvanhovell. BTW, a single analysis takes over
12 seconds.
Elapsed time: 25.787751452s --> Elapsed time: 12.364812255s.
The reason I executed
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14044
Any idea what causes the regression? 5 seconds seems way too long for
analysis...
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Thank you for the review, @naliazheli.
Github user naliazheli commented on the issue:
https://github.com/apache/spark/pull/14044
LGTM
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14044
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14044
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61714/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14044
**[Test build #61714 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61714/consoleFull)**
for PR 14044 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
cc @cloud-fan , too.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14044
Hi, @liancheng and @rxin.
Could you review this PR?
This code path occurs during Dataset/DataFrame merging.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14044
**[Test build #61714 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61714/consoleFull)**
for PR 14044 at commit