[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Thank you for reviewing and merging, @liancheng, @cloud-fan, @hvanhovell, and @naliazheli! --- If your project is set up for it, you can reply to this email and have your reply appear on

2016-07-05 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14044 Merged to master. Thanks!

2016-07-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14044 LGTM

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14044 Merged build finished. Test PASSed.

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61744/

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14044 **[Test build #61744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61744/consoleFull)** for PR 14044 at commit

2016-07-04 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14044 LGTM pending Jenkins.

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 I have now updated the title and description of the PR/JIRA. The only patch in this PR is the following one-word change. ``` -new Dataset[Row](sparkSession, logicalPlan,
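The diff in this comment is cut off in the archive, so the exact change is not recoverable here. Assuming, as the rest of the thread suggests ("We should skip the redundant...", and one analysis taking about half the original time), that the saving comes from not analyzing an already-analyzed plan a second time, here is a Spark-free sketch of that pattern. `Plan`, `CountingAnalyzer`, `Execution`, `rebuildFromPlan`, and `reuseExecution` are all hypothetical names invented for illustration, not Spark's API.

```scala
// Hypothetical model of "analyze once, then reuse"; not Spark's API.
// Plan, CountingAnalyzer and Execution are invented for illustration.
final case class Plan(columns: Seq[String])

final class CountingAnalyzer {
  var calls = 0 // how many times analyze() has run
  def analyze(plan: Plan): Plan = {
    calls += 1
    plan.copy(columns = plan.columns.map(_.toLowerCase)) // stand-in for real resolution work
  }
}

// Analysis runs lazily, at most once per Execution instance.
final class Execution(val logical: Plan, analyzer: CountingAnalyzer) {
  lazy val analyzed: Plan = analyzer.analyze(logical)
}

// Redundant path: wrap the raw logical plan in a fresh Execution, so it is analyzed again.
def rebuildFromPlan(e: Execution, analyzer: CountingAnalyzer): Execution =
  new Execution(e.logical, analyzer)

// Reusing path: hand over the existing Execution, whose analysis is already cached.
def reuseExecution(e: Execution): Execution = e

val analyzer = new CountingAnalyzer
val original = new Execution(Plan(Seq("A", "B")), analyzer)
original.analyzed                            // first analysis
rebuildFromPlan(original, analyzer).analyzed // second, redundant analysis
val afterRedundant = analyzer.calls          // 2
reuseExecution(original).analyzed            // served from the cached lazy val
val afterReuse = analyzer.calls              // still 2
println(s"after redundant path: $afterRedundant, after reuse: $afterReuse")
```

The `lazy val` makes the cost of analysis a property of the execution object, so passing that object along (rather than only its unanalyzed plan) is what avoids paying the cost twice.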

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14044 **[Test build #61744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61744/consoleFull)** for PR 14044 at commit

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Hi, @cloud-fan, @hvanhovell, @liancheng. Following @cloud-fan's advice, I changed the following, and it turns out that the difference is not noticeable. ``` -new

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Thank you for the review, @liancheng. I'm sure that the performance of the Analyzer needs to be improved. But in any case, the cost of the analyzer cannot be zero. We should skip the redundant

2016-07-04 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14044 I agree with @hvanhovell. Analysis should never take so long for such a simple query. We should avoid duplicated analysis work, but fixing performance issue(s) within the analyzer seems to

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Yep. I agree. Could you make a PR for that? I think there are also some optimization opportunities there.

2016-07-04 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14044 `LogicalPlan.resolve(...)` uses a linear search to resolve a column. This is pretty bad if you are trying to look up 4000 columns 4 times (filter, project, aggregate, sort): 4000 * (4000 / 2) * 4
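The estimate in this comment multiplies out to 4000 × 2000 × 4 = 32,000,000 name comparisons, where 4000 / 2 is the average scan length of a linear lookup. The sketch below reproduces that count with a hypothetical linear resolver and contrasts it with a name-to-position map built once per operator. `resolveLinear` and `buildIndex` are invented names, not Spark's `LogicalPlan.resolve`, and the sketch assumes unique column names (a real resolver must also detect ambiguous references).

```scala
// Hypothetical illustration of the cost quoted above; not Spark's resolver.
// Linear resolution: scan the attribute list until the name matches.
def resolveLinear(attrs: IndexedSeq[String], name: String): (Option[Int], Int) = {
  var comparisons = 0
  var i = 0
  var found: Option[Int] = None
  while (found.isEmpty && i < attrs.length) {
    comparisons += 1
    if (attrs(i) == name) found = Some(i)
    i += 1
  }
  (found, comparisons)
}

// Indexed resolution: build a name -> position map once; each lookup is then
// a hash probe instead of a scan. Assumes names are unique.
def buildIndex(attrs: IndexedSeq[String]): Map[String, Int] =
  attrs.zipWithIndex.toMap

val numColumns = 4000
val operators = 4 // filter, project, aggregate, sort
val attrs = (0 until numColumns).map(i => s"c$i")

// Resolve every column once per operator with the linear scan and count comparisons.
val linearComparisons: Long =
  (1 to operators).map { _ =>
    attrs.map(name => resolveLinear(attrs, name)._2.toLong).sum
  }.sum

// The formula from the comment: 4000 * (4000 / 2) * 4.
val estimate: Long = numColumns.toLong * (numColumns / 2) * operators

val index = buildIndex(attrs)
val allResolved = attrs.forall(name => index.get(name).contains(attrs.indexOf(name)))
println(s"linear comparisons: $linearComparisons, estimate: $estimate")
```

Counting exactly, the column at position i costs i + 1 comparisons, so one operator costs 4000 × 4001 / 2 = 8,002,000 and four operators cost 32,008,000, which the comment's n / 2 average rounds to 32,000,000. With the index, the same 16,000 lookups need only 16,000 hash probes after a one-time 4000-entry build.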

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Interesting result. We definitely need to take a look at `ResolveReferences`-related stuff.

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Oh, thank you for the advice about `dumpTimeSpent`. I hadn't looked at it that way. These days, I'm investigating situations with large queries. This analysis is very helpful for me. Thank

2016-07-04 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14044 @dongjoon-hyun my point is that analysis should not be taking 12 seconds at all. You can see how much time is spent in a rule if you add the following lines of code to your example:
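The suggested lines are cut off in this archive, though another reply in the thread refers to `dumpTimeSpent`. As a Spark-free illustration of what a per-rule timing report measures, the sketch below wraps each rule of a hypothetical rule executor, accumulates wall-clock nanoseconds per rule name, and prints the totals from most to least expensive. `TimedRuleExecutor`, `dumpTimes`, the rule names, and the report format are assumptions, not Catalyst's `RuleExecutor`.

```scala
import scala.collection.mutable

// Hypothetical per-rule timer; not Catalyst's RuleExecutor.
// Rules here are plain functions over a Seq[String] standing in for a plan.
final class TimedRuleExecutor(rules: Seq[(String, Seq[String] => Seq[String])]) {
  private val timeSpent = mutable.LinkedHashMap.empty[String, Long]

  // Apply every rule once, in order, adding each rule's elapsed nanoseconds to its total.
  def execute(plan: Seq[String]): Seq[String] =
    rules.foldLeft(plan) { case (current, (name, rule)) =>
      val start = System.nanoTime()
      val result = rule(current)
      timeSpent(name) = timeSpent.getOrElse(name, 0L) + (System.nanoTime() - start)
      result
    }

  // Report rules from most to least expensive, in milliseconds.
  def dumpTimes(): String =
    timeSpent.toSeq
      .sortBy(-_._2)
      .map { case (name, ns) => f"$name%-20s ${ns / 1e6}%.3f ms" }
      .mkString("\n")

  def rulesSeen: Set[String] = timeSpent.keySet.toSet
}

val executor = new TimedRuleExecutor(Seq(
  "LowercaseNames" -> ((p: Seq[String]) => p.map(_.toLowerCase)),
  "DeduplicateNames" -> ((p: Seq[String]) => p.distinct)
))
val output = executor.execute(Seq("A", "a", "B"))
println(executor.dumpTimes())
```

Sorting by accumulated time is what makes a report like this useful for a 12-second analysis: the rule that dominates (for example, one doing linear column lookups) shows up first.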

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Thank you for the review, @hvanhovell. BTW, it's over 12 seconds for a single analysis. Elapsed time: 25.787751452s --> Elapsed time: 12.364812255s. The reason I executed
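The "Elapsed time: …s" lines look like the output of a small timing wrapper around the query, but the wrapper itself does not appear in this thread, so the `time` helper below is only an assumption about what such a wrapper commonly looks like in Scala. The second part checks the arithmetic on the two quoted measurements: a speedup of about 2.09× (about 13.42 s saved), which is consistent with the comment's point that a single analysis alone takes over 12 seconds.

```scala
// Hypothetical timing helper; the thread shows only its output format, not its code.
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block // evaluate the by-name block exactly once
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) / 1e9 + "s")
  result
}

// The two measurements quoted above, in seconds.
val before = 25.787751452
val after = 12.364812255
val speedup = before / after // roughly 2.09x
val saved = before - after   // roughly 13.42 s

// Usage: wrap any expression; its value is passed through unchanged.
val answer = time { (1 to 1000).sum }
println(f"speedup: $speedup%.2fx, saved: $saved%.2f s")
```

Taking the block by name (`=> R`) defers its evaluation until inside the timer, so the measured interval covers exactly the wrapped work.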

2016-07-04 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14044 Any idea what causes the regression? 5 seconds seems way too long for analysis...

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Thank you for the review, @naliazheli.

2016-07-04 Thread naliazheli
Github user naliazheli commented on the issue: https://github.com/apache/spark/pull/14044 LGTM

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14044 Merged build finished. Test PASSed.

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61714/

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14044 **[Test build #61714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61714/consoleFull)** for PR 14044 at commit

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 cc @cloud-fan, too.

2016-07-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14044 Hi, @liancheng and @rxin. Could you review this PR? This code path occurs during Dataset/DataFrame merging.

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14044 **[Test build #61714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61714/consoleFull)** for PR 14044 at commit