[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-14 Thread saucam
Github user saucam closed the pull request at: https://github.com/apache/spark/pull/4764 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enab

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-14 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-92803267 thanks @marmbrus . Let me refactor this then and open another PR later.

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-14 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-92803029 thanks @marmbrus . Let me refactor this then and open another PR later.

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-13 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-92509606 Here is the JIRA: SPARK-4366. Unless you think you will have something in the next day or two, would you mind closing this JIRA. I'd like to keep the PR queue to only

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-11 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-91997683 Hi @marmbrus, can you share the other plans for modifying aggregates that you mentioned earlier? Can I help with that? Otherwise I'll modify this one for now as you have sugg

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-08 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-91093315 As a very rough sketch (this is totally untested and I'm probably missing cases), I'd hope the solution could look something like the following: ```scala obj

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-08 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-91089650 Thanks for working on this and sorry for the delay in reviewing it. My high level feedback is that I think we should optimize handling of distinct aggregation, but ther

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89769240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-05 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89756295 fixed test failures because of class cast exceptions. Please retest.

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89604947 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-04 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89604768 fixed the test case of zero count when there is no data. rebased with latest master. please retest

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89107886 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89107882 [Test build #29635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29635/consoleFull) for PR 4764 at commit [`6883b42`](https://gith

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89091680 [Test build #29635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29635/consoleFull) for PR 4764 at commit [`6883b42`](https://githu

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-04-02 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-89091365 ok to test

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-03-05 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-77510275 please retest

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-26 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76347215 please retest

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-26 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76184234 Fixed the null count test failure. The optimization works only when there is a single count distinct in the select clause.

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76149044 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76149037 [Test build #27994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27994/consoleFull) for PR 4764 at commit [`edee0d2`](https://gith

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76147388 [Test build #27994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27994/consoleFull) for PR 4764 at commit [`edee0d2`](https://githu

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-26 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76147005 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-25 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76135270 can we test this again please ?

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76065343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76065331 [Test build #27960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27960/consoleFull) for PR 4764 at commit [`4125e2e`](https://gith

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76062868 [Test build #27960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27960/consoleFull) for PR 4764 at commit [`4125e2e`](https://githu

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-25 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-76062074 test this please

[GitHub] spark pull request: [SPARK-6006][SQL]: Optimize count distinct for...

2015-02-25 Thread saucam
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/4764#issuecomment-75952342 @marmbrus can you please advise on how to rewrite this in a better way?