[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220233909 @cloud-fan Now it's ready again. Could you merge this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13045#issuecomment-220233152 **[Test build #58850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58850/consoleFull)** for PR 13045 at commit [`9eb6f40`](https://github.com/apache/spark/commit/9eb6f4063adaf7cda79cdf0bf2ac11414ca5c1d2).
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220233150 **[Test build #58848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58848/consoleFull)** for PR 13189 at commit [`8db358f`](https://github.com/apache/spark/commit/8db358f801f3dbd9f5eacf20dc10ef773c0d7ccb).
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220233159 **[Test build #58849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58849/consoleFull)** for PR 13186 at commit [`ac3aa33`](https://github.com/apache/spark/commit/ac3aa334b59d430ea7c239c706ed7e490af5f0b2).
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232881 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232882 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58845/ Test FAILed.
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232874 **[Test build #58845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58845/consoleFull)** for PR 13173 at commit [`ca22d71`](https://github.com/apache/spark/commit/ca22d7102537bd7411f37aa957f877802ebd6d17).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220232797 cc @andrewor14 @davies
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/13189

    [SPARK-14670][SQL][WIP] allow updating driver side sql metrics

    ## What changes were proposed in this pull request?

    On the SparkUI right now we have this SQLTab that displays accumulator values per operator. However, it only displays metrics updated on the executors, not on the driver. It is useful to also include driver metrics, e.g. broadcast time.

    This is a different version from https://github.com/apache/spark/pull/12427. This PR sends driver side accumulator updates right after the update happens, not at the end of execution. But it has some drawbacks:

    1. If there is no update, we won't send zero value updates, so in the web UI the operator will be empty, with no metrics info displayed.
    2. We need to trigger the event explicitly, which is not as simple as just updating the accumulator.
    3. It may be hard to use inside whole stage codegen.

    ## How was this patch tested?

    TODO

    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13189.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13189

commit 8db358f801f3dbd9f5eacf20dc10ef773c0d7ccb
Author: Wenchen Fan
Date:   2016-05-19T05:36:34Z

    allow updating driver side sql metrics
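The trade-off described in this pull request (explicit driver-side events, and no entry at all when nothing was updated) can be sketched with a toy model. This is only an illustration under stated assumptions: `DriverMetricUpdate`, `ToyMetricsUi`, `DriverSide.timeBroadcast`, and the metric name `broadcastTimeMs` are hypothetical names invented here, not Spark classes or the PR's actual code.

```scala
import scala.collection.mutable

// Hypothetical event carrying one driver-side metric value (not a Spark class).
final case class DriverMetricUpdate(operatorId: Int, metric: String, value: Long)

// Toy stand-in for the UI side: it only knows about operators that received an event.
class ToyMetricsUi {
  private val shown = mutable.Map.empty[Int, mutable.Map[String, Long]]

  def onEvent(e: DriverMetricUpdate): Unit = {
    val perOperator = shown.getOrElseUpdate(e.operatorId, mutable.Map.empty[String, Long])
    perOperator.update(e.metric, e.value)
  }

  // Operators that never posted an update come back empty (drawback 1 in the PR text).
  def metricsFor(operatorId: Int): Map[String, Long] =
    shown.get(operatorId).map(_.toMap).getOrElse(Map.empty[String, Long])
}

object DriverSide {
  // The update must be posted explicitly right after the work finishes
  // (drawback 2: updating a local counter alone would not reach the UI).
  def timeBroadcast(ui: ToyMetricsUi, operatorId: Int)(body: => Unit): Unit = {
    val start = System.nanoTime()
    body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    ui.onEvent(DriverMetricUpdate(operatorId, "broadcastTimeMs", elapsedMs))
  }
}
```

In this toy, an operator that ran `DriverSide.timeBroadcast` shows a `broadcastTimeMs` entry, while an operator that never posted an event has an empty metrics map, which mirrors the "operator will be empty" behaviour the description mentions.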
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220232622 **[Test build #58847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58847/consoleFull)** for PR 13186 at commit [`5bcef84`](https://github.com/apache/spark/commit/5bcef84700bd4ec51097e58bea099ded54334a59).
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232037 **[Test build #58845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58845/consoleFull)** for PR 13173 at commit [`ca22d71`](https://github.com/apache/spark/commit/ca22d7102537bd7411f37aa957f877802ebd6d17).
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13090#issuecomment-220232042 **[Test build #58846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58846/consoleFull)** for PR 13090 at commit [`698c261`](https://github.com/apache/spark/commit/698c2619dc71650ef0faac278014b539387fb273).
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220231882 Doesn't seem to be a valid MiMA check failure. Actually the tool crashed.
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220231892 retest this please
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220231390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58843/ Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220231329 **[Test build #58843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58843/consoleFull)** for PR 13139 at commit [`e0079d0`](https://github.com/apache/spark/commit/e0079d03f279dc68eb19faed6d5cb6823802051a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220231389 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220231132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58841/ Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220231131 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230990 **[Test build #58841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58841/consoleFull)** for PR 13122 at commit [`84aa14a`](https://github.com/apache/spark/commit/84aa14a5deda14083520e8e23f83cdb7f5bbb2bc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220230863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58840/ Test PASSed.
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220230862 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220230908 **[Test build #58844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58844/consoleFull)** for PR 13188 at commit [`e584575`](https://github.com/apache/spark/commit/e584575bb786e77b7ea1d6de3f80ec556011d291).
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/13188

    [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark queries for SparkSQL

    ## What changes were proposed in this pull request?

    Now that SparkSQL supports all TPC-DS queries, this patch adds all 99 benchmark queries inside SparkSQL.

    ## How was this patch tested?

    Benchmark only

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark tpcds-all

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13188.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13188

commit e584575bb786e77b7ea1d6de3f80ec556011d291
Author: Sameer Agarwal
Date:   2016-05-03T00:28:12Z

    Add all TPCDS queries
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220230733 **[Test build #58840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58840/consoleFull)** for PR 13187 at commit [`9b07d09`](https://github.com/apache/spark/commit/9b07d09301e9c6695e3586e06852f679594d988d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-15078] Add all TPCDS 1.4 benchmark...
Github user sameeragarwal closed the pull request at: https://github.com/apache/spark/pull/12854
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58839/ Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230271 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220230302 **[Test build #58843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58843/consoleFull)** for PR 13139 at commit [`e0079d0`](https://github.com/apache/spark/commit/e0079d03f279dc68eb19faed6d5cb6823802051a).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230146 **[Test build #58839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58839/consoleFull)** for PR 13122 at commit [`0702178`](https://github.com/apache/spark/commit/0702178a3c485aa316d5b03b3aefb2ea4a228cc2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220229725 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220229727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58842/ Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220229641 **[Test build #58842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58842/consoleFull)** for PR 13139 at commit [`ce7c55e`](https://github.com/apache/spark/commit/ce7c55e14a76dc85bca51a2563d770e3eac3a2a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...
Github user frreiss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13155#discussion_r63823201

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1648,16 +1648,56 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
       }

       /**
    +   * Statically evaluate an expression containing one or more aggregates on an empty input.
    +   */
    +  private def evalOnZeroTups(expr : Expression) : Option[Any] = {
    +    // AggregateExpressions are Unevaluable, so we need to replace all aggregates
    +    // in the expression with the value they would return for zero input tuples.
    +    val rewrittenExpr = expr transform {
    +      case a @ AggregateExpression(aggFunc, _, _, resultId) =>
    +        val resultLit = aggFunc.defaultResult match {
    +          case Some(lit) => lit
    +          case None => Literal.default(NullType)
    +        }
    +        Alias(resultLit, "aggVal") (exprId = resultId)
    +    }
    +    Option(rewrittenExpr.eval())
    +  }
    +
    +  /**
        * Construct a new child plan by left joining the given subqueries to a base plan.
        */
       private def constructLeftJoins(
           child: LogicalPlan,
           subqueries: ArrayBuffer[ScalarSubquery]): LogicalPlan = {
         subqueries.foldLeft(child) {
           case (currentChild, ScalarSubquery(query, conditions, _)) =>
    +        val aggOutputExpr = query.asInstanceOf[Aggregate].aggregateExpressions.head
    --- End diff --

    Sorry, didn't see your reply before I posted mine. I must not have refreshed my browser. Thanks for the info on the possible cases. I'm testing the updated static evaluation code now.
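The idea behind `evalOnZeroTups` in the diff above (replace every aggregate with the value it yields on empty input, then evaluate what is left) can be shown without Catalyst. The following is a minimal sketch on a toy expression tree: `Expr`, `Lit`, `Add`, `Count`, `Sum`, and `ZeroTupleEval` are hypothetical names made up for illustration, and `Option[Long]` with `None` is assumed here as a stand-in for SQL NULL.

```scala
// Toy expression tree; these types are illustrative and are not Catalyst classes.
sealed trait Expr
case class Lit(value: Option[Long]) extends Expr   // None models SQL NULL
case class Add(left: Expr, right: Expr) extends Expr
case class Count(column: String) extends Expr      // COUNT over zero rows is 0
case class Sum(column: String) extends Expr        // SUM over zero rows is NULL

object ZeroTupleEval {
  // Replace each aggregate with the value it returns for zero input rows.
  def replaceAggregates(e: Expr): Expr = e match {
    case Count(_)  => Lit(Some(0L))
    case Sum(_)    => Lit(None)
    case Add(l, r) => Add(replaceAggregates(l), replaceAggregates(r))
    case lit: Lit  => lit
  }

  // Evaluate an aggregate-free expression; NULL propagates through Add.
  def eval(e: Expr): Option[Long] = e match {
    case Lit(v)    => v
    case Add(l, r) => for (a <- eval(l); b <- eval(r)) yield a + b
    case other     => sys.error(s"cannot evaluate aggregate directly: $other")
  }

  // Mirrors the shape of the helper in the diff: rewrite first, then evaluate.
  def evalOnZeroTups(e: Expr): Option[Long] = eval(replaceAggregates(e))
}
```

With this toy, `ZeroTupleEval.evalOnZeroTups(Add(Count("x"), Lit(Some(1L))))` gives a non-NULL result while the same expression over `Sum("x")` gives `None`, which is the distinction a correlated scalar subquery rewrite has to preserve when the left join finds no matching rows.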
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220228793 **[Test build #58842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58842/consoleFull)** for PR 13139 at commit [`ce7c55e`](https://github.com/apache/spark/commit/ce7c55e14a76dc85bca51a2563d770e3eac3a2a2).
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220228661 @yanboliang @MLnick Thanks for the feedback. For now, I've just addressed the comment about the optimization section. I'll address the other comments in my next commit (very soon!).
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13139#discussion_r63823104 --- Diff: docs/ml-classification-regression.md --- @@ -374,6 +374,197 @@ regression model and extracting model summary statistics. +## Generalized linear regression + +When working with data that has a relatively small number of features (< 4096), Spark's GeneralizedLinearRegression interface +allows for flexible specification of [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) which can be used for various types of +prediction problems including linear regression, Poisson regression, logistic regression, and others. + +Contrasted with linear regression where the output is assumed to follow a Gaussian +distribution, GLMs are specifications of linear models where the response variable $Y_i$ may take on _any_ +distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family). + +$$ +Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right) +$$ + +An exponential family distribution is any probability distribution of the form + +$$ +f\left(y|\theta, \phi, w\right) = \exp{\left(\frac{y\theta - b(\theta)}{\phi/w} - c(y, \phi)\right)} +$$ + +where the parameter of interest $\theta_i$ is related to the expected value of the response variable +$\mu_i$ by + +$$ +\theta_i = h(\mu_i) +$$ + +Here, $h(\mu_i)$ is defined by the form of the exponential family distribution used. GLMs also allow specification +of a link function, which defines the relationship between the expected value of the response variable $\mu_i$ +and the so called _linear predictor_ $\eta_i$: + +$$ +g(\mu_i) = \eta_i = \vec{x_i}^T \cdot \vec{\beta} +$$ + +Often, the link function is chosen such that $h(\mu) = g(\mu)$, which yields a simplified relationship +between the parameter of interest $\theta$ and the linear predictor $\eta$. 
In this case, the link + function $g(\mu)$ is said to be the "canonical" link function. + +$$ +\theta_i = h(g^{-1}(\eta_i)) = \eta_i +$$ + +A GLM finds the regression coefficients $\vec{\beta}$ which maximize the likelihood function. + +$$ +\max_{\vec{\beta}} \mathcal{L}(\vec{\theta}|\vec{y},X) = +\prod_{i=1}^{N} \exp{\left(\frac{y_i\theta_i - b(\theta_i)}{\phi/w_i} - c(y_i, \phi)\right)} +$$ + +where the parameter of interest $\theta_i$ is related to the regression coefficients $\vec{\beta}$ +by + +$$ +\theta_i = h(g^{-1}(\vec{x_i} \cdot \vec{\beta})) +$$ + +Spark's generalized linear regression interface also provides summary statistics for diagnosing the +fit of GLM models, including residuals, p-values, deviances, the Akaike information criterion, and +others. + +### Available families + + + + + + PDF + Response Type + Supported Links + + + + Gaussian + $\frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{(x - \mu)^2}{2\sigma^2}\right)$ + Continuous + Identity*, Log, Inverse + + + Binomial + $\binom{n}{k}p^k (1-p)^{n-k}$ + Binary + Logit*, Probit, CLogLog + + + Poisson + $\frac{\lambda^k e^{-\lambda}}{k!}$ + Count + Log*, Identity, Sqrt + + + Gamma + $\frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}$ + Continuous + Inverse*, Identity, Log + +* Canonical Link + + + +### Optimization --- End diff -- So, I went ahead and added some more detail on the optimization routine. I made an effort to stress the limitations on numFeatures and to give some explanation as to why. Could you take a look at it? I didn't generate the docs to make sure it looks alright just yet, but I wanted to get that up so it could be reviewed.
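As a worked check of the canonical-link statement in the quoted doc text, the Poisson row of the families table can be matched against the exponential-family form given there. This is a sketch that assumes $\phi = 1$ and $w = 1$ and writes the count as $y$ rather than $k$; it is not part of the pull request.

```latex
\begin{align*}
f(y \mid \lambda) &= \frac{\lambda^{y} e^{-\lambda}}{y!}
  = \exp\left(y \log\lambda - \lambda - \log y!\right) \\
&\Rightarrow\quad \theta = \log\lambda,\quad b(\theta) = e^{\theta},\quad
  \phi = 1,\quad w = 1,\quad c(y, \phi) = \log y! \\
\mu &= \mathbb{E}[Y] = \lambda
  \quad\Rightarrow\quad \theta = h(\mu) = \log\mu \\
g(\mu) &= h(\mu) = \log\mu
  \quad\Rightarrow\quad \theta_i = h(g^{-1}(\eta_i)) = \log\left(e^{\eta_i}\right) = \eta_i
\end{align*}
```

This agrees with the table marking Log as the canonical link for the Poisson family.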
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220228251 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220228252 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58835/ Test FAILed.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220228173 **[Test build #58835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58835/consoleFull)** for PR 13186 at commit [`23b43d4`](https://github.com/apache/spark/commit/23b43d4c837d762461dd56a62b85cb998919e0ef). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822575 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private def init(): Unit = { if (mm != null) { + require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows") --- End diff -- Looks like it is.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822450 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private def init(): Unit = { if (mm != null) { + require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows") --- End diff -- yes
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822349 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private def init(): Unit = { if (mm != null) { + require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows") --- End diff -- Is `capacity` the number of rows?
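For readers checking the bound in the quoted `require`, the arithmetic can be spelled out in a few lines. This is a standalone sketch: `BroadcastLimit` and `checkCapacity` are hypothetical names, and only the expression `512 << 20` and the message string come from the diff.

```scala
object BroadcastLimit {
  // 512 << 20 multiplies 512 (2^9) by 2^20, giving 2^29 = 536870912.
  val maxCapacity: Int = 512 << 20

  // Same shape as the check in the diff; the method itself is hypothetical.
  def checkCapacity(capacity: Int): Unit =
    require(capacity < maxCapacity, "Cannot broadcast more than 512 millions rows")
}
```

So the "512 millions" in the message is 512 times 2^20 (about 537 million in decimal), and a capacity equal to the bound is rejected because the comparison is strict.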
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13167
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220226195 Merging this into master and 2.0, thanks!
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10061#discussion_r63822163 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -96,6 +100,7 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdated: SparkListenerBlockUpdated => throw new MatchError(blockUpdated) // TODO(ekl) implement this + case _ => parse(mapper.writeValueAsString(event)) --- End diff -- > Events are a public API, and they should be carefully crafted, since changing them affects user applications (including event logs). If there is unnecessary information in the event, then it's a bug in the event definition, not here. Yea, I totally agree. However, my concern is that having this line here will make it harder for developers to spot issues during development. Since serialization works automatically, a self-review of what will be serialized and which methods will be called during serialization is no longer a mandatory step, which makes auditing much harder. Handling every event explicitly is more work for the developer, but when we review a pull request we can see clearly what will be serialized and how each event is serialized. What do you think? By the way, if I am missing any context, please let me know :)
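The trade-off raised in this comment, explicit per-event handling versus a catch-all case, can be illustrated with a toy hierarchy. Everything below is a hypothetical sketch: `Event`, `JobStart`, `JobEnd`, `BlockUpdated`, and `EventJson` are invented names, the JSON is built by hand, and none of it is Spark's `JsonProtocol` or its Jackson-based fallback.

```scala
// Toy event hierarchy; hypothetical names, not Spark's listener events.
sealed trait Event extends Product with Serializable
final case class JobStart(jobId: Int) extends Event
final case class JobEnd(jobId: Int, succeeded: Boolean) extends Event
final case class BlockUpdated(blockId: String) extends Event

object EventJson {
  // Explicit style: each event spells out exactly which fields are written.
  // Event is sealed and there is no `case _`, so the compiler can warn
  // when a newly added event type is left unhandled.
  def toJsonExplicit(e: Event): String = e match {
    case JobStart(id)      => s"""{"Event":"JobStart","Job ID":$id}"""
    case JobEnd(id, ok)    => s"""{"Event":"JobEnd","Job ID":$id,"Succeeded":$ok}"""
    case BlockUpdated(bid) => s"""{"Event":"BlockUpdated","Block ID":"$bid"}"""
  }

  // Catch-all style: unlisted events fall through to a generic serializer,
  // so nothing forces a review of what ends up being written.
  def toJsonCatchAll(e: Event): String = e match {
    case JobStart(id) => s"""{"Event":"JobStart","Job ID":$id}"""
    case other        => s"""{"Event":"${other.productPrefix}","Raw":"$other"}"""
  }
}
```

In the explicit version, adding a fourth event without a matching case produces a non-exhaustive match warning, which is the kind of forced review the comment asks for. The catch-all version compiles silently and serializes whatever the new event happens to contain.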
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220225651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58836/ Test PASSed.
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220225648 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220225586 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220225588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58837/ Test PASSed.
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220225530 **[Test build #58836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58836/consoleFull)** for PR 13167 at commit [`a97e358`](https://github.com/apache/spark/commit/a97e3586b7b856d5a62981ff459f48da8d1128bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220225490 **[Test build #58837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58837/consoleFull)** for PR 12719 at commit [`0cb1136`](https://github.com/apache/spark/commit/0cb11361ff70d88ae09a4fd31154999fc9c3efae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-220224055 @sun-rui @felixcheung Let me try to build and run all the R tests on Windows first, and then I will try to correct and add each test one by one. This will take a bit of time and I might have to ask a lot of questions, but I will try anyway.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63820600 --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -58,4 +58,16 @@ class HiveContext private[hive]( sparkSession.sharedState.asInstanceOf[HiveSharedState] } + /** + * Invalidate and refresh all the cached the metadata of the given table. For performance reasons, + * Spark SQL or the external data source library it uses might cache certain metadata about a + * table, such as the location of blocks. When those change outside of Spark SQL, users should + * call this function to invalidate the cache. + * + * @since 1.3.0 + */ + def refreshTable(tableName: String): Unit = { --- End diff -- +1
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220223044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58838/ Test PASSed.
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220223043 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220222980 **[Test build #58838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58838/consoleFull)** for PR 13135 at commit [`9ec58e6`](https://github.com/apache/spark/commit/9ec58e6368d848b90b94145a1bb1354587898d82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13181#issuecomment-220222603 Hi @marmbrus , it seems okay!
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220222494 **[Test build #58840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58840/consoleFull)** for PR 13187 at commit [`9b07d09`](https://github.com/apache/spark/commit/9b07d09301e9c6695e3586e06852f679594d988d).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220222493 **[Test build #58841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58841/consoleFull)** for PR 13122 at commit [`84aa14a`](https://github.com/apache/spark/commit/84aa14a5deda14083520e8e23f83cdb7f5bbb2bc).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63820108 --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -58,4 +58,16 @@ class HiveContext private[hive]( sparkSession.sharedState.asInstanceOf[HiveSharedState] } + /** + * Invalidate and refresh all the cached the metadata of the given table. For performance reasons, + * Spark SQL or the external data source library it uses might cache certain metadata about a + * table, such as the location of blocks. When those change outside of Spark SQL, users should + * call this function to invalidate the cache. + * + * @since 1.3.0 + */ + def refreshTable(tableName: String): Unit = { --- End diff -- This class exists for compatibility purposes. Let's leave it as is.
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13187 [SPARK-15322][SQL][FOLLOW-UP] Update deprecated accumulator usage into accumulatorV2 ## What changes were proposed in this pull request? This PR corrects another case that uses the deprecated `accumulableCollection` to use `listAccumulator`, which the previous PR seems to have missed. Since `ArrayBuffer[InternalRow]` is `java.util.List[InternalRow]`, it seems reasonable to replace the usage. ## How was this patch tested? Related existing tests `InMemoryColumnarQuerySuite` and `CachedTableSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-15322 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13187.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13187 commit 9b07d09301e9c6695e3586e06852f679594d988d Author: hyukjinkwon Date: 2016-05-19T03:50:37Z Use list accumulator
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220222031 **[Test build #58838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58838/consoleFull)** for PR 13135 at commit [`9ec58e6`](https://github.com/apache/spark/commit/9ec58e6368d848b90b94145a1bb1354587898d82).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220222027 **[Test build #58839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58839/consoleFull)** for PR 13122 at commit [`0702178`](https://github.com/apache/spark/commit/0702178a3c485aa316d5b03b3aefb2ea4a228cc2).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13122#discussion_r63819835 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -234,6 +234,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { ) } + test("unsupported operations") { --- End diff -- @hvanhovell The latest changes added the test cases for the unsupported operations.
[GitHub] spark pull request: [SPARK-15130][PySpark][ML][DOCS] pyspark expos...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12914#issuecomment-220219840 @jkbradley @yanboliang @holdenk @sethah let's discuss the issue of defaults in param doc (see https://github.com/apache/spark/pull/13148#discussion_r63600571) on this PR since it is pertinent. Here, Holden raises 2 issues: 1. The Scaladoc contains default values for many params (sometimes in shared traits). In addition the Scala `Param` itself has the self-contained `doc` field (typically not containing defaults, since the built-in doc shows current and default in `explainParam`). 2. The PyDoc only contains the `Param` `doc` field. (By the way, (1) implies that in cases where the default param value in the trait is overridden, the Scaladoc is incorrect, but that is another issue). The result of (2) is that the HTML API doc doesn't look great, e.g. https://cloud.githubusercontent.com/assets/1036807/15381231/0a937dde-1d7e-11e6-885c-b120679f84ee.png Also, nowhere in the PyDoc are the defaults listed, while in the Scaladoc they are. I agree that it would be nice to have the defaults listed in the PyDoc in some way. 1. One solution is the original approach here, where defaults are put in the Param doc in a standard way, but stripped out during `explainParams`. This works but IMO is more prone to breaking in future if people forget to do things in exactly the correct format. It also doesn't directly solve the problem of the API doc looking ugly; 2. Another solution is the current approach here, where the attributes are turned into properties with a docstring (possibly including the default) - this does solve the problem of nice display in the API doc. The downside here is the potentially fairly large change to make everything a property, and the code duplication introduced (though kept to a minimum) and extra boilerplate when adding new params that could be more error-prone; 3. 
A third solution is what I've done [here](https://github.com/mlnick/spark/tree/sphinx-doc-params) as a PoC, which basically adds the built-in doc as the instance docstring for each Python `Param`. Then we override the `AttributeDocumenter` in Sphinx to handle it. The result displays nicely in the API doc (the same as the property approach, but no defaults are added). The other thing that changes is the `__init__` docstring is brought back (for some reason the current docs are not showing that), which means that the defaults are essentially documented there for each class. In a way this seems more "Pythonic" to me (i.e. Python users are accustomed to seeing the default arg values in constructer doc, e.g. sciki-learn). 4. Another option is to do nothing (for now at least), except bring back the `__init__` docstring. This keeps the ugly-looking `Param` doc, but at least shows the default args for each class, and is the current behavior. We can do something like (1) or (3) later (but maybe not (2) during Spark 2.x as it may be too large a change). 5. A final option is to perhaps document defaults elsewhere (such as the setter for the param which is usually implemented in the class or a model trait in Scala). Let's decide on an approach and make it consistent across the board. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
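To make the third option more concrete, here is a minimal sketch of the idea of reusing a param's built-in doc as its instance docstring, while leaving the defaults in the constructor signature. The `Param` and `ToyEstimator` classes and the `constructor_defaults` helper are hypothetical stand-ins written for illustration, not the PySpark classes or the code in the linked PoC branch, and the Sphinx `AttributeDocumenter` override is not shown.

```python
import inspect


class Param:
    """Hypothetical stand-in for a PySpark-style param descriptor."""

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc
        # Reuse the built-in doc as the instance docstring, so a
        # documentation tool could pick it up for each attribute.
        self.__doc__ = doc


class ToyEstimator:
    """Hypothetical estimator; defaults live in the constructor signature."""

    maxIter = Param("maxIter", "max number of iterations (>= 0).")
    regParam = Param("regParam", "regularization parameter (>= 0).")

    def __init__(self, maxIter=100, regParam=0.0):
        # Stored under separate names so the class-level Param objects
        # are not shadowed by instance attributes.
        self.maxIter_value = maxIter
        self.regParam_value = regParam


def constructor_defaults(cls):
    """Return {argument name: default value} read from cls.__init__."""
    signature = inspect.signature(cls.__init__)
    return {
        name: parameter.default
        for name, parameter in signature.parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }
```

Under these assumptions, `ToyEstimator.maxIter.__doc__` carries the param description without any default, and the defaults remain visible through the `__init__` signature, which is roughly the split described in option 3.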
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user zzcclp commented on the pull request: https://github.com/apache/spark/pull/13185#issuecomment-220218127 @zsxwing will this PR be merged into branch 1.6?
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13185
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/13185#issuecomment-220217816 Didn't merge to 1.6 due to the conflicts.
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/13185#issuecomment-220217602 Thanks. Merging to master, 2.0 and 1.6.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217482

Oh, amazing. According to the last Jenkins results, the seven test failures in `catalyst` are the only failures.

```
[info] *** 7 TESTS FAILED ***
[error] Failed: Total 1656, Failed 7, Errors 0, Passed 1649, Ignored 1
[error] Failed tests:
[error] org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite
[error] org.apache.spark.sql.catalyst.expressions.CastSuite
[error] (catalyst/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 222 s, completed May 18, 2016 8:11:07 PM
```

Anyway, I will handle them in another PR.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217398 **[Test build #58837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58837/consoleFull)** for PR 12719 at commit [`0cb1136`](https://github.com/apache/spark/commit/0cb11361ff70d88ae09a4fd31154999fc9c3efae).
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220217381 **[Test build #58835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58835/consoleFull)** for PR 13186 at commit [`23b43d4`](https://github.com/apache/spark/commit/23b43d4c837d762461dd56a62b85cb998919e0ef).
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220217395 **[Test build #58836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58836/consoleFull)** for PR 13167 at commit [`a97e358`](https://github.com/apache/spark/commit/a97e3586b7b856d5a62981ff459f48da8d1128bb).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63817417

--- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---

    @@ -58,4 +58,16 @@ class HiveContext private[hive](
         sparkSession.sharedState.asInstanceOf[HiveSharedState]
       }
     
    +  /**
    +   * Invalidate and refresh all the cached the metadata of the given table. For performance reasons,
    +   * Spark SQL or the external data source library it uses might cache certain metadata about a
    +   * table, such as the location of blocks. When those change outside of Spark SQL, users should
    +   * call this function to invalidate the cache.
    +   *
    +   * @since 1.3.0
    +   */
    +  def refreshTable(tableName: String): Unit = {

--- End diff --

If `invalidateTable` has a different meaning than `refreshTable`, should we also add it to `HiveContext`? cc @yhuai
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217295 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58834/ Test FAILed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217294 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217222

**[Test build #58834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58834/consoleFull)** for PR 12719 at commit [`d8257ee`](https://github.com/apache/spark/commit/d8257eef75433fe25aa4fd9c8c387933f23cfd20).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217246 I removed the last test commit.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/13186

[SPARK-15397] [SQL] fix string udf locate as hive

## What changes were proposed in this pull request?

In Hive, `locate("aa", "aaa", 0)` would yield 0, `locate("aa", "aaa", 1)` would yield 1 and `locate("aa", "aaa", 2)` would yield 2, while in Spark, `locate("aa", "aaa", 0)` would yield 1, `locate("aa", "aaa", 1)` would yield 2 and `locate("aa", "aaa", 2)` would yield 0. The difference comes from how the third parameter of the `locate` UDF is interpreted. It is the starting index, counted from 1, so passing 0 always returns 0.

## How was this patch tested?

Tested with modified `StringExpressionsSuite` and `StringFunctionsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark locate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13186.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13186

commit 23b43d4c837d762461dd56a62b85cb998919e0ef
Author: Daoyuan Wang
Date: 2016-05-18T11:30:07Z

    fix string udf locate as hive
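The Hive-style results quoted in this description can be reproduced with a small sketch. It assumes only what the description states: the third argument is a 1-based starting index, and any start below 1 yields 0. The function below is an illustrative reimplementation, not the code from the patch; behavior for cases the description does not mention (for example an empty search string) is not meant to match Hive or Spark.

```python
def locate(substr, s, pos=1):
    """Return the 1-based position of the first match of substr in s
    at or after the 1-based index pos, or 0 if there is none."""
    if pos < 1:
        # A start index below 1 is treated as "no match".
        return 0
    # str.find is 0-based and returns -1 on failure, so adding 1 maps
    # a hit to its 1-based position and a miss to 0.
    return s.find(substr, pos - 1) + 1


print(locate("aa", "aaa", 0))  # 0
print(locate("aa", "aaa", 1))  # 1
print(locate("aa", "aaa", 2))  # 2
```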
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220216995 Thank you for understanding. I'll try to handle those test issues in another PR.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-220216386

Does this apply to other cases?

https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/R/pkg/inst/worker/daemon.R#L22
https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/R/pkg/inst/profile/shell.R#L20
https://github.com/apache/spark/blob/6ab4d9e0c76b69b4d6d5f39037a77bdfb042be19/examples/src/main/r/dataframe.R#L37

(last one is an example)
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220216050 I'm fine with leaving the `resolved` check in this PR, because the test issue is largely unrelated. But it would be good to send another PR to fix the test issue: it doesn't make sense to test evaluation of an unresolved expression, as that will never happen in the real world.
[GitHub] spark pull request: [DOC][MINOR] ml.feature Scala and Python API s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13159
[GitHub] spark pull request: [DOC][MINOR] ml.feature Scala and Python API s...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13159#issuecomment-220214631 LGTM, thanks @BryanCutler. Merged to master/branch-2.0
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63815948

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---

    @@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
             val beforeCollect = System.nanoTime()
             // Note that we use .executeCollect() because we don't want to convert data to Scala types
             val input: Array[InternalRow] = child.executeCollect()
    +        if (input.length >= (512 << 20)) {
    +          throw new SparkException(
    +            s"Cannot broadcast the table with more than 512 millions rows: ${input.length} rows")

--- End diff --

Yes, it's not, will update them.
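As a quick check of the arithmetic in the quoted diff: the threshold `512 << 20` is a bit shift, so it equals 512 × 2^20 rather than 512 × 10^6. The sketch below only evaluates that expression; the names `ROW_LIMIT` and `exceeds_limit` are made up for illustration, and whether this gap between the check and the "512 millions" wording is what the reply refers to is an assumption, since the reviewer's question is not quoted here.

```python
# Threshold used in the quoted diff: 512 shifted left by 20 bits.
ROW_LIMIT = 512 << 20


def exceeds_limit(num_rows):
    """Mirror the `>=` comparison from the diff (illustrative only)."""
    return num_rows >= ROW_LIMIT


print(ROW_LIMIT)                  # 536870912, i.e. 2**29
print(ROW_LIMIT == 512 * 2**20)   # True
print(exceeds_limit(512_000_000)) # False: 512 million rows is under the threshold
```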
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220214210 LGTM
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13167#discussion_r63815763

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---

    @@ -70,7 +94,7 @@ case class DeserializeToObject(
      */
     case class SerializeFromObjectExec(
         serializer: Seq[NamedExpression],
    -    child: SparkPlan) extends UnaryExecNode with CodegenSupport {
    +    child: SparkPlan) extends UnaryExecNode with ObjectConsumerExec with CodegenSupport {

--- End diff --

minor: `ObjectConsumerExec` already extends `UnaryExecNode`
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13167#discussion_r63815785

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---

    @@ -166,10 +187,7 @@ case class MapElementsExec(
         func: AnyRef,
         outputObjAttr: Attribute,
         child: SparkPlan)
    -  extends UnaryExecNode with ObjectOperator with CodegenSupport {
    -
    -  override def output: Seq[Attribute] = outputObjAttr :: Nil
    -  override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr)
    +  extends UnaryExecNode with ObjectProducerExec with ObjectConsumerExec with CodegenSupport {

--- End diff --

same here
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13167#discussion_r63815797

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---

    @@ -141,15 +165,12 @@ case class MapPartitionsExec(
         func: Iterator[Any] => Iterator[Any],
         outputObjAttr: Attribute,
         child: SparkPlan)
    -  extends UnaryExecNode with ObjectOperator {
    -
    -  override def output: Seq[Attribute] = outputObjAttr :: Nil
    -  override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr)
    +  extends UnaryExecNode with ObjectProducerExec with ObjectConsumerExec {

--- End diff --

same here
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r63815687

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

    @@ -480,7 +480,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         try {
           Option(hive.getFunction(db, name)).map(fromHiveFunction)
         } catch {
    -      case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) =>
    +      case CausedBy(ex: Exception) if ex.getMessage.contains(s"$name does not exist") =>

--- End diff --

The objective here is not to catch all the exceptions but the ones caused by the function not existing. In my case, this exception is "org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.func does not exist))" whose root cause is MetaException, but it may vary in different situations (not really sure it varies, just conjecture based on previous code. See PR #12198 and #12853).
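The approach described in this comment, matching on the message of a wrapped exception rather than on its exact type, can be sketched as follows. This is only a loose analogy: `cause_chain` and `is_missing_function` are hypothetical helpers, the sketch inspects every exception in the chain, and it makes no claim about how the `CausedBy` extractor in the diff selects the exception it matches.

```python
def cause_chain(exc):
    """Yield exc and each exception reachable through __cause__."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic cause chains
        yield exc
        exc = exc.__cause__


def is_missing_function(exc, name):
    """True if any exception in the chain says that `name` does not exist."""
    needle = f"{name} does not exist"
    return any(needle in str(e) for e in cause_chain(exc))
```

Matching on message text is brittle by nature, since it depends on the wording of an error string, which is the trade-off the comment itself acknowledges when it says the root cause type may vary.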
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13163#issuecomment-220213832 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13163#issuecomment-220213834 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58832/ Test FAILed.
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13163#issuecomment-220213672

**[Test build #58832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58832/consoleFull)** for PR 13163 at commit [`b257891`](https://github.com/apache/spark/commit/b257891583865af83559ddefd46d70bf627f88dd).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220213423 Hmm. @cloud-fan, what about simply keeping the `resolved` check? IMHO, it just adds robustness. And, in fact, I'm reluctant to change the test suite when adding a new feature.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220212872 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220212875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58831/ Test PASSed.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220212724

**[Test build #58831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58831/consoleFull)** for PR 13156 at commit [`2b773b8`](https://github.com/apache/spark/commit/2b773b823672199a685e765f5345ceb6584eb3d8).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEach`
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/13185#issuecomment-220212769 LGTM.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220212842 For the second suggestion, `the optimizer is not tested but skipped`, do you mean skipping the `FoldablePropagation` optimizer?