[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72602/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16760 **[Test build #72602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72602/testReport)** for PR 16760 at commit [`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Merged build finished. Test FAILed.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16858 **[Test build #72597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72597/testReport)** for PR 16858 at commit [`03f9bfd`](https://github.com/apache/spark/commit/03f9bfd985a5a272d99258971fd83e4613e9b0fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16760 **[Test build #72602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72602/testReport)** for PR 16760 at commit [`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83).
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16760 retest this please
[GitHub] spark pull request #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesTo...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16844#discussion_r100170698 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -695,11 +690,16 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff assert (vlen % 8 == 0); assert (longArray != null); - if (numKeys == MAX_CAPACITY -// The map could be reused from last spill (because of no enough memory to grow), -// then we don't try to grow again if hit the `growthThreshold`. -|| !canGrowArray && numKeys > growthThreshold) { -return false; + if (numKeys >= growthThreshold) { +if (longArray.size() / 2 == MAX_CAPACITY) { --- End diff -- This does not look correct according to the documentation of MAX_CAPACITY: the actual number of keys can reach MAX_CAPACITY (so that the total number of entries in longArray is MAX_CAPACITY * 2).
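One way to see the concern above is a toy model of the old and new guards. The sketch below is hypothetical: `CapacityCheckModel` and its methods are illustrative names, not Spark code. It assumes that each hash-table slot uses two longs in `longArray` (so capacity is `longArray.size() / 2`), that `growthThreshold` is capacity times a load factor of 0.5, and it ignores the old `canGrowArray` branch. Under those assumptions, once the table has reached its maximum size, the new guard stops accepting keys at half of `MAX_CAPACITY`, whereas the old guard allowed `MAX_CAPACITY` keys.

```java
import java.util.function.LongPredicate;

// Hypothetical toy model of the append() capacity guards discussed in PR #16844.
// Assumptions (not confirmed by the thread): two longs per slot in longArray,
// growthThreshold == capacity * LOAD_FACTOR, LOAD_FACTOR == 0.5.
public class CapacityCheckModel {
    static final double LOAD_FACTOR = 0.5; // assumed load factor

    // Old-style guard: reject a new key only once numKeys has reached maxCapacity
    // (the `!canGrowArray && numKeys > growthThreshold` branch is omitted).
    static boolean oldRejects(long numKeys, long maxCapacity) {
        return numKeys == maxCapacity;
    }

    // New-style guard, evaluated after the table has grown to maxCapacity slots:
    // past the growth threshold the array cannot grow any further, so append() fails.
    static boolean newRejectsAtMaxSize(long numKeys, long maxCapacity) {
        long longArraySize = 2 * maxCapacity; // two longs per slot
        long growthThreshold = (long) (maxCapacity * LOAD_FACTOR);
        return numKeys >= growthThreshold && longArraySize / 2 == maxCapacity;
    }

    // Smallest number of stored keys at which the guard starts rejecting inserts,
    // or -1 if it never rejects within [0, limit].
    static long firstRejected(LongPredicate rejects, long limit) {
        for (long n = 0; n <= limit; n++) {
            if (rejects.test(n)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long toyMax = 16; // small stand-in for MAX_CAPACITY so the loop finishes instantly
        long oldLimit = firstRejected(n -> oldRejects(n, toyMax), toyMax);
        long newLimit = firstRejected(n -> newRejectsAtMaxSize(n, toyMax), toyMax);
        // prints "keys stored before rejection: old=16, new=8"
        System.out.println("keys stored before rejection: old=" + oldLimit + ", new=" + newLimit);
    }
}
```

Whether the real map behaves this way depends on its actual load factor and on how `growthThreshold` is derived, which this model only assumes.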
[GitHub] spark pull request #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesTo...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16844#discussion_r100170278 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -695,11 +690,16 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff assert (vlen % 8 == 0); assert (longArray != null); - if (numKeys == MAX_CAPACITY -// The map could be reused from last spill (because of no enough memory to grow), -// then we don't try to grow again if hit the `growthThreshold`. -|| !canGrowArray && numKeys > growthThreshold) { -return false; + if (numKeys >= growthThreshold) { +if (longArray.size() / 2 == MAX_CAPACITY) { + // Should not grow beyond the max capacity + return false; +} +try { + growAndRehash(); +} catch (OutOfMemoryError oom) { + return false; --- End diff -- Unrelated, but catching this OutOfMemoryError will not be useful, at least not in YARN mode: the OOM will simply cause the JVM to exit.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16858 Oh, wait a moment. The Jenkins run on the last commit will succeed soon.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16858 OK, merging this as it's a fix for the build.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Merged build finished. Test FAILed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72601/ Test FAILed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #72601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72601/testReport)** for PR 16386 at commit [`6f8b0c3`](https://github.com/apache/spark/commit/6f8b0c3249729aeecc5cf1275e29afe535cc7238). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16857 Merged build finished. Test FAILed.
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16857 **[Test build #72600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72600/testReport)** for PR 16857 at commit [`b2523b9`](https://github.com/apache/spark/commit/b2523b920de2329878a37f7efc1e9dda5d969b79). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16857 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72600/ Test FAILed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 I rebased onto master and hopefully addressed all of your comments, @cloud-fan. Please have another look.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #72601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72601/testReport)** for PR 16386 at commit [`6f8b0c3`](https://github.com/apache/spark/commit/6f8b0c3249729aeecc5cf1275e29afe535cc7238).
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16857 **[Test build #72600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72600/testReport)** for PR 16857 at commit [`b2523b9`](https://github.com/apache/spark/commit/b2523b920de2329878a37f7efc1e9dda5d969b79).
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16857 ok to test
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16858 Merged build finished. Test PASSed.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72595/ Test PASSed.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16858 **[Test build #72595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/testReport)** for PR 16858 at commit [`b8625a2`](https://github.com/apache/spark/commit/b8625a2b9e95c9c818c8f781d1ed839d24e617eb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72596/ Test FAILed.
[GitHub] spark issue #16859: [SPARK-17714][Core][maven][test-hadoop2.6]Avoid using Ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16859 **[Test build #72599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72599/testReport)** for PR 16859 at commit [`1c88474`](https://github.com/apache/spark/commit/1c8847494c29d4b51182ecfeebb5cc85e000e7a1).
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 Oh, indeed. Thank you for letting me know about that, too!
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16858 The only failure is unrelated to this PR, and the very next Jenkins run passed the test.
```
[info] KafkaSourceSuite:
...
[info] - subscribing topic by pattern with topic deletions *** FAILED *** (1 minute, 2 seconds)
[info] Timed out waiting for stream: The code passed to failAfter did not complete within 30 seconds.
```
[GitHub] spark issue #16859: [SPARK-17714][Core][maven][test-hadoop2.6]Avoid using Ex...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16859 retest this please
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16795 FYI, I'm fixing the OOM in #16825.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16858 **[Test build #3567 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3567/testReport)** for PR 16858 at commit [`03f9bfd`](https://github.com/apache/spark/commit/03f9bfd985a5a272d99258971fd83e4613e9b0fd).
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16858 Merged build finished. Test FAILed.
[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/12004 I still don't think this answered my last questions? Yes, I understand all this back story. That's why this is taking such a large amount of everyone's time. The purpose, discussion and commits keep shifting significantly, so I have to re-read this from the start every time to see what it's done this time. I still only partly perceive the problem and why it takes this much to solve it.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16858 **[Test build #72596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72596/testReport)** for PR 16858 at commit [`e9d720e`](https://github.com/apache/spark/commit/e9d720e9585cf7e9c1e9cf1e633edc84f06196ae). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Still waiting for reviews on this. Anyone? Ideally before my forthcoming Spark Summit talk...
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 Oh, sorry! I misread the word `doubt` in your comment. For the ReplSuite, #16859 looks related.
[GitHub] spark issue #16859: [SPARK-17714][Core][maven]Avoid using ExecutorClassLoade...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16859 Hi, @zsxwing. Currently, SparkPullRequestBuilder is broken, and the HOTFIX is #16858.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16795 Are you asking me? I meant Spark 2.1. Those test errors are not new; that's what I'm saying.
[GitHub] spark issue #16859: [SPARK-17714][Core][maven]Avoid using ExecutorClassLoade...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16859 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72598/ Test FAILed.
[GitHub] spark issue #16859: [SPARK-17714][Core][maven]Avoid using ExecutorClassLoade...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16859 Merged build finished. Test FAILed.
[GitHub] spark issue #16859: [SPARK-17714][Core][maven]Avoid using ExecutorClassLoade...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16859 **[Test build #72598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72598/testReport)** for PR 16859 at commit [`1c88474`](https://github.com/apache/spark/commit/1c8847494c29d4b51182ecfeebb5cc85e000e7a1). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class TransportChannelHandler extends ChannelInboundHandlerAdapter `
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 You mean `avro` or prior `parquet`?
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16795 > So far, ReplSuite failures are observed. One is OOM... I ran into those issues while testing the 2.1 RCs, so I doubt it's caused by your change.
[GitHub] spark issue #16859: [SPARK-17714][Core][maven]Avoid using ExecutorClassLoade...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16859 **[Test build #72598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72598/testReport)** for PR 16859 at commit [`1c88474`](https://github.com/apache/spark/commit/1c8847494c29d4b51182ecfeebb5cc85e000e7a1).
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 So far, **ReplSuite** failures have been observed. One is an OOM and the other is a ClosureCleaner test failure. Both happen after the test case `define case class and create Dataset together with paste mode`. It seems there is some flakiness in that suite. I'll take a closer look at it. hadoop-2.7 ``` ReplSuite: ... - define case class and create Dataset together with paste mode Spark context available as 'sc' (master = local-cluster[1,4,4096], app id = app-20170208112317-). Spark session available as 'spark'. Exception in thread "ExecutorRunner for app-20170208112317-/94" java.lang.OutOfMemoryError: Java heap space ``` hadoop-2.6 ``` ReplSuite: ... - define case class and create Dataset together with paste mode Spark context available as 'sc' (master = local-cluster[1,4,4096], app id = app-20170208112342-). Spark session available as 'spark'. - should clone and clean line object in ClosureCleaner *** FAILED *** isContain was true Interpreter output contained 'Exception': ```
[GitHub] spark pull request #16859: [SPARK-17714][Core][maven]Avoid using ExecutorCla...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16859#discussion_r100159293 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java --- @@ -48,7 +47,7 @@ * on the channel for at least `requestTimeoutMs`. Note that this is duplex traffic; we will not * timeout if the client is continuously sending but getting no responses, for simplicity. */ -public class TransportChannelHandler extends SimpleChannelInboundHandler { +public class TransportChannelHandler extends ChannelInboundHandlerAdapter { --- End diff -- SimpleChannelInboundHandler also uses Javassist to generate a matcher class. Since `SimpleChannelInboundHandler` provides little value for us, I just changed it to extend `ChannelInboundHandlerAdapter` directly.
[GitHub] spark pull request #16859: [SPARK-17714][Core][maven]Avoid using ExecutorCla...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16859#discussion_r100159049 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2599,14 +2599,10 @@ private[spark] object Utils extends Logging { private[util] object CallerContext extends Logging { val callerContextSupported: Boolean = { -SparkHadoopUtil.get.conf.getBoolean("hadoop.caller.context.enabled", false) && { +SparkHadoopUtil.get.conf.getBoolean("hadoop.caller.context.enabled", true) && { --- End diff -- I will change the default value back to `false` later. This is just to test whether this PR fixes the issue on Jenkins.
[GitHub] spark pull request #16859: [SPARK-17714][Core][maven]Avoid using ExecutorCla...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16859 [SPARK-17714][Core][maven]Avoid using ExecutorClassLoader to load Netty generated classes ## What changes were proposed in this pull request? Netty's `MessageToMessageEncoder` uses Javassist to generate a matcher class, and the implementation calls `Class.forName` to check whether this class has already been generated. If `MessageEncoder` or `MessageDecoder` is created in `ExecutorClassLoader.findClass`, it will cause `ClassCircularityError`. This is because loading this Netty-generated class will call `ExecutorClassLoader.findClass` to search for the class, and `ExecutorClassLoader` will try to use RPC to load it, which causes the non-existent matcher class to be loaded again. The JVM reports `ClassCircularityError` to prevent such infinite recursion. # Why it only happens in Maven builds It's because the Maven build sets a URLClassLoader as the current context class loader to run the tests, which exposes this issue. The class loader tree is as follows: ``` bootstrap class loader -- ... - REPL class loader ExecutorClassLoader | | URLClassLoader ``` The SBT build uses the bootstrap class loader directly, and `ReplSuite.test("propagation of local properties")` is the first test in ReplSuite, which happens to load `io/netty/util/internal/__matchers__/org/apache/spark/network/protocol/MessageMatcher` into the bootstrap class loader. This issue can be reproduced in SBT as well. Here are the reproduction steps: - Enable `hadoop.caller.context.enabled`. - Replace `Class.forName` with `Utils.classForName` in `object CallerContext`. - Ignore `ReplSuite.test("propagation of local properties")`. - Run `ReplSuite`. This PR just creates a singleton MessageEncoder and MessageDecoder and makes sure they can be created before switching to ExecutorClassLoader. TransportContext is created when RpcEnv is created, and that happens before ExecutorClassLoader is created. ## How was this patch tested? 
Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-17714 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16859.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16859 commit 1c8847494c29d4b51182ecfeebb5cc85e000e7a1 Author: Shixiong Zhu Date: 2017-02-07T22:30:42Z Avoid using ExecutorClassLoader to load Netty generated classes
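The recursion described in the PR can be modeled in a few lines. This is a toy Python model, not JVM or Netty code: `ToyLoader`, `Encoder`, the `"MessageMatcher"` string, and the `"Line1Class"` REPL class name are hypothetical stand-ins. A per-loader "in progress" set loosely plays the role of the JVM's circularity check. The buggy wiring builds a fresh encoder inside the remote fetch; the fixed wiring reuses one encoder built through a parent loader before the switch, which is the idea behind the singleton `MessageEncoder`/`MessageDecoder`.

```python
class ClassCircularityError(Exception):
    """Toy stand-in for java.lang.ClassCircularityError."""


class ToyLoader:
    """Toy class loader; `fetch` plays the role of findClass."""

    def __init__(self, fetch):
        self.fetch = fetch        # callable(loader, name) -> "class" value
        self.cache = {}           # classes already defined by this loader
        self.in_progress = set()  # names whose loading has started but not finished

    def load(self, name):
        if name in self.cache:
            return self.cache[name]
        if name in self.in_progress:
            # Re-entrant request for a class that is still being loaded.
            raise ClassCircularityError(name)
        self.in_progress.add(name)
        try:
            cls = self.fetch(self, name)
        finally:
            self.in_progress.discard(name)
        self.cache[name] = cls
        return cls


MATCHER = "MessageMatcher"  # hypothetical name for the Netty-generated matcher class


class Encoder:
    """Stands in for MessageEncoder: construction looks up the matcher class."""

    def __init__(self, loader):
        self.matcher = loader.load(MATCHER)


def rpc_fetch_new_encoder(loader, name):
    # Buggy path: every remote fetch builds a fresh encoder through the same loader.
    Encoder(loader)
    return "<remote " + name + ">"


def make_rpc_fetch(shared_encoder):
    # Fixed path: reuse one encoder that was built before switching loaders.
    def fetch(loader, name):
        _ = shared_encoder  # no new matcher lookup happens here
        return "<remote " + name + ">"
    return fetch


# Buggy wiring: loading a REPL class re-enters the loader for the matcher.
buggy = ToyLoader(rpc_fetch_new_encoder)
try:
    buggy.load("Line1Class")
    buggy_error = None
except ClassCircularityError as err:
    buggy_error = str(err)
print("buggy:", buggy_error)  # buggy: MessageMatcher

# Fixed wiring: the encoder resolves the matcher through a parent loader first.
parent = ToyLoader(lambda loader, name: "<local " + name + ">")
shared = Encoder(parent)
fixed = ToyLoader(make_rpc_fetch(shared))
fixed_result = fixed.load("Line1Class")
print("fixed:", fixed_result)  # fixed: <remote Line1Class>
```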
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16850 Merged. Could you close this PR, please?
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16850 LGTM. Merging to 2.1.
[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16740
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16495 Thanks @mhmoudr! As for the stress test, I'd recommend posting instructions as a GitHub gist and linking to it wherever you post results, on JIRA or a PR. We wouldn't want to add a model (a binary file) to git unless absolutely necessary. When we address the complexity issue, we can post stress test results and the link to the gist.
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100138230 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +if (res != null) { + res.head +} else { + null +} } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. + * @see `DataFrameStatsFunctions.approxQuantile` for detailed description. * - * Note that rows containing any null or NaN values values will be removed before - * calculation. * @param cols the names of the numerical columns * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (>= 0). + * @param relativeError The relative target precision to achieve (greater or equal to 0). --- End diff -- "greater" -> "greater than"
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100138241 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +if (res != null) { + res.head +} else { + null --- End diff -- The Scaladoc should describe this case.
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100138206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +if (res != null) { + res.head +} else { + null +} } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. + * @see `DataFrameStatsFunctions.approxQuantile` for detailed description. * - * Note that rows containing any null or NaN values values will be removed before - * calculation. * @param cols the names of the numerical columns * @param probabilities a list of quantile probabilities * Each number must belong to [0, 1]. * For example 0 is the minimum, 0.5 is the median, 1 is the maximum. - * @param relativeError The relative target precision to achieve (>= 0). + * @param relativeError The relative target precision to achieve (greater or equal to 0). * If set to zero, the exact quantiles are computed, which could be very expensive. * Note that values greater than 1 are accepted but give the same result as 1. 
* @return the approximate quantiles at the given probabilities of each column * - * @note Rows containing any NaN values will be removed before calculation + * @note Rows containing any null or NaN values will be removed before calculation * * @since 2.2.0 */ def approxQuantile( cols: Array[String], probabilities: Array[Double], relativeError: Double): Array[Array[Double]] = { -StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols, - probabilities, relativeError).map(_.toArray).toArray +try { + StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols, --- End diff -- Great catch! I vote for modifying multipleApproxQuantiles to handle null and NaN values. As far as reverting, I'm OK either way as long as we get the fix into 2.2. I'd actually recommend going ahead and merging this PR and creating a follow-up Critical Bug targeted at 2.2. @MLnick I think dropping NAs from the cols passed as args still will not work. Say the user passes cols "a" and "b" as args, but some rows have (a = NaN, b = 1.0). Then those rows will be ignored.
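The row-dropping concern can be checked without Spark. The sketch below is plain Python; the helper names and sample rows are hypothetical, and an exact median stands in for `approxQuantile`. It contrasts dropping every row in which any selected column is null or NaN (what `df.select(cols).na.drop()` does) with filtering each column independently.

```python
import math
import statistics


def is_missing(v):
    """True for null (None) or NaN, the two cases .na.drop() discards."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def rowwise_column(rows, idx):
    """Row-wise drop: discard a row if ANY selected column is missing,
    then extract column `idx` from the surviving rows."""
    return [r[idx] for r in rows if not any(is_missing(v) for v in r)]


def percolumn_column(rows, idx):
    """Per-column filtering: keep every non-missing value of column `idx`."""
    return [r[idx] for r in rows if not is_missing(r[idx])]


# Columns (a, b); the last two rows have a missing `a` but a valid `b`.
rows = [(1.0, 1.0), (2.0, 2.0), (float("nan"), 3.0), (None, 4.0)]

b_rowwise = rowwise_column(rows, 1)    # [1.0, 2.0]
b_percol = percolumn_column(rows, 1)   # [1.0, 2.0, 3.0, 4.0]
print(statistics.median(b_rowwise))    # 1.5
print(statistics.median(b_percol))     # 2.5
```

The median of `b` shifts from 2.5 to 1.5 only because of NaN/null values in `a`, which is why handling missing values per column inside `multipleApproxQuantiles` gives the expected answer.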
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16858 **[Test build #72597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72597/testReport)** for PR 16858 at commit [`03f9bfd`](https://github.com/apache/spark/commit/03f9bfd985a5a272d99258971fd83e4613e9b0fd).
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16858 Thank you for the review, @srowen. Also, since it touches R now, cc @felixcheung.
[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16740 LGTM. Merging with master. Thank you, and thanks to @sethah for reviewing!
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16858 I removed those lines, too.
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100137039 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -232,9 +232,9 @@ test_that("basenameSansExtFromUrl", { x <- paste0("http://people.apache.org/~pwendell/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-", "SNAPSHOT-2016_12_09_11_08-eb2d9bf-bin/spark-2.1.1-SNAPSHOT-bin-hadoop2.7.tgz") y <- paste0("http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/spark-2.1.0-", - "bin-hadoop2.4-without-hive.tgz") + "bin-hadoop2.6-without-hive.tgz") --- End diff -- Oh, I see.
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100136174 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -232,9 +232,9 @@ test_that("basenameSansExtFromUrl", { x <- paste0("http://people.apache.org/~pwendell/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-", "SNAPSHOT-2016_12_09_11_08-eb2d9bf-bin/spark-2.1.1-SNAPSHOT-bin-hadoop2.7.tgz") y <- paste0("http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/spark-2.1.0-", - "bin-hadoop2.4-without-hive.tgz") + "bin-hadoop2.6-without-hive.tgz") --- End diff -- Actually, there is no such artifact anymore. Just remove this line and the one below that also references it.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16858 **[Test build #72596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72596/testReport)** for PR 16858 at commit [`e9d720e`](https://github.com/apache/spark/commit/e9d720e9585cf7e9c1e9cf1e633edc84f06196ae).
[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16844 **[Test build #3566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3566/testReport)** for PR 16844 at commit [`d9aa208`](https://github.com/apache/spark/commit/d9aa2081c514577399ba77cfe2145a00ed477ef8). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100135299 --- Diff: dev/run-tests-jenkins.py --- @@ -165,12 +165,6 @@ def main(): if "test-maven" in ghprb_pull_title: os.environ["AMPLAB_JENKINS_BUILD_TOOL"] = "maven" # Switch the Hadoop profile based on the PR title: --- End diff -- I fixed that too.
[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16844 **[Test build #3566 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3566/testReport)** for PR 16844 at commit [`d9aa208`](https://github.com/apache/spark/commit/d9aa2081c514577399ba77cfe2145a00ed477ef8).
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100134776 --- Diff: dev/run-tests-jenkins.py --- @@ -165,12 +165,6 @@ def main(): if "test-maven" in ghprb_pull_title: os.environ["AMPLAB_JENKINS_BUILD_TOOL"] = "maven" # Switch the Hadoop profile based on the PR title: --- End diff -- Sure!
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100134410 --- Diff: dev/run-tests.py --- @@ -505,14 +505,14 @@ def main(): # if we're on the Amplab Jenkins build servers setup variables # to reflect the environment settings build_tool = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "sbt") -hadoop_version = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "hadoop2.3") +hadoop_version = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "hadoop2.6") test_env = "amplab_jenkins" # add path for Python3 in Jenkins if we're calling from a Jenkins machine os.environ["PATH"] = "/home/anaconda/envs/py3k/bin:" + os.environ.get("PATH") else: # else we're running locally and can use local settings build_tool = "sbt" -hadoop_version = os.environ.get("HADOOP_PROFILE", "hadoop2.3") --- End diff -- Here, too.
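The build-selection flow touched by these diffs can be summarized as a small pure-Python sketch. It is a simplification, not the actual scripts: `apply_title_tags`, `resolve_build`, and the `on_jenkins` flag are hypothetical (the real Jenkins detection is not shown in the diff), and it assumes the local-branch default also moves to `hadoop2.6`, as the "Here, too." comment suggests. It mirrors the `test-maven` title check from `dev/run-tests-jenkins.py` and the `os.environ.get(..., default)` lookups from `dev/run-tests.py`, working on a plain dict instead of `os.environ`.

```python
DEFAULT_HADOOP_PROFILE = "hadoop2.6"  # default proposed in the diff (previously "hadoop2.3")


def apply_title_tags(pr_title, env):
    """Mirror the title check in dev/run-tests-jenkins.py: a PR title containing
    "test-maven" switches the Jenkins build tool to Maven."""
    env = dict(env)  # copy, so the caller's mapping (e.g. os.environ) is untouched
    if "test-maven" in pr_title:
        env["AMPLAB_JENKINS_BUILD_TOOL"] = "maven"
    return env


def resolve_build(env, on_jenkins):
    """Mirror the defaulting in dev/run-tests.py main().

    `on_jenkins` is a stand-in for the script's real Jenkins detection,
    which the diff does not show.
    """
    if on_jenkins:
        build_tool = env.get("AMPLAB_JENKINS_BUILD_TOOL", "sbt")
        hadoop_version = env.get("AMPLAB_JENKINS_BUILD_PROFILE", DEFAULT_HADOOP_PROFILE)
    else:
        build_tool = "sbt"  # local runs always use SBT
        hadoop_version = env.get("HADOOP_PROFILE", DEFAULT_HADOOP_PROFILE)
    return build_tool, hadoop_version


jenkins_env = apply_title_tags("[SPARK-19409][BUILD][test-maven]", {})
print(resolve_build(jenkins_env, on_jenkins=True))                       # ('maven', 'hadoop2.6')
print(resolve_build({}, on_jenkins=False))                               # ('sbt', 'hadoop2.6')
print(resolve_build({"HADOOP_PROFILE": "hadoop2.7"}, on_jenkins=False))  # ('sbt', 'hadoop2.7')
```

Keeping the defaults in one constant means a future profile bump touches a single line instead of both branches.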
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100134493 --- Diff: dev/run-tests-jenkins.py --- @@ -165,12 +165,6 @@ def main(): if "test-maven" in ghprb_pull_title: os.environ["AMPLAB_JENKINS_BUILD_TOOL"] = "maven" # Switch the Hadoop profile based on the PR title: --- End diff -- BTW while you're at it, I spotted one more thing that we could remove although it probably won't matter at all. Look for `expect_equal(basenameSansExtFromUrl(y), "spark-2.1.0-bin-hadoop2.4-without-hive")` in `test_utils.R`
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100134388 --- Diff: dev/run-tests.py --- @@ -505,14 +505,14 @@ def main(): # if we're on the Amplab Jenkins build servers setup variables # to reflect the environment settings build_tool = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "sbt") -hadoop_version = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "hadoop2.3") --- End diff -- The default value should be 2.6.
[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...
Github user budde commented on the issue: https://github.com/apache/spark/pull/16797 > For better user experience, we should automatically infer the schema and write it back to metastore, if there is no case-sensitive table schema in metastore. This has the cost of detection the need of schema inference, and complicating table read code path. Totally agree. I think the default behavior should be to infer and backfill a case-sensitive schema into the table properties if one isn't already there. An option should also be provided to disable all inference and just fall back to the case-insensitive metastore schema if none is found (i.e. the current behavior in 2.1.0). > If this is only a compatibility issue, I think it's fair to ask the cluster maintainers to run some commands after upgrade Spark cluster. Even there are a lot of tables, it's easy to write a script to automate it. I don't think this is fair. For one, as I've mentioned, in some cases Spark may not be the tool being used to maintain the metastore. This will now require the warehouse admins to set up a Spark cluster and run these migration commands on every table with case-sensitive underlying data if they'd like them to be accessible from Spark. As a second point, while writing an automation script may be trivial, the execution costs aren't, especially if the data is stored in a format like JSON where each and every record in the table must be read in order to infer the schema. > If there is no Spark specific table properties, we assume this table is created by hive(not by external systems like Presto), so the schema of parquet files should be all lowercased. This isn't an assumption made by Spark prior to 2.1.0, whether this was an explicit decision or not. All I'm asking for is a way to configure Spark to continue supporting a use case it has supported for years. Also, in our case, the table was created by Spark, not Presto.
Presto is just an example of another execution engine we've put in front of our warehouse that hasn't had a problem with the underlying Parquet data being case-sensitive. We just used an older version of Spark to create the tables. I would think long and hard about whether requiring warehouse admins to run potentially costly migrations between Spark versions to update table metadata is preferable to offering a way to remain backwards-compatible with the old behavior. Again, I think introducing a mechanism to migrate the table properties is a good idea. I just don't think it should be the only option. > Another proposal is to make parquet reader case-insensitive, so that we can solve this problem without schema inference. But the problem is, Spark can be configured to be case-sensitive, so that it's possible to write such a schema (conflicting columns after lower-casing) into metastore. I think this proposal is the best if we can totally make Spark case-insensitive. I don't think this would be a bad option if this could be enabled at the Parquet level, but it seems as though their work towards enabling case-insensitive file access has stalled. As @ericl pointed out above, moving this to the ParquetReadSupport level may make the situation better for Parquet, but the behavior won't be consistent across file formats like ORC or JSON.
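The case-insensitive option discussed above boils down to mapping each lowercased metastore column onto the case-sensitive column name actually stored in the files, and it breaks down exactly when two file columns collide after lowercasing. A rough illustration in plain Python (not Spark's schema-inference or Parquet reader code; `reconcile_field_names` and `AmbiguousColumnError` are hypothetical names):

```python
from typing import Dict, List


class AmbiguousColumnError(ValueError):
    """Raised when several file columns differ only by case."""


def reconcile_field_names(metastore_cols: List[str], file_cols: List[str]) -> Dict[str, str]:
    """Map each metastore column to the case-sensitive file column with the same lowercase name.

    Columns absent from the file map to themselves, so a caller could read them as nulls.
    """
    by_lower: Dict[str, List[str]] = {}
    for name in file_cols:
        by_lower.setdefault(name.lower(), []).append(name)

    mapping: Dict[str, str] = {}
    for col in metastore_cols:
        candidates = by_lower.get(col.lower(), [])
        if len(candidates) > 1:
            # e.g. both "userId" and "userid" were written: a case-insensitive read cannot choose.
            raise AmbiguousColumnError(f"{col!r} matches {candidates}")
        mapping[col] = candidates[0] if candidates else col
    return mapping
```

The ambiguity branch is the scenario raised in the quoted proposal: a case-sensitive Spark session can write such conflicting columns, so a purely case-insensitive reader needs some policy for them.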
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100133896 --- Diff: dev/run-tests-jenkins.py --- @@ -165,12 +165,6 @@ def main(): if "test-maven" in ghprb_pull_title: os.environ["AMPLAB_JENKINS_BUILD_TOOL"] = "maven" # Switch the Hadoop profile based on the PR title: --- End diff -- Yep. This is not related to the SparkPullRequestBuilder failures; those are caused by the other file in this PR.
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100133412 --- Diff: dev/run-tests-jenkins.py --- @@ -165,12 +165,6 @@ def main(): if "test-maven" in ghprb_pull_title: os.environ["AMPLAB_JENKINS_BUILD_TOOL"] = "maven" # Switch the Hadoop profile based on the PR title: --- End diff -- Ah... right, I missed this. This part doesn't actually matter, but do the Jenkins jobs not all set a Hadoop version? Then yeah, it could default to hadoop2.3, which doesn't exist.
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16858 **[Test build #72595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/testReport)** for PR 16858 at commit [`b8625a2`](https://github.com/apache/spark/commit/b8625a2b9e95c9c818c8f781d1ed839d24e617eb).
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16858#discussion_r100131333 --- Diff: dev/run-tests-jenkins.py --- @@ -165,12 +165,6 @@ def main(): if "test-maven" in ghprb_pull_title: os.environ["AMPLAB_JENKINS_BUILD_TOOL"] = "maven" # Switch the Hadoop profile based on the PR title: --- End diff -- This removes hadoop 2.5 and earlier, too.
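The block under review switches the Hadoop profile from a tag in the PR title, in the same spirit as the `"test-maven" in ghprb_pull_title` check shown in the diff. A hedged sketch of that idea with only the profiles that still exist; the tag-to-profile table and `profile_from_title` are hypothetical, and the profile list is assumed from the "Valid options are ['hadoop2.6', 'hadoop2.7']" error reported elsewhere in this thread:

```python
from typing import Dict, Optional

# Hypothetical table of PR-title tags and the profile each selects.
# Older tags such as "test-hadoop2.3" are deliberately absent, so they no longer match.
TITLE_TAG_TO_PROFILE: Dict[str, str] = {
    "test-hadoop2.6": "hadoop2.6",
    "test-hadoop2.7": "hadoop2.7",
}


def profile_from_title(pull_title: str) -> Optional[str]:
    """Return the profile requested by a tag in the PR title, or None to keep the default."""
    for tag, profile in TITLE_TAG_TO_PROFILE.items():
        if tag in pull_title:
            return profile
    return None
```

Returning `None` rather than a profile name leaves the choice of default to a single place, so removing old Hadoop versions does not require touching this lookup.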
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16858 @srowen Could you review this?
[GitHub] spark pull request #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/16858 [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop2.6 ## What changes were proposed in this pull request? After SPARK-19464, SparkPullRequestBuilder fails because it still tries to use hadoop2.3. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/SparkPullRequestBuilder/72592/console ## How was this patch tested? Pass the existing test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark hotfix_run-tests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16858.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16858 commit b8625a2b9e95c9c818c8f781d1ed839d24e617eb Author: Dongjoon Hyun Date: 2017-02-08T18:07:22Z [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop2.6 instead of hadoop2.3
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16387 @samkum the code is in this PR! Just revert the last two commits.
[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16844 Merged build finished. Test FAILed.
[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16844 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72594/ Test FAILed.
[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16844 **[Test build #72594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72594/testReport)** for PR 16844 at commit [`d9aa208`](https://github.com/apache/spark/commit/d9aa2081c514577399ba77cfe2145a00ed477ef8). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...
Github user davies commented on the issue: https://github.com/apache/spark/pull/16844 @viirya Addressed your comment, also fixed another bug (updated PR description).
[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16844 **[Test build #72594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72594/testReport)** for PR 16844 at commit [`d9aa208`](https://github.com/apache/spark/commit/d9aa2081c514577399ba77cfe2145a00ed477ef8).
[GitHub] spark issue #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
Github user ooq commented on the issue: https://github.com/apache/spark/pull/15193 Thanks for the patch, @yaooqinn. As per the comments for `getValueRow`, because `getValueRow(id)` is always called after `getKeyRow(id)` with the same id, we use `getValueFromKey(id)` to retrieve the value row and use the id as a flag for cached/uncached. The patch is unnecessary IMO. Also, with the patch code, it seems the value row is not pointing to the correct position?
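The argument relies on a calling convention: the value for an id is only requested right after the key for that same id, so the key lookup can remember which id it last decoded and the value lookup can reuse that work. A language-neutral Python sketch of this "id as cache flag" idea; the class and method names are hypothetical stand-ins, not the `RowBasedKeyValueBatch` API:

```python
from typing import List, Optional, Tuple


class KeyValueBatchSketch:
    """Toy batch of (key, value) pairs that decodes each row once per key/value access pair."""

    def __init__(self, rows: List[Tuple[str, int]]) -> None:
        self._rows = rows
        self._cached_id: Optional[int] = None  # id whose row was decoded most recently
        self._cached_pair: Optional[Tuple[str, int]] = None
        self.decode_count = 0  # counts "expensive" decodes, for illustration only

    def _decode(self, row_id: int) -> Tuple[str, int]:
        self.decode_count += 1
        return self._rows[row_id]

    def get_key_row(self, row_id: int) -> str:
        # Decode the row and record which id the cached data belongs to.
        self._cached_pair = self._decode(row_id)
        self._cached_id = row_id
        return self._cached_pair[0]

    def get_value_row(self, row_id: int) -> int:
        # Fast path: the key for this id was just read, so reuse the decoded row.
        # Otherwise (outside the expected calling order) decode via the key lookup first.
        if self._cached_id != row_id or self._cached_pair is None:
            self.get_key_row(row_id)
        assert self._cached_pair is not None
        return self._cached_pair[1]
```

Calling `get_key_row(1)` and then `get_value_row(1)` decodes row 1 only once, which is why a separately reused value row adds nothing under this convention.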
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 Looks like Jenkins is failing to build any recent PR due to the following error: ```[error] Could not find hadoop2.3 in the list. Valid options are ['hadoop2.6', 'hadoop2.7']``` I would guess this is related to [this commit](https://github.com/apache/spark/commit/e8d3fca4502d5f5b8f38525b5fdabe80ccf9a8ec).
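That message is what a lookup against a table of supported profiles produces when a job still asks for a removed one. A minimal sketch of such a check, assuming a hypothetical `check_hadoop_profile` function and a profile table holding only the two valid options from the error (the `-P` flags are illustrative; this is not the actual `run-tests.py` code):

```python
from typing import Dict, List

# Hypothetical table: profile name -> extra build flags (flags shown for illustration only).
SUPPORTED_HADOOP_PROFILES: Dict[str, List[str]] = {
    "hadoop2.6": ["-Phadoop-2.6"],
    "hadoop2.7": ["-Phadoop-2.7"],
}


def check_hadoop_profile(profile: str) -> List[str]:
    """Return the build flags for `profile`, or exit with the error seen on Jenkins."""
    if profile not in SUPPORTED_HADOOP_PROFILES:
        # SystemExit with a string argument prints it and exits with status 1.
        raise SystemExit(
            f"[error] Could not find {profile} in the list. "
            f"Valid options are {sorted(SUPPORTED_HADOOP_PROFILES)}"
        )
    return SUPPORTED_HADOOP_PROFILES[profile]
```

With the stale `"hadoop2.3"` default, every job that did not set a profile explicitly would hit this branch, which matches the failures across unrelated PRs in this thread.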
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100127337 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +val destination = source match { + case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME) + case _ => extraOptions.get("path") --- End diff -- > Could we make it more general? For example, using a Map[String, String] Being the person who requested this class instead of an opaque map, I think using an opaque map makes for a really bad user API. The listener now needs to know about "magic keys" that have special meaning, which can vary depending on the destination. So you end up making up some contract that certain keys have some special meanings and all sources need to use them that way, so basically you end up encoding this class in a map. That being said, I'm not super happy with the way JDBC works, because there's still some information embedded in the map. I thought about it a little but didn't come up with a good solution; embedding the table name in the JDBC URI sounds hacky and brittle. The best one I've got is a separate field in this class (e.g. `serverUri`) that can be used to identify the server that is hosting the `destination` value (not needed for FS-based destinations since it's in the URI, but could be useful in other cases, maybe other table-based systems like Kudu or HBase).
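To make the API concern concrete: with an opaque map, every listener has to know which "magic keys" each source uses, whereas a small typed record states that contract once. A Python sketch of the record approach, including the optional server field floated in the comment. `WriteDestination`, `server_uri` and `destination_for` are hypothetical names; `"dbtable"` is assumed to be the key behind `JDBCOptions.JDBC_TABLE_NAME`, and using the JDBC `"url"` option as the server identifier is likewise an assumption:

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WriteDestination:
    """Typed description of where a write went, instead of a Map[String, String]."""

    source: str                       # e.g. "parquet" or "jdbc"
    destination: Optional[str]        # file path or table name, if the options had one
    server_uri: Optional[str] = None  # only meaningful for table-based systems


def destination_for(source: str, options: Mapping[str, str]) -> WriteDestination:
    # Mirrors the match in the diff: JDBC reports its table name, everything else "path".
    if source == "jdbc":
        return WriteDestination(source, options.get("dbtable"), options.get("url"))
    return WriteDestination(source, options.get("path"))
```

A listener then reads `destination` and `server_uri` directly and never needs per-source knowledge of option keys; only `destination_for` does.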
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16744 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72593/ Test FAILed.
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16744 Merged build finished. Test FAILed.
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #72593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72593/testReport)** for PR 16744 at commit [`67a4acb`](https://github.com/apache/spark/commit/67a4acb7b253ec558471d86bcbf3a1fa969d2229). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 Thank you for merging, @srowen. Thank you for the reviews, @vanzin, @liancheng, @jaceklaskowski, too! I'm keeping an eye on the build system after the mockito hotfix. I believe it will become more stable eventually.
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 Amending the PR again to fix a new dependency conflict in spark/pom.xml. Thanks again for taking the time to review this, @brkyvz and @srowen. Please let me know if you feel any additional changes are needed before this is ready to merge. Since this doesn't break any existing APIs, I think it would make some people happy if we could get this into the 2.1.1 release.
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #72593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72593/testReport)** for PR 16744 at commit [`67a4acb`](https://github.com/apache/spark/commit/67a4acb7b253ec558471d86bcbf3a1fa969d2229).
[GitHub] spark issue #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15193 Can one of the admins verify this patch?
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX][test-hadoop2.6] Add back mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16853 **[Test build #3564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3564/testReport)** for PR 16853 at commit [`c791fdb`](https://github.com/apache/spark/commit/c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16853: [SPARK-19464][BUILD][HOTFIX][test-hadoop2.6] Add ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16853
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX][test-hadoop2.6] Add back mo...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16853 Merged to master
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Merged build finished. Test FAILed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72592/ Test FAILed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16736 **[Test build #72592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72592/testReport)** for PR 16736 at commit [`314f6f8`](https://github.com/apache/spark/commit/314f6f8de6990b1c3bfddea503490a1797e25117). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16850 **[Test build #72586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72586/testReport)** for PR 16850 at commit [`5025cb7`](https://github.com/apache/spark/commit/5025cb7511a43e24cb3a181eb7b06c69b024479f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72586/ Test PASSed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16736 **[Test build #72592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72592/testReport)** for PR 16736 at commit [`314f6f8`](https://github.com/apache/spark/commit/314f6f8de6990b1c3bfddea503490a1797e25117).