[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-213670329 **[Test build #56775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56775/consoleFull)** for PR 12629 at commit [`e5dec86`](https://github.com/apache/spark/commit/e5dec86845cbf25eb606ceea7a81151c0ed638de). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12397#issuecomment-213670252 Can we retest this? I think there was a minor change since the test build.
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg commented on the pull request: https://github.com/apache/spark/pull/12501#issuecomment-213670198 closing this in favour of other implementation
[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...
Github user robbinspg closed the pull request at: https://github.com/apache/spark/pull/12501
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12319
[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12630
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213669972 Thanks - merging in master.
[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12630#issuecomment-213669781 Merging in master.
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213669684 **[Test build #56782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56782/consoleFull)** for PR 12568 at commit [`3690c7c`](https://github.com/apache/spark/commit/3690c7cc210dc9aedd168202dff17902f4c0c4e6).
[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12626
[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...
Github user pravingadakh commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-213668195 @dbtsai I'll update the PR soon; I have been overwhelmed by office work :(
[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12626#issuecomment-213668471 thanks, merging to master!
[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12626#issuecomment-213667937 LGTM
[GitHub] spark pull request: [SPARK-11102] [SQL] Uninformative exception wh...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/9490#issuecomment-213668158 ping @zjffdu
[GitHub] spark pull request: [SPARK-14857] [SQL] Table/Database Name Valida...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12618#issuecomment-213667639 **[Test build #56781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56781/consoleFull)** for PR 12618 at commit [`bfe536e`](https://github.com/apache/spark/commit/bfe536eaba938be18253aeb71eb79a56f69856ce).
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12619
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213667183 Merging this into master, thanks!
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213666884 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56774/ Test PASSed.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213666883 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213666846 **[Test build #56774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56774/consoleFull)** for PR 12619 at commit [`6056a47`](https://github.com/apache/spark/commit/6056a47cf807cbf70f7f26af7dcc07737dc232c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12630#issuecomment-213666752 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56771/ Test PASSed.
[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12630#issuecomment-213666751 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12630#issuecomment-213666711 **[Test build #56771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56771/consoleFull)** for PR 12630 at commit [`06ff604`](https://github.com/apache/spark/commit/06ff604d90fcd1fc6477dbb6533c4652ec9f12a8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14867][BUILD] Make `build/mvn` to use t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12631#issuecomment-21364 **[Test build #56780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56780/consoleFull)** for PR 12631 at commit [`56355fa`](https://github.com/apache/spark/commit/56355fab23ef59b79bbaafa71643ca742326).
[GitHub] spark pull request: [SPARK-14867][BUILD] Make `build/mvn` to use t...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/12631 [SPARK-14867][BUILD] Make `build/mvn` to use the downloaded maven if it exists.

## What changes were proposed in this pull request?

Currently, `build/mvn` provides a convenient option, `--force`, for using the recommended version of Maven without changing the `PATH` environment variable. However, there are two problems.

- `dev/lint-java` does not use the newly installed Maven.
```bash
$ ./build/mvn --force clean
$ ./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
```
- It is tedious to type the `--force` option every time. Once `--force` has been used, the installed Maven recommended by Spark should be preferred.

This PR makes `build/mvn` first check whether a Maven installed by the `--force` option exists.

## How was this patch tested?

Manual.

```bash
$ ./build/mvn --force clean
$ ./dev/lint-java
Using `mvn` from path: /Users/dongjoon/spark/build/apache-maven-3.3.9/bin/mvn
...
$ rm -rf ./build/apache-maven-3.3.9/
$ ./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-14867

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12631.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12631

commit 56355fab23ef59b79bbaafa71643ca742326
Author: Dongjoon Hyun
Date: 2016-04-14T08:55:46Z

[SPARK-14867][BUILD] Make `build/mvn` to use the downloaded maven if it exist.
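The lookup order described in PR 12631 (prefer the Maven that `build/mvn --force` downloaded, otherwise fall back to `mvn` on `PATH`) can be sketched roughly as below. This is a hypothetical illustration, not the code in the PR: the function name `select_mvn` and its arguments are invented here, and the install location `<build_dir>/apache-maven-<version>/bin/mvn` is assumed from the path printed in the PR's manual test output.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the actual build/mvn implementation.
# Prints the Maven binary to use: the copy downloaded by `build/mvn --force`
# if it exists and is executable, otherwise whatever `mvn` resolves to on PATH.
select_mvn() {
  local build_dir="$1"
  # Assumption: 3.3.9 is the version seen in the PR's test output.
  local mvn_version="${2:-3.3.9}"
  local local_mvn="${build_dir}/apache-maven-${mvn_version}/bin/mvn"

  if [ -x "$local_mvn" ]; then
    echo "$local_mvn"
  else
    # `command -v` prints the resolved path; fall back to the bare name
    # if no mvn is on PATH at all.
    command -v mvn || echo "mvn"
  fi
}
```

For example, `echo "Using \`mvn\` from path: $(select_mvn "$PWD/build")"` would report the downloaded copy after `./build/mvn --force clean`, and the system Maven again once `build/apache-maven-3.3.9/` is deleted, mirroring the two cases in the PR's test transcript. Checking for an executable file (`-x`) rather than just existence avoids selecting a partially extracted download.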
[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12590#issuecomment-213666548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56772/ Test PASSed.
[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12590#issuecomment-213666547 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12590#issuecomment-213666496 **[Test build #56772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56772/consoleFull)** for PR 12590 at commit [`bda4ae6`](https://github.com/apache/spark/commit/bda4ae62c812b256da4bb7f89f07623dd87ea439). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60823427 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -104,6 +105,25 @@ class GaussianMixtureModel private[ml] ( @Since("2.0.0") def gaussians: Array[MultivariateGaussian] = parentModel.gaussians + /** + * Helper method used in Python. + * Retrieve Gaussian distributions as a DataFrame. + * Each row represents a Gaussian Distribution. + * Two columns are defined: mean and cov. + * Schema: + * root --- End diff -- Surround schema with triple braces to make it appear like code: ``` {{{ root |-- ... }}} ```
[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60823428 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -104,6 +105,25 @@ class GaussianMixtureModel private[ml] ( @Since("2.0.0") def gaussians: Array[MultivariateGaussian] = parentModel.gaussians + /** + * Helper method used in Python. + * Retrieve Gaussian distributions as a DataFrame. + * Each row represents a Gaussian Distribution. + * Two columns are defined: mean and cov. + * Schema: + * root + * |-- mean: vector (nullable = true) + * |-- cov: matrix (nullable = true) + */ + def gaussiansDF: DataFrame = { --- End diff -- Since 2.0.0
[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60823426 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -104,6 +105,25 @@ class GaussianMixtureModel private[ml] ( @Since("2.0.0") def gaussians: Array[MultivariateGaussian] = parentModel.gaussians + /** + * Helper method used in Python. --- End diff -- Remove this 1 line. (This is an implementation detail and should not be exposed in user docs.)
[GitHub] spark pull request: [SPARK-6717][ML] Clear shuffle files after che...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11919#issuecomment-213664760 **[Test build #56779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56779/consoleFull)** for PR 11919 at commit [`dd50130`](https://github.com/apache/spark/commit/dd5013002611d3c232b8384eef89f13f9113eef4).
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213663864 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56767/ Test FAILed.
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213663862 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213663616 **[Test build #56767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56767/consoleFull)** for PR 12568 at commit [`4acfb8c`](https://github.com/apache/spark/commit/4acfb8c4eb24f3a6fdee67252d495c44fe44b2b9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Minor][ML][MLLIB] Remove unused imports
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12497#issuecomment-213663512 **[Test build #56778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56778/consoleFull)** for PR 12497 at commit [`ab42268`](https://github.com/apache/spark/commit/ab42268106dfde1b3de156f47f4cebfcca50129e).
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213663454 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Minor][ML][MLLIB] Remove unused imports
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/12497#issuecomment-213663449 @srowen I have reviewed all Scala files in GraphX and some in SQL, and removed some more unused imports in this PR.
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213663455 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56766/ Test FAILed.
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213663432 **[Test build #56766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56766/consoleFull)** for PR 12568 at commit [`04cc43b`](https://github.com/apache/spark/commit/04cc43b29cbf7e4b71046b848e446143b0b212a1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7861][ML] PySpark OneVsRest
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12124#issuecomment-213663303 I'm working on a simpler fix for now: [https://issues.apache.org/jira/browse/SPARK-14862]
[GitHub] spark pull request: [Minor][ML][MLLIB] Remove unused imports
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12497#issuecomment-213663339 **[Test build #56777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56777/consoleFull)** for PR 12497 at commit [`90e57c8`](https://github.com/apache/spark/commit/90e57c8bc98abc36e0c3a26f348da64b358acd3d).
[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-213663172 **[Test build #56776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56776/consoleFull)** for PR 12239 at commit [`82add06`](https://github.com/apache/spark/commit/82add06177c6b730459aea5eb7e277a0615147fd).
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-213663159 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-213663160 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56768/ Test PASSed.
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-213663125 **[Test build #56768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56768/consoleFull)** for PR 12268 at commit [`92f8f38`](https://github.com/apache/spark/commit/92f8f387cec10cb61e178b312748f86bd75b1b55). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-213663088 retest this please
[GitHub] spark pull request: [SPARK-14838][SQL] Implement statistics in Ser...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12599#discussion_r60823141 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -83,6 +83,28 @@ case class SerializeFromObject( child: LogicalPlan) extends UnaryNode with ObjectConsumer { override def output: Seq[Attribute] = serializer.map(_.toAttribute) + + // We can't estimate the size of ObjectType. We implement statistics here to avoid + // directly estimate any child plan which produces domain objects as output. + override def statistics: Statistics = { +if (child.output.head.dataType.isInstanceOf[ObjectType]) { + val underlyingPlan = child.find { p => --- End diff -- +1 for the 4k default size
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user JustinPihony commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213662908 @HyukjinKwon I just posted on the JIRA the background of `Properties` and how reasonable it is to assume it can be converted to a `String`.
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213662852 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213662854 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56770/ Test PASSed.
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213662817 **[Test build #56770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56770/consoleFull)** for PR 12319 at commit [`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213662708 **[Test build #2859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2859/consoleFull)** for PR 12319 at commit [`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-213662696 **[Test build #56775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56775/consoleFull)** for PR 12629 at commit [`e5dec86`](https://github.com/apache/spark/commit/e5dec86845cbf25eb606ceea7a81151c0ed638de).
[GitHub] spark pull request: [SPARK-6717][ML] Clear shuffle files after che...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11919#discussion_r60822985 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -1306,4 +1306,33 @@ object ALS extends DefaultParamsReadable[ALS] with Logging { * satisfies this requirement, we simply use a type alias here. */ private[recommendation] type ALSPartitioner = org.apache.spark.HashPartitioner + + /** + * Private function to checkpoint the RDD and clean up its all of its parents' shuffles eagerly. + */ + private[spark] def checkpointAndCleanParents[T](rdd: RDD[T], blocking: Boolean = false): Unit = { +val sc = rdd.sparkContext +// If there is no reference tracking we skip clean up. +if (sc.cleaner.isEmpty) { + return rdd.checkpoint() --- End diff -- Ah, that's a good catch (this used to not be an issue since I left the materialization in the initial PR for both). Anyway, I'll refactor this to break up the cleanup and explicitly capture the deps.
[GitHub] spark pull request: [SPARK-8398][CORE] Hadoop input/output format ...
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/6848#issuecomment-213661475 @holdenk ok i tried to make it look all pretty
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213661444 **[Test build #56774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56774/consoleFull)** for PR 12619 at commit [`6056a47`](https://github.com/apache/spark/commit/6056a47cf807cbf70f7f26af7dcc07737dc232c1).
[GitHub] spark pull request: [SPARK-14654][CORE][WIP] New accumulator API
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12612#discussion_r60822740 --- Diff: core/src/main/scala/org/apache/spark/NewAccumulator.scala --- @@ -0,0 +1,299 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark + +import java.{lang => jl} +import java.io.{ObjectInputStream, ObjectOutputStream} +import java.util.concurrent.atomic.AtomicLong +import javax.annotation.concurrent.GuardedBy + +import org.apache.spark.scheduler.AccumulableInfo +import org.apache.spark.util.Utils + + +private[spark] case class AccumulatorMetadata( +id: Long, +name: Option[String], +countFailedValues: Boolean) extends Serializable + + +abstract class NewAccumulator[IN, OUT] extends Serializable { + private[spark] var metadata: AccumulatorMetadata = _ + + private[spark] def register( + sc: SparkContext, + id: Long = AccumulatorContext.newId(), + name: Option[String] = None, + countFailedValues: Boolean = false): Unit = { +if (this.metadata != null) { + throw new IllegalStateException("Cannot register an Accumulator twice.") +} +this.metadata = AccumulatorMetadata(id, name, countFailedValues) +AccumulatorContext.register(this) +sc.cleaner.foreach(_.registerAccumulatorForCleanup(this)) + } + + private[spark] def assertRegistered(): Unit = { +if (metadata == null) { + throw new IllegalStateException("Accumulator is not registered yet") +} + } + + def id: Long = { +assertRegistered() +metadata.id + } + + def initialize(): Unit = {} + + def add(v: IN): Unit + + def +=(v: IN): Unit = add(v) + + def merge(other: NewAccumulator[IN, OUT]): Unit + + def ++=(other: NewAccumulator[IN, OUT]): Unit = merge(other) + + def value: OUT + + private[spark] def toInfo(update: Option[Any], value: Option[Any]): AccumulableInfo = { +assertRegistered() +val isInternal = metadata.name.exists(_.startsWith(InternalAccumulator.METRICS_PREFIX)) +new AccumulableInfo( + metadata.id, metadata.name, update, value, isInternal, metadata.countFailedValues) + } + + // Called by Java when serializing an object + private def writeObject(out: ObjectOutputStream): Unit = Utils.tryOrIOException { +assertRegistered() +out.defaultWriteObject() + } + + // Called by Java when deserializing an object + private 
def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException { +in.defaultReadObject() +initialize() + +// Automatically register the accumulator when it is deserialized with the task closure. +// This is for external accumulators and internal ones that do not represent task level +// metrics, e.g. internal SQL metrics, which are per-operator. +val taskContext = TaskContext.get() +if (taskContext != null) { + taskContext.registerAccumulator(this) +} + } +} + +object AccumulatorContext { + + /** + * This global map holds the original accumulator objects that are created on the driver. + * It keeps weak references to these objects so that accumulators can be garbage-collected + * once the RDDs and user-code that reference them are cleaned up. + * TODO: Don't use a global map; these should be tied to a SparkContext (SPARK-13051). + */ + @GuardedBy("AccumulatorContext") + private val originals = new java.util.HashMap[Long, jl.ref.WeakReference[NewAccumulator[_, _]]] + + private[this] val nextId = new AtomicLong(0L) + + /** + * Return a globally unique ID for a new [[NewAccumulator]]. + * Note: Once you copy the [[NewAccumulator]] the ID is no longer unique. + */ + def newId(): Long = nextId.getAndIncrement + + /** + * Register an [[NewAccumulator]]
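The core contract in the `NewAccumulator` diff above is small: `add` folds one input into local state, `merge` folds another accumulator (for example a task's copy) into this one, and `value` reads the result. The following is a minimal, Spark-free sketch of that contract, not Spark's API: `SketchAccumulator`, `LongSumSketch`, and `AccumulatorSketchDemo` are hypothetical names, and registration, metadata, and serialization hooks are deliberately omitted.

```scala
// Hypothetical, Spark-free sketch of the add/merge/value contract from the
// NewAccumulator diff. Registration, metadata and serialization are omitted.
abstract class SketchAccumulator[IN, OUT] {
  // Fold one input value into this accumulator's local state.
  def add(v: IN): Unit
  def +=(v: IN): Unit = add(v)

  // Fold another accumulator of the same kind (e.g. a task's copy) into this one.
  def merge(other: SketchAccumulator[IN, OUT]): Unit
  def ++=(other: SketchAccumulator[IN, OUT]): Unit = merge(other)

  // Current accumulated result.
  def value: OUT
}

// Hypothetical example implementation: sums Long inputs.
class LongSumSketch extends SketchAccumulator[Long, Long] {
  private var sum = 0L

  override def add(v: Long): Unit = sum += v

  // Only merges with the same concrete type; anything else is rejected.
  override def merge(other: SketchAccumulator[Long, Long]): Unit = other match {
    case o: LongSumSketch => sum += o.sum
    case _ =>
      throw new UnsupportedOperationException(
        s"Cannot merge ${getClass.getName} with ${other.getClass.getName}")
  }

  override def value: Long = sum
}

object AccumulatorSketchDemo {
  def main(args: Array[String]): Unit = {
    // A driver-side accumulator plus two per-task copies.
    val driver = new LongSumSketch
    val task1 = new LongSumSketch
    val task2 = new LongSumSketch
    task1 += 1L
    task1 += 2L
    task2.add(3L)
    // Merge the task results back into the driver-side value.
    driver ++= task1
    driver ++= task2
    println(driver.value) // prints 6
  }
}
```

Keeping `merge` separate from `add` is what lets each task accumulate locally and have only the final per-task state combined, which is the role the driver-side merge plays in the diff.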
[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-213661096 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-213661097 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56765/ Test FAILed.
[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-213661066 **[Test build #56765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56765/consoleFull)** for PR 12629 at commit [`ccd3c7b`](https://github.com/apache/spark/commit/ccd3c7b43c0247e345b714210f7421d7dc484718). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213660941 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56773/ Test FAILed.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213660938 **[Test build #56773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56773/consoleFull)** for PR 12619 at commit [`871c009`](https://github.com/apache/spark/commit/871c00960dce2b1bf598e781dbd1ba3f18dddf3f). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213660940 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/12619#discussion_r60822682 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -308,8 +304,11 @@ private[sql] class DefaultSource // TODO: if you move this into the closure it reverts to the default values. // If true, enable using the custom RecordReader for parquet. This only works for // a subset of the types (no complex types). -val enableVectorizedParquetReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled && - dataSchema.forall(_.dataType.isInstanceOf[AtomicType]) +val resultSchema = StructType(partitionSchema.fields ++ requiredSchema.fields) +val enableVectorizedReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled && --- End diff -- Nothing too important. The comment `// If true, enable using the custom RecordReader for parquet...` could be above this line.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213660896 **[Test build #56773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56773/consoleFull)** for PR 12619 at commit [`871c009`](https://github.com/apache/spark/commit/871c00960dce2b1bf598e781dbd1ba3f18dddf3f).
[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12590#issuecomment-213660696 **[Test build #56772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56772/consoleFull)** for PR 12590 at commit [`bda4ae6`](https://github.com/apache/spark/commit/bda4ae62c812b256da4bb7f89f07623dd87ea439).
[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12630#issuecomment-213660698 **[Test build #56771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56771/consoleFull)** for PR 12630 at commit [`06ff604`](https://github.com/apache/spark/commit/06ff604d90fcd1fc6477dbb6533c4652ec9f12a8).
[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12590#issuecomment-213660622 Rebased.
[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/12630 [SPARK-14866][SQL] Break SQLQuerySuite out into smaller test suites ## What changes were proposed in this pull request? This patch breaks SQLQuerySuite out into smaller test suites. It was a little bit too large for debugging. ## How was this patch tested? This is a test-only change. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-14866 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12630.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12630 commit 06ff604d90fcd1fc6477dbb6533c4652ec9f12a8 Author: Reynold Xin Date: 2016-04-23T03:42:16Z [SPARK-14866][SQL] Break SQLQuerySuite out into smaller test suites
[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12615#issuecomment-213660185 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12615#issuecomment-213660186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56764/ Test PASSed.
[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12615#issuecomment-213660152 **[Test build #56764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56764/consoleFull)** for PR 12615 at commit [`957c3c1`](https://github.com/apache/spark/commit/957c3c130aeeb31445027168add0f6a99acd3fe8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12615
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213660054 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56769/ Test FAILed.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213660053 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213660046 **[Test build #56769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56769/consoleFull)** for PR 12619 at commit [`04a900b`](https://github.com/apache/spark/commit/04a900b10a04d3930f0dbb7ad3d570552de49075). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9314] [EC2] add root EBS config options...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/7647#issuecomment-213659877 ping @kmaehashi
[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12615#issuecomment-213659905 Merging in master.
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213659831 **[Test build #2859 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2859/consoleFull)** for PR 12319 at commit [`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52).
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213659743 **[Test build #56770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56770/consoleFull)** for PR 12319 at commit [`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52).
[GitHub] spark pull request: [SPARK-8603] [sparkR] In windows, Incorrect fi...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/7025#issuecomment-213659754 ping @prakashpc
[GitHub] spark pull request: [SPARK-8603] [sparkR] In windows, Incorrect fi...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/7025#issuecomment-213659474 @JoshRosen I can submit a PR based on this if you think this PR is abandoned.
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213659267 LGTM pending Jenkins.
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213658964 add to whitelist
[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/12319#issuecomment-213658842 test this please
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12619#issuecomment-213658753 **[Test build #56769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56769/consoleFull)** for PR 12619 at commit [`04a900b`](https://github.com/apache/spark/commit/04a900b10a04d3930f0dbb7ad3d570552de49075).
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-213658568 **[Test build #56768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56768/consoleFull)** for PR 12268 at commit [`92f8f38`](https://github.com/apache/spark/commit/92f8f387cec10cb61e178b312748f86bd75b1b55).
[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12619#discussion_r60822250

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
@@ -308,8 +304,11 @@ private[sql] class DefaultSource
     // TODO: if you move this into the closure it reverts to the default values.
     // If true, enable using the custom RecordReader for parquet. This only works for
     // a subset of the types (no complex types).
-    val enableVectorizedParquetReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled &&
-      dataSchema.forall(_.dataType.isInstanceOf[AtomicType])
+    val resultSchema = StructType(partitionSchema.fields ++ requiredSchema.fields)
+    val enableVectorizedReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled &&
--- End diff --

move to where?
[GitHub] spark pull request: [SPARK-14828][SQL] Start SparkSession in REPL ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12589#discussion_r60822234

--- Diff: repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -1026,21 +1025,7 @@ class SparkILoop(
   }
 
   @DeveloperApi
-  def createSQLContext(): SQLContext = {
-    val name = "org.apache.spark.sql.hive.HiveContext"
-    val loader = Utils.getContextOrSparkClassLoader
-    try {
-      sqlContext = loader.loadClass(name).getConstructor(classOf[SparkContext])
-        .newInstance(sparkContext).asInstanceOf[SQLContext]
-      logInfo("Created sql context (with Hive support)..")
-    }
-    catch {
-      case _: java.lang.ClassNotFoundException | _: java.lang.NoClassDefFoundError =>
-        sqlContext = new SQLContext(sparkContext)
-        logInfo("Created sql context..")
-    }
-    sqlContext
-  }
+  def createSparkSession(): SparkSession = Main.createSparkSession()
--- End diff --

Not very sure. How about we still duplicate the code for now?
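The removed `createSQLContext()` follows a common pattern: try to load an optional class by name and fall back to a simpler implementation when that class is not on the classpath. A minimal Java sketch of the pattern, assuming the optional class has a public no-arg constructor (the Spark code instead passed a `SparkContext`); `OptionalClassLoader` and `instantiateOrFallback` are hypothetical names, not Spark APIs:

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: "load the optional class if present, otherwise fall back".
 * Nothing Spark-specific is loaded here; class names are supplied by the caller.
 */
final class OptionalClassLoader {

    private OptionalClassLoader() {}

    /**
     * Instantiates className through its public no-arg constructor when the class
     * can be loaded; otherwise returns the value produced by fallback.
     */
    static Object instantiateOrFallback(String className, Supplier<Object> fallback) {
        // Prefer the thread context class loader, as the Scala code did via
        // Utils.getContextOrSparkClassLoader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = OptionalClassLoader.class.getClassLoader();
        }
        try {
            Class<?> cls = loader.loadClass(className);
            return cls.getConstructor().newInstance();
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // The optional class (or something it depends on) is missing: fall back.
            return fallback.get();
        } catch (ReflectiveOperationException e) {
            // The class exists but could not be constructed: report the real problem.
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }
}
```

Catching `NoClassDefFoundError` alongside `ClassNotFoundException` mirrors the original Scala code: the error can surface when the requested class is found but one of the classes it depends on is not.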
[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-213658002 @rxin Could you please review this?
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213657538 **[Test build #56767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56767/consoleFull)** for PR 12568 at commit [`4acfb8c`](https://github.com/apache/spark/commit/4acfb8c4eb24f3a6fdee67252d495c44fe44b2b9).
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213656755 **[Test build #56766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56766/consoleFull)** for PR 12568 at commit [`04cc43b`](https://github.com/apache/spark/commit/04cc43b29cbf7e4b71046b848e446143b0b212a1).
[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-213656756 **[Test build #56765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56765/consoleFull)** for PR 12629 at commit [`ccd3c7b`](https://github.com/apache/spark/commit/ccd3c7b43c0247e345b714210f7421d7dc484718).
[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-213656738 cc @davies @viirya
[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12629

[SPARK-14800][SQL] Dealing with null as a value in options for each internal data source

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-14800

This PR adds support for `null` as an option value (treated as the default value) for all the internal data sources in Spark.

This PR introduces two classes:

- `PrameterUtils`: contains the functions used in `CSVOptions` to check for `null`, so that other data sources can reuse them.
- `OrcOptions`: separated out just like `ParquetOptions` (the two are almost identical).

## How was this patch tested?

Unit tests in `CSVSuite`, `JsonSuite`, `OrcHadoopFsRelationSuite`, `ParquetHadoopFsRelationSuite` and `LibSVMRelation`. Also, `sbt scalastyle`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-14800

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12629.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12629

commit 8fb3a23ef61353749c35f523dbfc7d8f5d739fbf
Author: hyukjinkwon
Date: 2016-04-23T02:09:36Z

    CSV and JSON are now safe with null options

commit ccd3c7b43c0247e345b714210f7421d7dc484718
Author: hyukjinkwon
Date: 2016-04-23T02:55:02Z

    text, ORC, Parquet and libsvm are also okay
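The PR description does not show the new helpers, so the following is only a minimal Java sketch of the general idea, assuming that an option explicitly set to `null` should behave exactly like an absent key and fall back to the default. `OptionHelpers`, `getString` and `getBoolean` are hypothetical names; they are not the `PrameterUtils` API from the patch.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical null-tolerant option lookups for a data source.
 * A key mapped to null is treated the same as a missing key.
 */
final class OptionHelpers {

    private OptionHelpers() {}

    /** Returns the option value, or defaultValue when the key is absent or mapped to null. */
    static String getString(Map<String, String> options, String key, String defaultValue) {
        String value = options.get(key);
        return value == null ? defaultValue : value;
    }

    /**
     * Parses "true"/"false" case-insensitively; an absent or null value yields
     * defaultValue, and anything else is rejected.
     */
    static boolean getBoolean(Map<String, String> options, String key, boolean defaultValue) {
        String value = options.get(key);
        if (value == null) {
            return defaultValue;
        }
        switch (value.toLowerCase(Locale.ROOT)) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                throw new IllegalArgumentException(key + " should be a boolean, but was: " + value);
        }
    }
}
```

In this sketch a caller that passes `null` gets the default immediately instead of failing later when the value is dereferenced (for example by `toLowerCase`).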
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/12568#discussion_r60822093

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -182,7 +182,7 @@ private[spark] class BlockManager(
       val shuffleConfig = new ExecutorShuffleInfo(
         diskBlockManager.localDirs.map(_.toString),
         diskBlockManager.subDirsPerLocalDir,
-        shuffleManager.shortName)
--- End diff --

Yes, I agree with @markgrover.
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/12568#issuecomment-213656641 @vanzin I have addressed your comments. Thanks.
[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/12568#discussion_r60821688

--- Diff: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleIntegrationSuite.java ---
@@ -184,12 +184,9 @@ public void testFetchThreeSort() throws Exception {
     exec0Fetch.releaseBuffers();
   }
 
-  @Test
-  public void testFetchInvalidShuffle() throws Exception {
+  @Test (expected = RuntimeException.class)
--- End diff --

It will throw a generic RuntimeException.
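In JUnit 4, `@Test(expected = RuntimeException.class)` makes a test pass only if the method throws an instance of `RuntimeException` (subclasses included); the test fails if the method completes normally or throws an unrelated type. A minimal plain-Java sketch of that check, using the hypothetical names `ExpectedExceptionCheck` and `expectThrows` (this is not JUnit's own implementation):

```java
/**
 * Hypothetical plain-Java version of the check behind @Test(expected = ...).
 */
final class ExpectedExceptionCheck {

    private ExpectedExceptionCheck() {}

    /** A test body that may throw anything. */
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    /**
     * Runs body and returns the thrown exception if it is an instance of expected
     * (subclasses count); otherwise fails with an AssertionError.
     */
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
        }
        // Reaching here means the body completed normally, which is a failure.
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }
}
```

Because subclasses match, a "generic" `RuntimeException` expectation also accepts more specific runtime exceptions such as `IllegalStateException`.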
[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12626#issuecomment-213653604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56761/ Test PASSed.