[GitHub] spark issue #15413: [SPARK-17847] [ML] Copy GaussianMixture implementation f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15413 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66625/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15413: [SPARK-17847] [ML] Copy GaussianMixture implementation f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15413 Merged build finished. Test FAILed.
[GitHub] spark issue #15413: [SPARK-17847] [ML] Copy GaussianMixture implementation f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15413 **[Test build #66625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66625/consoleFull)** for PR 15413 at commit [`5a8de4a`](https://github.com/apache/spark/commit/5a8de4a7289700d20e240dcf82b61552c213dcf8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15414 **[Test build #66629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66629/consoleFull)** for PR 15414 at commit [`5cb06fc`](https://github.com/apache/spark/commit/5cb06fcd7987d1889b42a47f38bff89a47161123).
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/15414 [SPARK-17848][ML] Move LabelCol datatype cast into Predictor.fit ## What changes were proposed in this pull request? 1. Move the cast into `Predictor`. 2. Remove the now-unnecessary casts. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark move_cast Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15414.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15414 commit 5cb06fcd7987d1889b42a47f38bff89a47161123 Author: Zheng RuiFeng Date: 2016-10-10T05:44:47Z create pr
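The design proposed here is a template-method pattern: the shared `fit` performs the label-column cast once, and concrete learners train on already-cast data. A plain-Python sketch (illustrative only, not the actual Spark code; class and method names are hypothetical) of that shape:

```python
# Sketch of the pattern: the base class casts labels to double (float)
# exactly once in fit(), so subclasses no longer repeat the cast.

class Predictor:
    def fit(self, dataset):
        # dataset: iterable of (label, features) pairs. Cast every label
        # to float here, mirroring the cast to DoubleType in Predictor.fit.
        cast = [(float(label), features) for label, features in dataset]
        return self._fit(cast)

    def _fit(self, dataset):
        raise NotImplementedError


class MeanLabelPredictor(Predictor):
    """Toy learner: 'trains' by averaging the (already-cast) labels."""
    def _fit(self, dataset):
        labels = [label for label, _ in dataset]
        return sum(labels) / len(labels)


model = MeanLabelPredictor().fit([(1, [0.0]), (2, [1.0]), (3, [2.0])])
print(model)  # 2.0 -- integer labels were cast once, in the base class
```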
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15292 **[Test build #66628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66628/consoleFull)** for PR 15292 at commit [`350f55d`](https://github.com/apache/spark/commit/350f55da303c6ccc876a4f6d5a1e455dd3337343).
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15292 retest this please
[GitHub] spark pull request #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysi...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/15316#discussion_r82545437 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala --- @@ -43,6 +43,11 @@ class AnalysisException protected[sql] ( } override def getMessage: String = { +val planAnnotation = plan.map(p => s";\n$p").getOrElse("") --- End diff -- Why do we need a separate method here?
[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15371 **[Test build #66627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66627/consoleFull)** for PR 15371 at commit [`da2311a`](https://github.com/apache/spark/commit/da2311a5a0cee356169bed1a940bcb5bb0c87b26).
[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...
Github user seyfe commented on the issue: https://github.com/apache/spark/pull/15371 @zsxwing , I think that is a good idea. I searched, and that is the only place we use `BlockStatusesAccumulator`. Let me remove it.
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14136 We definitely need this as a native implementation. One thing we should think about is memory management. collect_list, collect_set, and percentile are examples of functions that are very memory heavy and can easily OOM.
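The memory concern raised above follows directly from what an exact percentile requires: every value in the group must be buffered before the answer can be computed. A minimal Python sketch (illustrative, not the PR's Scala implementation; the interpolation shown is one common, Hive-style definition):

```python
# Exact percentile: all rows of the group must be held in memory at once,
# which is why such aggregates can OOM on large groups.

def exact_percentile(values, p):
    """Exact p-th percentile (0 <= p <= 1) with linear interpolation."""
    assert values and 0.0 <= p <= 1.0
    data = sorted(values)        # buffers the entire group in memory
    rank = p * (len(data) - 1)   # fractional position in sorted order
    lo = int(rank)
    frac = rank - lo
    if lo + 1 < len(data):
        # Interpolate between the two neighboring order statistics.
        return data[lo] + frac * (data[lo + 1] - data[lo])
    return data[lo]


print(exact_percentile([1, 2, 3, 4], 0.5))  # 2.5
```

Approximate alternatives (quantile sketches) trade exactness for bounded memory, which is why exact percentile merits special memory-management attention.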
[GitHub] spark pull request #14788: [SPARK-17174][SQL] Add the support for TimestampT...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14788#discussion_r82544981 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2548,16 +2548,20 @@ object functions { def to_date(e: Column): Column = withExpr { ToDate(e.expr) } /** - * Returns date truncated to the unit specified by the format. + * Returns timestamp truncated to the unit specified by the format. --- End diff -- doesn't this actually change the data type returned?
[GitHub] spark pull request #14788: [SPARK-17174][SQL] Add the support for TimestampT...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14788#discussion_r82544965 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2374,14 +2374,14 @@ object functions { * @group datetime_funcs * @since 1.5.0 */ - def date_add(start: Column, days: Int): Column = withExpr { DateAdd(start.expr, Literal(days)) } + def date_add(start: Column, days: Int): Column = withExpr { AddDays(start.expr, Literal(days)) } --- End diff -- why change the name of these expressions?
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14788 Actually can we avoid renaming these expressions? I don't see the point to rename DateSub to SubDays. It just makes it more annoying to link the user facing API with the internal expressions.
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15292 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66622/ Test FAILed.
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15292 Merged build finished. Test FAILed.
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15292 **[Test build #66622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66622/consoleFull)** for PR 15292 at commit [`350f55d`](https://github.com/apache/spark/commit/350f55da303c6ccc876a4f6d5a1e455dd3337343). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14897: [SPARK-17338][SQL] add global temp view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14897 Merged build finished. Test PASSed.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #66626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66626/consoleFull)** for PR 14788 at commit [`8c50b2c`](https://github.com/apache/spark/commit/8c50b2cecc8c69bac19206969bf33133779c6337).
[GitHub] spark issue #14897: [SPARK-17338][SQL] add global temp view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14897 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66620/ Test PASSed.
[GitHub] spark issue #14897: [SPARK-17338][SQL] add global temp view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14897 **[Test build #66620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66620/consoleFull)** for PR 14897 at commit [`29e292a`](https://github.com/apache/spark/commit/29e292a954f1b07d80d03d0fd6c4ad4605b41ab7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15371 @seyfe I think we can remove `BlockStatusesAccumulator` and just use `private val _updatedBlockStatuses = new CollectionAccumulator[(BlockId, BlockStatus)]` instead. `BlockStatusesAccumulator` doesn't provide more functions than `CollectionAccumulator`. Sorry that I didn't find that earlier.
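The underlying fix in this PR (visible in the diff below) is to make `copy()` take the same lock that guards additions. A plain-Python analogue (illustrative only; Spark's real class is Scala, and there the unsynchronized race surfaces as a `ConcurrentModificationException`):

```python
import threading

# Toy analogue of CollectionAccumulator: copy() must synchronize on the
# same lock as add(), otherwise a copy racing with concurrent adds can
# observe the backing list mid-mutation.

class CollectionAccumulator:
    def __init__(self):
        self._list = []
        self._lock = threading.Lock()

    def add(self, value):
        with self._lock:
            self._list.append(value)

    def copy(self):
        new_acc = CollectionAccumulator()
        with self._lock:  # the synchronized block added by the patch
            new_acc._list.extend(self._list)
        return new_acc


acc = CollectionAccumulator()
for i in range(3):
    acc.add(i)
snapshot = acc.copy()
print(snapshot._list)  # [0, 1, 2]
```

Because copying takes the lock, a heartbeat-thread copy and a task-thread add can interleave in any order without corrupting either list.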
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14788 LGTM - I'll merge as soon as tests complete successfully
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14788 retest this please
[GitHub] spark pull request #15371: [SPARK-17816] [Core] Fix ConcurrentModificationEx...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15371#discussion_r82544528 --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala --- @@ -444,7 +444,9 @@ class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]] { override def copy(): CollectionAccumulator[T] = { val newAcc = new CollectionAccumulator[T] -newAcc._list.addAll(_list) +_list.synchronized { --- End diff -- Good catch
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 @sethah OK, I will open a new JIRA about labelCol.
[GitHub] spark issue #15413: [SPARK-17847] [ML] Copy GaussianMixture implementation f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15413 **[Test build #66625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66625/consoleFull)** for PR 15413 at commit [`5a8de4a`](https://github.com/apache/spark/commit/5a8de4a7289700d20e240dcf82b61552c213dcf8).
[GitHub] spark pull request #15413: [SPARK-17847] [ML] Copy GaussianMixture implement...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15413 [SPARK-17847] [ML] Copy GaussianMixture implementation from mllib to ml ## What changes were proposed in this pull request? Copy the ```GaussianMixture``` implementation from mllib to ml, so that we can add new features to it. I left the mllib ```GaussianMixture``` untouched, unlike some other algorithms which wrap the ml implementation, for the following reasons: * mllib ```GaussianMixture``` allows k == 1, but ml does not. * mllib ```GaussianMixture``` supports setting an initial model, but ml does not currently. (We will definitely add this feature for ml in the future.) Meanwhile, we made some improvements to handle sparse data more efficiently. I use ```ml.linalg``` as the underlying data structure rather than the old breeze dense vector. Todo: - [ ] Performance test. ## How was this patch tested? Existing tests and added new tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-17847 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15413.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15413 commit 5a8de4a7289700d20e240dcf82b61552c213dcf8 Author: Yanbo Liang Date: 2016-10-10T05:00:53Z Copy GaussianMixture implementation from mllib to ml
[GitHub] spark pull request #15346: [SPARK-17741][SQL] Grammar to parse top level and...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15346
[GitHub] spark issue #15346: [SPARK-17741][SQL] Grammar to parse top level and nested...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15346 LGTM - merging to master. Thanks!
[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15389#discussion_r82544055 --- Diff: python/pyspark/rdd.py --- @@ -2029,7 +2028,11 @@ def coalesce(self, numPartitions, shuffle=False): >>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect() [[1, 2, 3, 4, 5]] """ -jrdd = self._jrdd.coalesce(numPartitions, shuffle) +if shuffle: +data_java_rdd = self._to_java_object_rdd().coalesce(numPartitions, shuffle) --- End diff -- would be great to add some inline comment explaining why this is necessary. otherwise somebody could come in 6 months from now and change this back to `jrdd = self._jrdd.coalesce(numPartitions, shuffle)`
[GitHub] spark pull request #15403: [SPARK-17832][SQL] TableIdentifier.quotedString c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15403
[GitHub] spark issue #15403: [SPARK-17832][SQL] TableIdentifier.quotedString creates ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15403 LGTM - merging to master/2.0. Thanks!
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12775 **[Test build #66624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66624/consoleFull)** for PR 12775 at commit [`699730b`](https://github.com/apache/spark/commit/699730b592e8d913e728e0097e140c710c201dce).
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Thanks for the review. Updated the patch to address the comments.
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14426 Merged build finished. Test PASSed.
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66619/ Test PASSed.
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14426 **[Test build #66619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66619/consoleFull)** for PR 14426 at commit [`57adfd3`](https://github.com/apache/spark/commit/57adfd33b84bee03c9f0a302d9981f226437c2e3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode `
[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15397 > How is this going to work with assign? It seems like it's just avoiding the problem, not fixing it. We can seek to the offsets provided by the user.
[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15387 > During the original implementation I had verified that calling pause kills the internal message buffer, which is one of the complications leading to a cached consumer per partition. I observed the same behavior during my debugging. I found that the first `poll(0)` always sends a request to prefetch the data. Pausing partitions just prevents the second `poll(0)` from returning anything here: https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L527 > You dont want poll consuming messages, its not just about offset correctness, the driver shouldnt be spending time or bandwidth doing that. I think you have agreed that this is impossible via the current KafkaConsumer APIs as well. However, what is still unknown to me is whether the first `poll(0)` can return something. I saw that the first `poll(0)` always sends the fetch request, but I'm not sure whether the response could be processed within that same first `poll(0)`. If that could happen, pausing partitions would not help, since it's called after the first `poll(0)`. In addition, since the behavior is unspecified in the javadoc, it could change in the future. That's why I decided to manually seek to the beginning in #15397.
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15376 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66618/ Test PASSed.
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15376 Merged build finished. Test PASSed.
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15376 **[Test build #66618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66618/consoleFull)** for PR 15376 at commit [`f328f3a`](https://github.com/apache/spark/commit/f328f3a2c0936555226a7c381625d3d5b8127302). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15412 **[Test build #66623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66623/consoleFull)** for PR 15412 at commit [`e141868`](https://github.com/apache/spark/commit/e14186836a6aecbc58839edffa57213869271b91).
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15412 Sure I can fix those in this pull request too. Thanks for the reminder.
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15412 Hi @rxin , I just happened to look at this PR and want to leave a gentle reminder, just in case: there are [SPARK-17656](https://issues.apache.org/jira/browse/SPARK-17656) and two more cases in `./sql/core/src/main/scala/org/apache/spark/sql/expressions/udaf.scala`. (This may not be directly relevant to this PR, but the changes here rang a bell, so I wanted to let you know.)
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15399 First, I am not familiar with the code in this component, so I am not the right person to review it. Second, when I was going over the pending JIRA list, I found many bugs reported against the Thrift Server. Third, I do not know what the current strategy is for supporting `beeline` and `spark-sql`. If this is the focus, I would expect at least 50+ bug fixes in this area.
[GitHub] spark issue #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of Vertex...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15396 Can you reference the earlier pull requests here too?
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15314 Maybe we can solve the label column issue first? Would you mind opening a new Jira/PR? I'm happy to hear other opinions as well :)
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15292 **[Test build #66622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66622/consoleFull)** for PR 15292 at commit [`350f55d`](https://github.com/apache/spark/commit/350f55da303c6ccc876a4f6d5a1e455dd3337343).
[GitHub] spark issue #15408: [SPARK-17839][CORE] UnsafeSorterSpillReader should use N...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #66621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66621/consoleFull)** for PR 15408 at commit [`856593a`](https://github.com/apache/spark/commit/856593ac4d54c4981f79b7a4b09c94cc66b5c63b).
[GitHub] spark issue #15408: [SPARK-17839][CORE] UnsafeSorterSpillReader should use N...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/15408 >> Can you also expand on what the 7% means? Is it some workload end-to-end that's been improved by 7%, or the sorting itself improves by 7%? The perf improvement was end-to-end, which means the improvement in the sorting itself is definitely more than 7%.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] UnsafeSorterSpillReader shoul...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82541750

--- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.StandardOpenOption;
+
+/**
+ * {@link InputStream} implementation which uses direct buffer
+ * to read a file to avoid extra copy of data between Java and
+ * native memory which happens when using {@link java.io.BufferedInputStream}.
+ * Unfortunately, this is not something already available in JDK,
+ * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio,
+ * but does not support buffering.
+ *
+ */
+public final class NioBasedBufferedFileInputStream extends InputStream {

--- End diff --

Added a test suite for this.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] UnsafeSorterSpillReader shoul...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82541747

--- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.StandardOpenOption;
+
+/**
+ * {@link InputStream} implementation which uses direct buffer
+ * to read a file to avoid extra copy of data between Java and
+ * native memory which happens when using {@link java.io.BufferedInputStream}.
+ * Unfortunately, this is not something already available in JDK,
+ * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio,
+ * but does not support buffering.
+ *
+ */
+public final class NioBasedBufferedFileInputStream extends InputStream {
+
+  private static int DEFAULT_BUFFER_SIZE = 8192;
+
+  private final ByteBuffer bb;
+
+  private final FileChannel ch;
+
+  public NioBasedBufferedFileInputStream(File file, int bufferSize) throws IOException {
+    bb = ByteBuffer.allocateDirect(bufferSize);
+    ch = FileChannel.open(file.toPath(), StandardOpenOption.READ);
+    ch.read(bb);
+    bb.flip();

--- End diff --

removed, thanks!
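The review thread above concerns a direct-buffer stream that refills a `ByteBuffer` from a `FileChannel` to avoid the extra native-to-heap copy of `java.io.BufferedInputStream`. The following is a minimal, hypothetical sketch of that technique only, not the actual Spark class; the class name, buffer size, and refill helper are made up for illustration:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class NioBufferedStreamSketch {
  // Hypothetical simplified stream: reads a file through a direct NIO buffer.
  static final class DirectBufferInputStream extends InputStream {
    private final ByteBuffer buf;
    private final FileChannel ch;

    DirectBufferInputStream(File file, int bufferSize) throws IOException {
      buf = ByteBuffer.allocateDirect(bufferSize);
      ch = FileChannel.open(file.toPath(), StandardOpenOption.READ);
      buf.flip(); // start with an empty buffer so the first read() refills it
    }

    // Refill the direct buffer from the channel; returns false at end of file.
    private boolean refill() throws IOException {
      if (!buf.hasRemaining()) {
        buf.clear();
        int n = ch.read(buf);
        buf.flip();
        if (n < 0) {
          return false;
        }
      }
      return buf.hasRemaining();
    }

    @Override
    public int read() throws IOException {
      if (!refill()) {
        return -1;
      }
      return buf.get() & 0xFF;
    }

    @Override
    public void close() throws IOException {
      ch.close();
    }
  }

  public static void main(String[] args) throws IOException {
    // Usage: write a small temp file, then read it back through the sketch.
    File tmp = File.createTempFile("nio-sketch", ".txt");
    tmp.deleteOnExit();
    try (FileWriter w = new FileWriter(tmp)) {
      w.write("hello nio");
    }
    StringBuilder sb = new StringBuilder();
    try (InputStream in = new DirectBufferInputStream(tmp, 4)) {
      int b;
      while ((b = in.read()) != -1) {
        sb.append((char) b);
      }
    }
    System.out.println(sb); // prints "hello nio"
  }
}
```

A real implementation would also override `read(byte[], int, int)` for bulk reads; single-byte `read()` is kept here only to make the refill logic obvious.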
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 @sethah ok, I will revert this PR to focus only on: 1) adding a test for WeightCol in MLTestingUtils.checkNumericTypes; 2) adding a cast for WeightCol in each algo; 3) adding a cast in `getNumClasses` to avoid test failure. What about this?
[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15292#discussion_r82541651

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -17,47 +17,132 @@
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
     @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+    this(parameters ++ Map(
+      JDBCOptions.JDBC_URL -> url,
+      JDBCOptions.JDBC_TABLE_NAME -> table))
+  }
+
+  val asConnectionProperties: Properties = {
+    val properties = new Properties()
+    // We should avoid to pass the options into properties. See SPARK-17776.
+    parameters.filterKeys(!jdbcOptionNames.contains(_))
+      .foreach { case (k, v) => properties.setProperty(k, v) }
+    properties
+  }
+
   //
   // Required parameters
   //
-  require(parameters.isDefinedAt("url"), "Option 'url' is required.")
-  require(parameters.isDefinedAt("dbtable"), "Option 'dbtable' is required.")
+  require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
+  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
-  val url = parameters("url")
+  val url = parameters(JDBC_URL)
   // name of table
-  val table = parameters("dbtable")
+  val table = parameters(JDBC_TABLE_NAME)
+
+  //
+  // Optional parameters
+  //
+  val driverClass = {
+    val userSpecifiedDriverClass = parameters.get(JDBC_DRIVER_CLASS)
+    userSpecifiedDriverClass.foreach(DriverRegistry.register)
+
+    // Performing this part of the logic on the driver guards against the corner-case where the
+    // driver returned for a URL is different on the driver and executors due to classpath
+    // differences.
+    userSpecifiedDriverClass.getOrElse {
+      DriverManager.getDriver(url).getClass.getCanonicalName
+    }
+  }
 
   //
-  // Optional parameter list
+  // Optional parameters only for reading
   //
   // the column used to partition
-  val partitionColumn = parameters.getOrElse("partitionColumn", null)
+  val partitionColumn = parameters.getOrElse(JDBC_PARTITION_COLUMN, null)
   // the lower bound of partition column
-  val lowerBound = parameters.getOrElse("lowerBound", null)
+  val lowerBound = parameters.getOrElse(JDBC_LOWER_BOUND, null)
   // the upper bound of the partition column
-  val upperBound = parameters.getOrElse("upperBound", null)
+  val upperBound = parameters.getOrElse(JDBC_UPPER_BOUND, null)
   // the number of partitions
-  val numPartitions = parameters.getOrElse("numPartitions", null)
-
+  val numPartitions = parameters.getOrElse(JDBC_NUM_PARTITIONS, null)
   require(partitionColumn == null ||
     (lowerBound != null && upperBound != null && numPartitions != null),
-    "If 'partitionColumn' is specified then 'lowerBound', 'upperBound'," +
-      " and 'numPartitions' are required.")
+    s"If '$JDBC_PARTITION_COLUMN' is specified then '$JDBC_LOWER_BOUND', '$JDBC_UPPER_BOUND'," +
+      s" and '$JDBC_NUM_PARTITIONS' are required.")
+  val fetchSize = {
+    val size = parameters.getOrElse(JDBC_BATCH_FETCH_SIZE, "0").toInt
+    require(size >= 0,
+      s"Invalid value `${size.toString}` for parameter " +
+        s"`$JDBC_BATCH_FETCH_SIZE`. The minimum value is 0. When the value is 0, " +
+        "the JDBC driver ignores the value and does the estimates.")
+    size
+  }
 
   //
-  // The options for DataFrameWriter
+  // Optional parameters only for writing
   //
   // if to truncate the table from the JDBC database
-  val isTruncate = parameters.getOrElse("truncate", "false").toBoolean
+  val isTruncate = parameters.getOrElse(JDBC_TRUNCATE, "false").toBoolean
   // the create table option , which can be table_options or partition_options.
   // E.g., "CREATE TABLE t (name string)
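The diff above centralizes JDBC option keys as named constants, requires `url` and `dbtable`, and filters datasource-only option names out of the connection `Properties` so they are not forwarded to the JDBC driver (SPARK-17776). As an illustration of that pattern only, here is a hedged Java sketch; the class name, key set, and error messages are hypothetical, not Spark's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch of the option-handling pattern: required keys checked up
// front, and datasource-only keys filtered out of the driver-facing Properties.
public class JdbcOptionsSketch {
  static final String JDBC_URL = "url";
  static final String JDBC_TABLE_NAME = "dbtable";
  // Option names the datasource consumes itself (illustrative subset).
  static final Set<String> JDBC_OPTION_NAMES =
      Set.of(JDBC_URL, JDBC_TABLE_NAME, "driver", "fetchsize", "partitionColumn");

  final Map<String, String> parameters;

  JdbcOptionsSketch(Map<String, String> parameters) {
    if (!parameters.containsKey(JDBC_URL)) {
      throw new IllegalArgumentException("Option '" + JDBC_URL + "' is required.");
    }
    if (!parameters.containsKey(JDBC_TABLE_NAME)) {
      throw new IllegalArgumentException("Option '" + JDBC_TABLE_NAME + "' is required.");
    }
    this.parameters = parameters;
  }

  // Only pass through options the JDBC driver should actually see.
  Properties asConnectionProperties() {
    Properties props = new Properties();
    parameters.forEach((k, v) -> {
      if (!JDBC_OPTION_NAMES.contains(k)) {
        props.setProperty(k, v);
      }
    });
    return props;
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put(JDBC_URL, "jdbc:h2:mem:test");
    params.put(JDBC_TABLE_NAME, "people");
    params.put("user", "sa"); // driver-facing credential, should pass through
    Properties props = new JdbcOptionsSketch(params).asConnectionProperties();
    // "user" survives the filter; "url" and "dbtable" do not.
    System.out.println(props.containsKey("user") && !props.containsKey(JDBC_URL)); // prints "true"
  }
}
```

The design point is that options like `fetchsize` or `partitionColumn` configure the datasource, not the driver, so leaking them into the connection properties can confuse drivers that reject unknown keys.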
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15412 Merged build finished. Test PASSed.
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15412 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66616/ Test PASSed.
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15412 **[Test build #66616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66616/consoleFull)** for PR 15412 at commit [`4d02864`](https://github.com/apache/spark/commit/4d02864d2b023bec501578de86b68478feae05c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15314 I strongly prefer to move the issue with the label column into its own Jira/PR. They are different changes and I think the label column issues are large enough to warrant their own considerations.
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15412 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66615/ Test FAILed.
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15412 Merged build finished. Test FAILed.
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15412 **[Test build #66615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66615/consoleFull)** for PR 15412 at commit [`98b77a7`](https://github.com/apache/spark/commit/98b77a7c660e0064353b1fa98e2e47bc2d971bea). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15388 Merged build finished. Test PASSed.
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15388 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66614/ Test PASSed.
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15388 **[Test build #66614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66614/consoleFull)** for PR 15388 at commit [`25f5d4d`](https://github.com/apache/spark/commit/25f5d4d068509d93630d56db2155f11cc2a9b301). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15399 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66617/ Test PASSed.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15399 Merged build finished. Test PASSed.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15399 **[Test build #66617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66617/consoleFull)** for PR 15399 at commit [`d027421`](https://github.com/apache/spark/commit/d027421d0396f971976b18ef2d44ddda97dd5810). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14897: [SPARK-17338][SQL] add global temp view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14897 **[Test build #66620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66620/consoleFull)** for PR 14897 at commit [`29e292a`](https://github.com/apache/spark/commit/29e292a954f1b07d80d03d0fd6c4ad4605b41ab7).
[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82539368 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.feature + +import scala.util.Random + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.ml.{Estimator, Model} +import org.apache.spark.ml.linalg.{Vector, VectorUDT} +import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators} +import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} +import org.apache.spark.ml.util.SchemaUtils +import org.apache.spark.sql._ +import org.apache.spark.sql.expressions.UserDefinedFunction +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types._ + +/** + * Params for [[LSH]]. + */ +@Experimental +@Since("2.1.0") +private[ml] trait LSHParams extends HasInputCol with HasOutputCol { + /** + * Param for the dimension of LSH OR-amplification. + * + * In this implementation, we use LSH OR-amplification to reduce the false negative rate. The + * higher the dimension is, the lower the false negative rate. 
+ * @group param + */ + @Since("2.1.0") + final val outputDim: IntParam = new IntParam(this, "outputDim", "output dimension, where" + +"increasing dimensionality lowers the false negative rate, and decreasing dimensionality" + --- End diff -- No. Since we are implementing OR-amplification, increasing dimensionality lowers the false negative rate. In AND-amplification, increasing dimensionality will lower the false positive rate.
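The distinction Yunni draws can be made concrete with a small standalone sketch (illustrative only, not Spark code): for a pair whose single-hash collision probability is p, OR-amplification over k hash functions succeeds if any one of the k hashes collides, while AND-amplification requires all k to collide.

```java
// Illustration of the reviewer's point about amplification in LSH.
// With p = collision probability of one hash function for a given pair:
//  - OR-amplification raises the chance a truly similar pair collides at
//    least once, so increasing k lowers the false negative rate;
//  - AND-amplification requires every hash to collide, so increasing k
//    lowers the false positive rate instead.
public final class AmplificationSketch {
    // P(at least one of k independent hashes collides)
    static double orAmplified(double p, int k) {
        return 1.0 - Math.pow(1.0 - p, k);
    }

    // P(all k independent hashes collide)
    static double andAmplified(double p, int k) {
        return Math.pow(p, k);
    }
}
```

For example, a similar pair with p = 0.7 collides with probability above 0.99 under OR-amplification with k = 5, while a dissimilar pair with p = 0.3 survives AND-amplification with k = 5 with probability under 0.003.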
[GitHub] spark issue #14897: [SPARK-17338][SQL] add global temp view
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14897 LGTM. Let's make a small change according to https://github.com/apache/spark/pull/14897#discussion_r82536096 and we can merge this pr. Thanks!
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15399 **[Test build #66617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66617/consoleFull)** for PR 15399 at commit [`d027421`](https://github.com/apache/spark/commit/d027421d0396f971976b18ef2d44ddda97dd5810).
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15376 **[Test build #66618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66618/consoleFull)** for PR 15376 at commit [`f328f3a`](https://github.com/apache/spark/commit/f328f3a2c0936555226a7c381625d3d5b8127302).
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14426 **[Test build #66619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66619/consoleFull)** for PR 14426 at commit [`57adfd3`](https://github.com/apache/spark/commit/57adfd33b84bee03c9f0a302d9981f226437c2e3).
[GitHub] spark pull request #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should ha...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/14527
[GitHub] spark pull request #14897: [SPARK-17338][SQL] add global temp view
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14897#discussion_r82538273 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/GlobalTempViewSuite.scala --- @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.{AnalysisException, QueryTest, Row} +import org.apache.spark.sql.catalog.Table +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types.StructType + +class GlobalTempViewSuite extends QueryTest with SharedSQLContext { + import testImplicits._ + + override protected def beforeAll(): Unit = { +super.beforeAll() +globalTempDB = spark.sharedState.globalTempViewManager.database + } + + private var globalTempDB: String = _ + + test("basic semantic") { +sql("CREATE GLOBAL TEMP VIEW src AS SELECT 1, 'a'") + +// If there is no database in table name, we should try local temp view first, if not found, +// try table/view in current database, which is "default" in this case. So we expect +// NoSuchTableException here. 
+intercept[NoSuchTableException](spark.table("src")) + +// Use qualified name to refer to the global temp view explicitly. +checkAnswer(spark.table(s"$globalTempDB.src"), Row(1, "a")) + +// Table name without database will never refer to a global temp view. +intercept[NoSuchTableException](sql("DROP VIEW src")) + +sql(s"DROP VIEW $globalTempDB.src") +// The global temp view should be dropped successfully. +intercept[NoSuchTableException](spark.table(s"$globalTempDB.src")) + +// We can also use Dataset API to create global temp view +Seq(1 -> "a").toDF("i", "j").createGlobalTempView("src") +checkAnswer(spark.table(s"$globalTempDB.src"), Row(1, "a")) + +// Use qualified name to rename a global temp view. +sql(s"ALTER VIEW $globalTempDB.src RENAME TO src2") --- End diff -- I see. Thanks for the explanation!
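The lookup order the test's comment describes — an unqualified name resolves to a local temp view first, then to a table in the current database, while a global temp view is reachable only through its qualified name — can be sketched outside Spark. This is hypothetical illustration code, not Spark's actual `SessionCatalog`; the class and field names are invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of the name-resolution order exercised by the suite:
// unqualified names never see global temp views; qualified names pick the
// database explicitly, which is how "global_temp.src" reaches them.
public final class ViewResolutionSketch {
    static final String GLOBAL_TEMP_DB = "global_temp"; // configurable in Spark

    final Map<String, String> localTempViews = new HashMap<>();
    final Map<String, String> globalTempViews = new HashMap<>();
    final Map<String, String> currentDbTables = new HashMap<>();

    /** Resolve "name" or "db.name" to a stored plan, mirroring the test's expectations. */
    String table(String identifier) {
        String[] parts = identifier.split("\\.", 2);
        if (parts.length == 2) {
            // Qualified name: only the named database is consulted.
            Map<String, String> db =
                parts[0].equals(GLOBAL_TEMP_DB) ? globalTempViews : currentDbTables;
            String plan = db.get(parts[1]);
            if (plan == null) throw new NoSuchElementException(identifier);
            return plan;
        }
        // Unqualified name: local temp view first, then the current database.
        String plan = localTempViews.getOrDefault(identifier, currentDbTables.get(identifier));
        if (plan == null) throw new NoSuchElementException(identifier);
        return plan;
    }
}
```

Under this model, `table("src")` throws when only a global temp view named `src` exists, exactly as `intercept[NoSuchTableException](spark.table("src"))` asserts above.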
[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15292#discussion_r82538242 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -17,47 +17,132 @@ package org.apache.spark.sql.execution.datasources.jdbc +import java.sql.{Connection, DriverManager} +import java.util.Properties + /** * Options for the JDBC data source. */ class JDBCOptions( @transient private val parameters: Map[String, String]) extends Serializable { + import JDBCOptions._ + + def this(url: String, table: String, parameters: Map[String, String]) = { +this(parameters ++ Map( + JDBCOptions.JDBC_URL -> url, + JDBCOptions.JDBC_TABLE_NAME -> table)) + } + + val asConnectionProperties: Properties = { +val properties = new Properties() +// We should avoid to pass the options into properties. See SPARK-17776. +parameters.filterKeys(!jdbcOptionNames.contains(_)) + .foreach { case (k, v) => properties.setProperty(k, v) } +properties + } + // // Required parameters // - require(parameters.isDefinedAt("url"), "Option 'url' is required.") - require(parameters.isDefinedAt("dbtable"), "Option 'dbtable' is required.") + require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.") + require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.") // a JDBC URL - val url = parameters("url") + val url = parameters(JDBC_URL) // name of table - val table = parameters("dbtable") + val table = parameters(JDBC_TABLE_NAME) + + // + // Optional parameters + // + val driverClass = { +val userSpecifiedDriverClass = parameters.get(JDBC_DRIVER_CLASS) +userSpecifiedDriverClass.foreach(DriverRegistry.register) + +// Performing this part of the logic on the driver guards against the corner-case where the +// driver returned for a URL is different on the driver and executors due to classpath +// differences. 
+userSpecifiedDriverClass.getOrElse { + DriverManager.getDriver(url).getClass.getCanonicalName +} + } // - // Optional parameter list + // Optional parameters only for reading // // the column used to partition - val partitionColumn = parameters.getOrElse("partitionColumn", null) + val partitionColumn = parameters.getOrElse(JDBC_PARTITION_COLUMN, null) // the lower bound of partition column - val lowerBound = parameters.getOrElse("lowerBound", null) + val lowerBound = parameters.getOrElse(JDBC_LOWER_BOUND, null) // the upper bound of the partition column - val upperBound = parameters.getOrElse("upperBound", null) + val upperBound = parameters.getOrElse(JDBC_UPPER_BOUND, null) // the number of partitions - val numPartitions = parameters.getOrElse("numPartitions", null) - + val numPartitions = parameters.getOrElse(JDBC_NUM_PARTITIONS, null) require(partitionColumn == null || (lowerBound != null && upperBound != null && numPartitions != null), -"If 'partitionColumn' is specified then 'lowerBound', 'upperBound'," + - " and 'numPartitions' are required.") +s"If '$JDBC_PARTITION_COLUMN' is specified then '$JDBC_LOWER_BOUND', '$JDBC_UPPER_BOUND'," + + s" and '$JDBC_NUM_PARTITIONS' are required.") + val fetchSize = { +val size = parameters.getOrElse(JDBC_BATCH_FETCH_SIZE, "0").toInt +require(size >= 0, + s"Invalid value `${size.toString}` for parameter " + +s"`$JDBC_BATCH_FETCH_SIZE`. The minimum value is 0. When the value is 0, " + +"the JDBC driver ignores the value and does the estimates.") +size + } // - // The options for DataFrameWriter + // Optional parameters only for writing // // if to truncate the table from the JDBC database - val isTruncate = parameters.getOrElse("truncate", "false").toBoolean + val isTruncate = parameters.getOrElse(JDBC_TRUNCATE, "false").toBoolean // the create table option , which can be table_options or partition_options. // E.g., "CREATE TABLE t (name string)
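The `asConnectionProperties` change in the diff above (the SPARK-17776 fix) boils down to filtering Spark-internal option names out of the parameter map before the remainder is forwarded to the JDBC driver as connection `Properties`. A minimal standalone sketch of that idea — a hypothetical helper, not the actual `JDBCOptions` class:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Sketch of the SPARK-17776 idea: data-source options that are meaningful
// only to Spark (url, dbtable, partitionColumn, numPartitions, ...) must
// not leak into the Properties handed to the JDBC driver, since some
// drivers reject unknown connection properties.
public final class ConnectionPropertiesSketch {
    static Properties asConnectionProperties(Map<String, String> parameters,
                                             Set<String> jdbcOptionNames) {
        Properties properties = new Properties();
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            // Keep only the options Spark does not interpret itself.
            if (!jdbcOptionNames.contains(e.getKey())) {
                properties.setProperty(e.getKey(), e.getValue());
            }
        }
        return properties;
    }
}
```

With this filter in place, driver-facing options such as `user` or `password` pass through, while `url` and `dbtable` stay on the Spark side.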
[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15219 I see. @davies, would it be possible to review this?
[GitHub] spark pull request #15408: [SPARK-17839][CORE] UnsafeSorterSpillReader shoul...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82537791 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,91 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import java.io.File; +import java.io.FileInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + */ +public final class NioBasedBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE = 8192; + + private final ByteBuffer bb; + + private final FileChannel ch; + + public NioBasedBufferedFileInputStream(File file, int bufferSize) throws IOException { +bb = ByteBuffer.allocateDirect(bufferSize); +ch = FileChannel.open(file.toPath(), StandardOpenOption.READ); +ch.read(bb); +bb.flip(); --- End diff -- `ch.read(bb);` can be removed
[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15292#discussion_r82537727 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -17,47 +17,132 @@ package org.apache.spark.sql.execution.datasources.jdbc +import java.sql.{Connection, DriverManager} +import java.util.Properties + /** * Options for the JDBC data source. */ class JDBCOptions( @transient private val parameters: Map[String, String]) extends Serializable { + import JDBCOptions._ + + def this(url: String, table: String, parameters: Map[String, String]) = { +this(parameters ++ Map( + JDBCOptions.JDBC_URL -> url, + JDBCOptions.JDBC_TABLE_NAME -> table)) + } + + val asConnectionProperties: Properties = { +val properties = new Properties() +// We should avoid to pass the options into properties. See SPARK-17776. +parameters.filterKeys(!jdbcOptionNames.contains(_)) + .foreach { case (k, v) => properties.setProperty(k, v) } +properties + } + // // Required parameters // - require(parameters.isDefinedAt("url"), "Option 'url' is required.") - require(parameters.isDefinedAt("dbtable"), "Option 'dbtable' is required.") + require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.") + require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.") // a JDBC URL - val url = parameters("url") + val url = parameters(JDBC_URL) // name of table - val table = parameters("dbtable") + val table = parameters(JDBC_TABLE_NAME) + + // + // Optional parameters + // + val driverClass = { +val userSpecifiedDriverClass = parameters.get(JDBC_DRIVER_CLASS) +userSpecifiedDriverClass.foreach(DriverRegistry.register) + +// Performing this part of the logic on the driver guards against the corner-case where the +// driver returned for a URL is different on the driver and executors due to classpath +// differences. 
+userSpecifiedDriverClass.getOrElse { + DriverManager.getDriver(url).getClass.getCanonicalName +} + } // - // Optional parameter list + // Optional parameters only for reading // // the column used to partition - val partitionColumn = parameters.getOrElse("partitionColumn", null) + val partitionColumn = parameters.getOrElse(JDBC_PARTITION_COLUMN, null) // the lower bound of partition column - val lowerBound = parameters.getOrElse("lowerBound", null) + val lowerBound = parameters.getOrElse(JDBC_LOWER_BOUND, null) // the upper bound of the partition column - val upperBound = parameters.getOrElse("upperBound", null) + val upperBound = parameters.getOrElse(JDBC_UPPER_BOUND, null) // the number of partitions - val numPartitions = parameters.getOrElse("numPartitions", null) - + val numPartitions = parameters.getOrElse(JDBC_NUM_PARTITIONS, null) require(partitionColumn == null || (lowerBound != null && upperBound != null && numPartitions != null), -"If 'partitionColumn' is specified then 'lowerBound', 'upperBound'," + - " and 'numPartitions' are required.") +s"If '$JDBC_PARTITION_COLUMN' is specified then '$JDBC_LOWER_BOUND', '$JDBC_UPPER_BOUND'," + + s" and '$JDBC_NUM_PARTITIONS' are required.") + val fetchSize = { +val size = parameters.getOrElse(JDBC_BATCH_FETCH_SIZE, "0").toInt +require(size >= 0, + s"Invalid value `${size.toString}` for parameter " + +s"`$JDBC_BATCH_FETCH_SIZE`. The minimum value is 0. When the value is 0, " + +"the JDBC driver ignores the value and does the estimates.") +size + } // - // The options for DataFrameWriter + // Optional parameters only for writing // // if to truncate the table from the JDBC database - val isTruncate = parameters.getOrElse("truncate", "false").toBoolean + val isTruncate = parameters.getOrElse(JDBC_TRUNCATE, "false").toBoolean // the create table option , which can be table_options or partition_options. // E.g., "CREATE TABLE t (name string) ENGINE=InnoDB
[GitHub] spark pull request #15408: [SPARK-17839][CORE] UnsafeSorterSpillReader shoul...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82537694 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,77 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import java.io.File; +import java.io.FileInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream} + * + */ +public final class NioBasedBufferedFileInputStream extends InputStream { + + ByteBuffer bb; + + FileChannel ch; + + public NioBasedBufferedFileInputStream(File file, int bufferSize) throws IOException { +bb = ByteBuffer.allocateDirect(bufferSize); +FileInputStream f = new FileInputStream(file); +ch = f.getChannel(); +ch.read(bb); +bb.flip(); + } + + public boolean refill() throws IOException { +if (!bb.hasRemaining()) { + bb.clear(); + int nRead = ch.read(bb); + if (nRead == -1) { +return false; + } + bb.flip(); +} +return true; + } + + @Override + public int read() throws IOException { +if (!refill()) { + return -1; +} +return bb.get(); + } + + @Override + public int read(byte[] b, int off, int len) throws IOException { +if (!refill()) { + return -1; 
+} +len = Math.min(len, bb.remaining()); +bb.get(b, off, len); +return len; + } + + @Override + public void close() throws IOException { +ch.close(); + } +} --- End diff -- `skip()` in InputStream falls back to byte-at-a-time `read()`; this is not the optimal solution.
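A sketch of what an efficient `skip()` override could look like for such a buffered NIO stream: consume any bytes already in the buffer, then reposition the `FileChannel` for the rest instead of relying on `InputStream`'s default, which repeatedly calls `read()`. This is an illustration of the reviewer's suggestion, not the PR's actual code; the class name and structure are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

// Hypothetical variant of the buffered NIO stream with a skip() that
// repositions the channel rather than reading and discarding bytes.
public final class SkippableBufferedFileInputStream extends InputStream {
    private final ByteBuffer bb;
    private final FileChannel ch;

    public SkippableBufferedFileInputStream(File file, int bufferSize) throws IOException {
        bb = ByteBuffer.allocateDirect(bufferSize);
        ch = FileChannel.open(file.toPath(), StandardOpenOption.READ);
        bb.flip(); // start with an empty buffer; the first read() refills it
    }

    private boolean refill() throws IOException {
        if (!bb.hasRemaining()) {
            bb.clear();
            if (ch.read(bb) == -1) return false; // end of file
            bb.flip();
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        // Mask to 0..255, as the InputStream contract requires.
        return refill() ? (bb.get() & 0xFF) : -1;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) return 0;
        if (n <= bb.remaining()) {
            // Fast path: the skipped bytes are already buffered.
            bb.position(bb.position() + (int) n);
            return n;
        }
        // Discard the buffer and reposition the channel; no data is copied.
        long buffered = bb.remaining();
        long toSkip = Math.min(n - buffered, ch.size() - ch.position());
        ch.position(ch.position() + toSkip);
        bb.position(bb.limit()); // mark the buffer as fully consumed
        return buffered + toSkip;
    }

    @Override
    public void close() throws IOException {
        ch.close();
    }
}
```

The fast path touches only the buffer's position; the slow path does a single `FileChannel.position` call, so skipping megabytes costs no more than skipping one byte.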
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15412 **[Test build #66616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66616/consoleFull)** for PR 15412 at commit [`4d02864`](https://github.com/apache/spark/commit/4d02864d2b023bec501578de86b68478feae05c6).
[GitHub] spark issue #15409: [Spark-14761][SQL] Reject invalid join methods when join...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15409 Oh well, the test cases have issues. You can run them with `python/run-tests --module pyspark-sql`
[GitHub] spark issue #15406: [Spark-17745][ml][PySpark] update NB python api
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15406 @sethah OK, and I'm checking whether anything else needs updating...
[GitHub] spark issue #15411: Updated master url
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15411 LGTM - can you clean up the pr description to remove the messages from the template?
[GitHub] spark issue #15412: [SPARK-17844] Simplify DataFrame API for defining frame ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15412 **[Test build #66615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66615/consoleFull)** for PR 15412 at commit [`98b77a7`](https://github.com/apache/spark/commit/98b77a7c660e0064353b1fa98e2e47bc2d971bea).
[GitHub] spark pull request #15412: [SPARK-17844] Simplify DataFrame API for defining...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/15412 [SPARK-17844] Simplify DataFrame API for defining frame boundaries in window functions ## What changes were proposed in this pull request? When I was creating the example code for SPARK-10496, I realized it was pretty convoluted to define the frame boundaries for window functions when there is no partition column or ordering column. The reason is that we don't provide a way to create a WindowSpec directly with the frame boundaries. We can trivially improve this by adding rowsBetween and rangeBetween to the Window object. As an example, to compute cumulative sum, before this pr: ``` df.select('key, sum("value").over(Window.partitionBy(lit(1)).rowsBetween(Long.MinValue, 0))) ``` After this pr: ``` df.select('key, sum("value").over(Window.rowsBetween(Long.MinValue, 0))) ``` ## How was this patch tested? Added test cases to compute cumulative sum in DataFrameWindowSuite for Scala/Java and tests.py for Python. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-17844 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15412.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15412 commit 98b77a7c660e0064353b1fa98e2e47bc2d971bea Author: Reynold Xin Date: 2016-10-10T01:15:15Z [SPARK-17844] Simplify DataFrame API for defining frame boundaries in window functions
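Independent of Spark, the frame semantics that `Window.rowsBetween` exposes can be illustrated with a plain sketch: a rows-based frame `[start, end]` is evaluated relative to each row's index, with `Long.MinValue` standing in for "unbounded preceding" as in the PR's cumulative-sum example. This is hypothetical demo code, not Spark's implementation.

```java
// Demo of rows-based window frames: for each row i, sum the values whose
// index lies in [i + start, i + end], clamped to the array bounds.
// Long.MIN_VALUE / Long.MAX_VALUE model unbounded preceding / following.
public final class RowsBetweenDemo {
    static long[] windowedSum(long[] values, long start, long end) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            long lo = (start == Long.MIN_VALUE) ? 0 : Math.max(0, i + start);
            long hi = (end == Long.MAX_VALUE) ? values.length - 1
                                              : Math.min(values.length - 1, i + end);
            long sum = 0;
            for (long j = lo; j <= hi; j++) {
                sum += values[(int) j];
            }
            out[i] = sum;
        }
        return out;
    }
}
```

With frame `(Long.MIN_VALUE, 0)` — the PR's cumulative sum — `{1, 2, 3, 4}` yields `{1, 3, 6, 10}`; a sliding frame such as `(-1, 1)` sums each row with its immediate neighbors.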
[GitHub] spark issue #15411: Updated master url
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15411 Can one of the admins verify this patch?
[GitHub] spark pull request #15411: Updated master url
GitHub user getintouchapp opened a pull request: https://github.com/apache/spark/pull/15411 Updated master url ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) This is the Spark Scala example which was missing setting a master URL in Spark Session ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Unit tested. Changes affect examples and documentation only (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Need to set master url to SparkSession for the example to run You can merge this pull request into a Git repository by running: $ git pull https://github.com/getintouchapp/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15411.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15411 commit 532403476678a8161d18a30ef12b21bffb4d5f92 Author: Ganesh Krishnan Date: 2016-10-10T01:13:40Z Updated master url Need to set master url to SparkSession for the example to run
[GitHub] spark issue #15409: [Spark-14761][SQL] Reject invalid join methods when join...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15409 **[Test build #3302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3302/consoleFull)** for PR 15409 at commit [`cec8ec4`](https://github.com/apache/spark/commit/cec8ec48de5f51f40ff4b929da0c0496fcc0a662). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82536925

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.util.Random
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.linalg.{Vector, VectorUDT}
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.util.SchemaUtils
+import org.apache.spark.sql._
+import org.apache.spark.sql.expressions.UserDefinedFunction
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+/**
+ * Params for [[LSH]].
+ */
+@Experimental
+@Since("2.1.0")
+private[ml] trait LSHParams extends HasInputCol with HasOutputCol {
+  /**
+   * Param for the dimension of LSH OR-amplification.
+   *
+   * In this implementation, we use LSH OR-amplification to reduce the false negative rate. The
+   * higher the dimension is, the lower the false negative rate.
+   * @group param
+   */
+  @Since("2.1.0")
+  final val outputDim: IntParam = new IntParam(this, "outputDim", "output dimension, where" +
+    "increasing dimensionality lowers the false negative rate, and decreasing dimensionality" +
--- End diff --
Does increasing dimensionality lower the false negative rate? I think increasing dimensionality should lower the false positive rate, right?
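The reviewer's question can be checked numerically. Under OR-amplification over d independent hash functions that each collide with probability p, a pair of points matches if *any* hash collides, so the match probability is 1 - (1 - p)^d. Increasing d raises the match probability for every pair: it lowers false negatives (near pairs that should match but don't) while raising false positives (far pairs that match spuriously). A minimal sketch in plain Python, independent of the Spark implementation (the example probabilities 0.9 and 0.1 are illustrative):

```python
def or_amplified_collision_prob(p: float, d: int) -> float:
    """Probability that at least one of d independent hash functions
    collides, given per-hash collision probability p (OR-amplification)."""
    return 1.0 - (1.0 - p) ** d

# Near pair (per-hash collision probability 0.9): misses become rarer as d grows.
false_negative_d1 = 1.0 - or_amplified_collision_prob(0.9, 1)  # ~0.1
false_negative_d5 = 1.0 - or_amplified_collision_prob(0.9, 5)  # ~0.00001

# Far pair (per-hash collision probability 0.1): spurious matches become more common.
false_positive_d1 = or_amplified_collision_prob(0.1, 1)  # ~0.1
false_positive_d5 = or_amplified_collision_prob(0.1, 5)  # ~0.41
```

So on this model the quoted scaladoc is internally consistent: a higher OR-amplification dimension does lower the false negative rate, at the cost of a higher false positive rate.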
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15388 **[Test build #66614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66614/consoleFull)** for PR 15388 at commit [`25f5d4d`](https://github.com/apache/spark/commit/25f5d4d068509d93630d56db2155f11cc2a9b301).
[GitHub] spark issue #15409: [Spark-14761][SQL] Reject invalid join methods when join...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15409 **[Test build #3302 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3302/consoleFull)** for PR 15409 at commit [`cec8ec4`](https://github.com/apache/spark/commit/cec8ec48de5f51f40ff4b929da0c0496fcc0a662).
[GitHub] spark issue #15409: [Spark-14761][SQL] Reject invalid join methods when join...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15409 The change itself LGTM, but also cc @srinathshankar. Right now Python behavior differs from Scala with respect to how crossJoin is handled.
[GitHub] spark pull request #15409: [Spark-14761][SQL] Reject invalid join methods wh...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15409#discussion_r82536538

--- Diff: python/pyspark/sql/tests.py ---
@@ -1508,6 +1508,23 @@ def test_toDF_with_schema_string(self):
         self.assertEqual(df.schema.simpleString(), "struct")
         self.assertEqual(df.collect(), [Row(key=i) for i in range(100)])

+    # Regression test for invalid join methods when on is None, Spark-14761
+    def test_invalid_join_method(self):
+        df1 = self.sqlCtx.createDataFrame([("Alice", 5), ("Bob", 8)], ["name", "age"])
+        df2 = self.sqlCtx.createDataFrame([("Alice", 80), ("Bob", 90)], ["name", "height"])
+        self.assertRaises(AnalysisException, lambda: df1.join(df2, how="invalid-join-type"))
+
+        result = df1.join(df2, how="inner").select(df1.name, df2.height).collect()
--- End diff --
Can we remove everything from this test onward? The remaining assertions are no longer testing invalid join methods.
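The guard this test exercises can be sketched outside Spark: validate the `how` string against the set of supported join types before dispatching, so an unknown name fails fast with a clear error instead of being silently misinterpreted. A hypothetical, simplified stand-in (the real check lives in Spark's join-type parser; the alias set below is an assumption for illustration):

```python
# Illustrative alias set; Spark's actual accepted strings may differ.
SUPPORTED_JOIN_TYPES = {
    "inner", "cross", "outer", "full", "fullouter",
    "left", "leftouter", "right", "rightouter",
    "leftsemi", "leftanti",
}

def validate_join_type(how: str) -> str:
    """Normalize a join-type string, raising ValueError for unknown types."""
    normalized = how.lower().replace("_", "")
    if normalized not in SUPPORTED_JOIN_TYPES:
        raise ValueError(
            f"Unsupported join type '{how}'. "
            f"Supported types: {sorted(SUPPORTED_JOIN_TYPES)}")
    return normalized
```

For example, `validate_join_type("LEFT_OUTER")` normalizes to `"leftouter"`, while `validate_join_type("invalid-join-type")` raises immediately, which is the behavior the regression test above asserts via `AnalysisException` on the Spark side.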
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15388 @rxin Agree. Sorry for that. Will be more careful in the future.
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15388 @cloud-fan / @gatorsmile I left some comments on improving clarity. It's pretty important to the maintenance of the project.
[GitHub] spark pull request #14897: [SPARK-17338][SQL] add global temp view
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14897#discussion_r82536096

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -94,6 +69,47 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
   }

   /**
+   * Class for caching query results reused in future executions.
+   */
+  val cacheManager: CacheManager = new CacheManager
+
+  /**
+   * A listener for SQL-specific [[org.apache.spark.scheduler.SparkListenerEvent]]s.
+   */
+  val listener: SQLListener = createListenerAndUI(sparkContext)
+
+  /**
+   * A catalog that interacts with external systems.
+   */
+  val externalCatalog: ExternalCatalog =
+    SharedState.reflect[ExternalCatalog, SparkConf, Configuration](
+      SharedState.externalCatalogClassName(sparkContext.conf),
+      sparkContext.conf,
+      sparkContext.hadoopConfiguration)
+
+  /**
+   * A manager for global temporary views.
+   */
+  val globalTempViewManager = {
+    // System preserved database should not exist in metastore. However, it's hard to guarantee it
+    // for every session, because case-sensitivity differs. Here we always lowercase it to make our
+    // life easier.
+    val globalTempDB = sparkContext.conf.get(GLOBAL_TEMP_DATABASE).toLowerCase
+    if (externalCatalog.databaseExists(globalTempDB)) {
+      throw new SparkException(
+        s"$globalTempDB is a system preserved database, please rename your existing database " +
+          "to resolve the name conflict and launch your Spark application again.")
--- End diff --
Oh no. I think it is fine to hide that conf. But if the user hits this exception, it seems better to let them know there is another workaround, since renaming the existing database may not be easy.
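The guard in the diff can be sketched in isolation: lowercase the reserved name once so the check is insensitive to metastore case handling, then refuse to start if a clashing database already exists, and (per the review comment) mention the config workaround in the error message. A minimal Python sketch; the config key `spark.sql.globalTempDatabase` mirrors the diff's `GLOBAL_TEMP_DATABASE` constant, and everything else is illustrative:

```python
def check_global_temp_db(conf: dict, existing_dbs: set) -> str:
    """Return the reserved global-temp database name, or raise if it clashes
    case-insensitively with a database that already exists."""
    # Lowercase once so the comparison does not depend on metastore case rules.
    global_temp_db = conf.get("spark.sql.globalTempDatabase", "global_temp").lower()
    if global_temp_db in {db.lower() for db in existing_dbs}:
        # Per the review comment, the message names both workarounds:
        # renaming the database OR repointing the conf at an unused name.
        raise RuntimeError(
            f"{global_temp_db} is a system preserved database. Rename the "
            f"existing database, or set spark.sql.globalTempDatabase to an "
            f"unused name, and launch your Spark application again.")
    return global_temp_db
```

Doing the check once at SharedState construction, rather than per session, is what makes the single lowercase normalization sufficient.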
[GitHub] spark pull request #15388: [SPARK-17821][SQL] Support And and Or in Expressi...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15388#discussion_r82536075

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
@@ -80,6 +80,59 @@ class ExpressionSetSuite extends SparkFunSuite {
   setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper)
   setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper)

+  setTest(1, aUpper > bUpper && aUpper <= 10, aUpper <= 10 && aUpper > bUpper)
+  setTest(1,
+    aUpper > bUpper && bUpper > 100 && aUpper <= 10,
+    bUpper > 100 && aUpper <= 10 && aUpper > bUpper)
+
+  setTest(1, aUpper > bUpper || aUpper <= 10, aUpper <= 10 || aUpper > bUpper)
+  setTest(1,
+    aUpper > bUpper || bUpper > 100 || aUpper <= 10,
+    bUpper > 100 || aUpper <= 10 || aUpper > bUpper)
+
+  setTest(1,
+    bUpper > 100 || aUpper <= 10 && aUpper > bUpper,
+    bUpper > 100 || (aUpper <= 10 && aUpper > bUpper))
+
+  setTest(1,
+    aUpper > 10 && bUpper < 10 || aUpper >= bUpper,
+    (bUpper < 10 && aUpper > 10) || aUpper >= bUpper)
+
+  setTest(1,
--- End diff --
These last few test cases are getting so complicated that a human cannot immediately tell what each one is testing for. We should add comments explaining what is being tested.
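The property these `setTest(1, ...)` cases assert is that semantically equal And/Or expressions canonicalize to the same form, so an ExpressionSet treats reorderings as one element. Because And and Or are commutative and associative, one way to achieve that is to flatten same-operator chains and sort the operands. A toy Python sketch of that idea using nested tuples for expression trees (this is not Spark's actual Canonicalize implementation):

```python
def canonicalize(expr):
    """Canonicalize a toy boolean expression tree, where a node is either an
    atomic string or a tuple ("and" | "or", operand, operand, ...), by
    flattening chains of the same operator and sorting their operands."""
    if not isinstance(expr, tuple):
        return expr  # atomic predicate, e.g. "a > b"
    op, *operands = expr
    flat = []
    for operand in map(canonicalize, operands):
        # Associativity: fold nested chains of the same operator into one level.
        if isinstance(operand, tuple) and operand[0] == op:
            flat.extend(operand[1:])
        else:
            flat.append(operand)
    # Commutativity: a deterministic operand order makes reorderings compare equal.
    return (op, *sorted(flat, key=repr))
```

With this, `("and", "a>b", ("and", "b>100", "a<=10"))` and `("and", ("and", "b>100", "a<=10"), "a>b")` canonicalize identically, mirroring the reordered pairs in the test above, while `and` and `or` expressions over the same operands stay distinct.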