[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150897857 Merged build triggered.
[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9269#issuecomment-150899254 Merged build started.
[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9268#issuecomment-150899253 Merged build started.
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150899252 Merged build started.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9270#issuecomment-150900046 **[Test build #44315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44315/consoleFull)** for PR 9270 at commit [`5e8efea`](https://github.com/apache/spark/commit/5e8efeacdf35df7281224338866a9b18207fd27f).
[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9269#issuecomment-150900013 **[Test build #44314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44314/consoleFull)** for PR 9269 at commit [`2822191`](https://github.com/apache/spark/commit/2822191e1c270237ca085757721c9746ad9b5734).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `logError("Sink class " + classPath + " cannot be instantiated")`
[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9269#issuecomment-150900038 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9269#issuecomment-150900040 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44314/
[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8840#issuecomment-150900314 **[Test build #44316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/consoleFull)** for PR 8840 at commit [`cdf6dc4`](https://github.com/apache/spark/commit/cdf6dc424abba99a7fd091fca5ce2af56255f69a).
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150900669 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150900670 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150900664 **[Test build #44313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/consoleFull)** for PR 9180 at commit [`0a43033`](https://github.com/apache/spark/commit/0a4303356455f28ca3b87ffd446cb5ef5f25d0e2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9267#issuecomment-150897568 **[Test build #44311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44311/consoleFull)** for PR 9267 at commit [`81f667a`](https://github.com/apache/spark/commit/81f667a4537b60071cb1888ca88aa4bd0734ad2d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel[Item: ClassTag: TypeTag] @Since("1.3.0") (`
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/9270

[SPARK-9162][SQL] Implement code generation for ScalaUDF

JIRA: https://issues.apache.org/jira/browse/SPARK-9162

Currently ScalaUDF extends CodegenFallback and doesn't provide a code generation implementation. This patch implements code generation for ScalaUDF.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 scalaudf-codegen

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9270.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9270

commit 5e8efeacdf35df7281224338866a9b18207fd27f
Author: Liang-Chi Hsieh
Date: 2015-10-25T08:00:31Z

    Add codegen support for ScalaUDF.
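For readers unfamiliar with the two execution paths, here is a minimal sketch of what "providing a code generation implementation" means, using a hypothetical `PlusOne` expression rather than `ScalaUDF` itself. The `genCode`/`GeneratedExpressionCode` signatures follow the diffs quoted later in this archive; everything else is illustrative, not code from the PR:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.{CodeGenContext, GeneratedExpressionCode}
import org.apache.spark.sql.types.{DataType, IntegerType}

// Hypothetical expression, for illustration only.
case class PlusOne(child: Expression) extends UnaryExpression {
  override def dataType: DataType = IntegerType

  // Interpreted path: what a CodegenFallback expression is reduced to.
  override def eval(input: InternalRow): Any = {
    val v = child.eval(input)
    if (v == null) null else v.asInstanceOf[Int] + 1
  }

  // Codegen path: emit Java source that the generated projection compiles
  // inline, avoiding the per-row virtual call into eval().
  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    val childGen = child.gen(ctx)
    s"""
      ${childGen.code}
      boolean ${ev.isNull} = ${childGen.isNull};
      int ${ev.value} = -1;
      if (!${ev.isNull}) {
        ${ev.value} = ${childGen.value} + 1;
      }
    """
  }
}
```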
[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9268#issuecomment-150900205 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44312/
[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9268#issuecomment-150900204 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/8840#discussion_r42942564

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -128,30 +136,32 @@ private[sql] object PartitioningUtils {
   private[sql] def parsePartition(
       path: Path,
       defaultPartitionName: String,
-      typeInference: Boolean): Option[PartitionValues] = {
+      typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
--- End diff --

A base path is not always associated with a `PartitionValues`. If there is no partition, we can still have a base path. That is why I don't change `case class PartitionValues(columnNames: Seq[String], literals: Seq[Literal])` into something like `case class PartitionValues(path: String, columnNames: Seq[String], literals: Seq[Literal])`.
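To make the distinction concrete, here is a small sketch of the two result shapes. `PartitionValues` follows the shape quoted in the comment above; the `Literal` stand-in and the example paths are hypothetical:

```scala
import org.apache.hadoop.fs.Path

// Stand-ins mirroring the types named in the comment; illustrative only.
case class Literal(value: Any)
case class PartitionValues(columnNames: Seq[String], literals: Seq[Literal])

// A partitioned leaf like /table/a=1/part-00000 yields both parts:
val partitioned: (Option[PartitionValues], Option[Path]) =
  (Some(PartitionValues(Seq("a"), Seq(Literal(1)))), Some(new Path("/table")))

// An unpartitioned leaf like /table/part-00000 has no PartitionValues,
// yet it still has a base path -- the case that motivates the tuple.
val unpartitioned: (Option[PartitionValues], Option[Path]) =
  (None, Some(new Path("/table")))
```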
[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...
GitHub user KaiXinXiaoLei opened a pull request: https://github.com/apache/spark/pull/9268

[SPARK-11298] When driver sends message "GetExecutorLossReason" to AM, the SparkContext may be stopped

I got the latest code from GitHub and just ran "bin/spark-shell --master yarn --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=1 --conf spark.shuffle.service.enabled=true". The error output is:

15/10/25 12:11:02 ERROR TransportChannelHandler: Connection to /9.96.1.113:35066 has been quiet for 12 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
15/10/25 12:11:02 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from vm113/9.96.1.113:35066 is closed
15/10/25 12:11:02 WARN NettyRpcEndpointRef: Ignore message Failure(java.io.IOException: Connection from vm113/9.96.1.113:35066 closed)
15/10/25 12:11:02 ERROR YarnScheduler: Lost executor 1 on vm111: Slave lost

From the log, the error appears when the driver sends the message "GetExecutorLossReason" to the AM. From the code, I think the AM should reply when it gets this message.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KaiXinXiaoLei/spark replayAM

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9268

commit cf661ded1f864d9bb75293a63f758640c847010f
Author: KaiXinXiaoLei
Date: 2015-10-25T06:33:01Z

    reply
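A minimal sketch of the behaviour the PR asks for, using stand-in types; `GetExecutorLossReason`, the `RpcCallContext` trait, and the reply logic here are hypothetical shapes, not the actual patch:

```scala
// Stand-ins for the Spark-internal RPC types involved; hypothetical shapes.
case class GetExecutorLossReason(executorId: String)
sealed trait ExecutorLossReason
case class SlaveLost(message: String) extends ExecutorLossReason
trait RpcCallContext { def reply(response: Any): Unit }

// The essence of the fix: the AM must answer every ask it receives.
// If it silently drops GetExecutorLossReason, the driver's outstanding
// request eventually trips spark.network.timeout and the connection is
// declared dead -- the failure mode shown in the log above.
def receiveAndReply(
    context: RpcCallContext,
    lossReasons: Map[String, ExecutorLossReason]): PartialFunction[Any, Unit] = {
  case GetExecutorLossReason(executorId) =>
    context.reply(lossReasons.getOrElse(executorId, SlaveLost("reason not yet known")))
}
```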
[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9268#issuecomment-150896693 Merged build triggered.
[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9267#issuecomment-150897588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44311/
[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9267#issuecomment-150897587 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9269#issuecomment-150899436 **[Test build #44314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44314/consoleFull)** for PR 9269 at commit [`2822191`](https://github.com/apache/spark/commit/2822191e1c270237ca085757721c9746ad9b5734).
[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9268#issuecomment-150899445 **[Test build #44312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44312/consoleFull)** for PR 9268 at commit [`cf661de`](https://github.com/apache/spark/commit/cf661ded1f864d9bb75293a63f758640c847010f).
[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8840#issuecomment-150900104 Merged build triggered.
[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8840#issuecomment-150900105 Merged build started.
[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-150900439 @rxin I am curious: although I haven't observed a significant performance improvement so far from a simple projection + filter experiment, by pushing these filters down to the Parquet side do we retrieve less data and reduce the memory footprint? If so, even at the same performance level, is this patch still worth merging?
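For readers following the thread, here is a small sketch of the kind of query under discussion. The data path and column names are hypothetical, and whether the IN-list actually reaches Parquet is exactly what this patch changes:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext: SQLContext = ???  // hypothetical; obtained however the app is set up
val df = sqlContext.read.parquet("/data/events")

// A projection + filter whose IN-list this patch would push down.
val filtered = df.filter(df("country").isin("TW", "JP", "KR")).select("user_id")

// With pushdown, Parquet can skip row groups whose column statistics
// exclude all three values, so less data is materialized in memory;
// without it, every row is read and the filter runs afterwards in Spark.
// The physical plan's PushedFilters entry shows which case applies.
filtered.explain()
```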
[GitHub] spark pull request: [SPARK-11299][DOCS] Fix link to Scala DataFram...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9269#issuecomment-150898011 Merged build triggered.
[GitHub] spark pull request: [SPARK-11299] Fix link to Scala DataFrame Func...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/9269

[SPARK-11299] Fix link to Scala DataFrame Functions reference

The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-11299

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9269.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9269

commit 2822191e1c270237ca085757721c9746ad9b5734
Author: Josh Rosen
Date: 2015-10-25T07:04:22Z

    Fix link to Scala DataFrame functions reference
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150899440 **[Test build #44313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/consoleFull)** for PR 9180 at commit [`0a43033`](https://github.com/apache/spark/commit/0a4303356455f28ca3b87ffd446cb5ef5f25d0e2).
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9270#issuecomment-150899665 Merged build triggered.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9270#issuecomment-150899670 Merged build started.
[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9268#issuecomment-150900188 **[Test build #44312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44312/consoleFull)** for PR 9268 at commit [`cf661de`](https://github.com/apache/spark/commit/cf661ded1f864d9bb75293a63f758640c847010f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `logError("Sink class " + classPath + " cannot be instantiated")`
[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-150904443 If we don't observe performance improvements, it's definitely not worth it. Can you post how you measured it, and the performance results? Thanks.
[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-150905113 How does pushdown avoid OOM?
[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-150904969 OK, thanks. We found that with pushdown filters we can avoid OOM problems when processing large data in our daily usage, so I am wondering if it is helpful to others too. I will post the performance test later.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943239

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -959,6 +963,861 @@ case class ScalaUDF(
     }
   }

+  // Generate codes used to convert the arguments to Scala type for user-defined functions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): String = {
+    val converterClassName = classOf[Any => Any].getName
+    val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$"
+    val expressionClassName = classOf[Expression].getName
+    val scalaUDFClassName = classOf[ScalaUDF].getName
+
+    val converterTerm = ctx.freshName("converter" + index.toString)
+    ctx.addMutableState(converterClassName, converterTerm,
+      s"this.$converterTerm = ($converterClassName)$typeConvertersClassName.createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)expressions[${ctx.references.size - 1}]).getChildren().apply($index))).dataType());")
+    converterTerm
+  }
+
+  override def genCode(
+      ctx: CodeGenContext,
+      ev: GeneratedExpressionCode): String = {
+
+    ctx.references += this
+
+    val scalaUDFClassName = classOf[ScalaUDF].getName
+    val converterClassName = classOf[Any => Any].getName
+    val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$"
+    val expressionClassName = classOf[Expression].getName
+
+    // Generate codes used to convert the returned value of user-defined functions to Catalyst type
+    val catalystConverterTerm = ctx.freshName("catalystConverter")
+    ctx.addMutableState(converterClassName, catalystConverterTerm,
+      s"this.$catalystConverterTerm = ($converterClassName)$typeConvertersClassName.createToCatalystConverter((($scalaUDFClassName)expressions[${ctx.references.size - 1}]).dataType());")
+
+    val resultTerm = ctx.freshName("result")
+
+    val (evalCode, callFunc) = children.size match {
--- End diff --

Using `scalaUDFClassName` would not work because it is just `ScalaUDF`'s class name. I will try the static array approach later. Thanks.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42942860

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(the same `genCode` hunk quoted in full above, ending at:)
+    val (evalCode, callFunc) = children.size match {
--- End diff --

can we put these branches in a loop?
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943031

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(the same `genCode` hunk quoted in full above, ending at:)
+    val (evalCode, callFunc) = children.size match {
--- End diff --

Hmm, part of this can be reduced as you show, but it seems we still need a (smaller) pattern match. I will do it later.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943173

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(the same `genCode` hunk quoted in full above, ending at:)
+    val (evalCode, callFunc) = children.size match {
--- End diff --

But I don't think we have `callFunc` yet? I only create `callFunc` later; it is the code used to invoke the function in codegen.
[GitHub] spark pull request: [SPARK-11277] [SQL] sort_array throws exceptio...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9247#issuecomment-150918873 Hi @jliwork, thanks for working on it! But sorting an array of null type doesn't make sense to me; can you check the behaviour of other SQL systems like Hive? And what about struct type? It's also order-able.
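For reference, a quick sketch of the function under discussion. `sort_array` lives in `org.apache.spark.sql.functions`, and the corner case being debated is the all-null array, whose element type degenerates to the null type; the session setup here is hypothetical:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{col, sort_array}

val sqlContext: SQLContext = ???  // hypothetical; however the session is obtained

// Normal case: ascending sort of an array column.
val df = sqlContext.sql("SELECT array(3, 1, 2) AS xs")
df.select(sort_array(col("xs"))).show()  // [1, 2, 3]

// The disputed corner case: every element is null, so the element type
// is the null type and there is no meaningful ordering to apply.
sqlContext.sql("SELECT sort_array(array(null, null))").show()
```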
[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r42945007

--- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
@@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
     System.clearProperty("SPARK_YARN_MODE")
   }
 }
+
+  test("Obtain tokens For HiveMetastore") {
+    val hadoopConf = new Configuration()
+    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
+    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
+    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
+    val util = new YarnSparkHadoopUtil
+    val e = intercept[InvocationTargetException] {
+      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
+      fail(s"Expected an exception, got the token $token")
+    }
+    val inner = e.getCause
+    if (inner == null) {
+      fail("No inner cause", e)
+    }
+    if (!inner.isInstanceOf[HiveException]) {
+      fail(s"Not a hive exception", inner)
--- End diff --

I want to include the inner exception if it's of the wrong type, so that the JUnit/Jenkins report can diagnose the failure. An `assert(inner.isInstanceOf[HiveException])` will say the exception was of the wrong type, but it won't include the details, including the stack trace.
[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9269#issuecomment-150905138 Thanks - I've merged it.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42942936

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(the same `genCode` hunk quoted in full above, ending at:)
+    val (evalCode, callFunc) = children.size match {
--- End diff --

Did you mean using a script to generate it, like `f`?
[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9269
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943369

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(the same `genCode` hunk quoted in full above, ending at:)
+    val (evalCode, callFunc) = children.size match {
--- End diff --

If you meant `function: AnyRef` in `ScalaUDF`, that is the user-defined function. `function.getClass.getName` will return something like `org.apache.spark.sql.UDFSuite$$anonfun$10$$anonfun$apply$mcV$sp$12` in `UDFSuite`, so I think it would not work.
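The point is easy to demonstrate in a Scala REPL; the exact synthetic name varies with the compiler and enclosing scope, which is precisely why generated Java source cannot rely on it:

```scala
val f: Int => Int = _ + 1
println(f.getClass.getName)
// Prints a compiler-synthesized name such as
//   $line3.$read$$iw$$iw$$anonfun$1
// (or org.apache.spark.sql.UDFSuite$$anonfun$10$... inside a test suite):
// not a stable, referencable class name for generated code.
```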
[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8840#issuecomment-150911391 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/
[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8840#issuecomment-150911390 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8840#issuecomment-150911363 **[Test build #44316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/consoleFull)** for PR 8840 at commit [`cdf6dc4`](https://github.com/apache/spark/commit/cdf6dc424abba99a7fd091fca5ce2af56255f69a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r42944993

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -142,6 +145,104 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
     val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
     ConverterUtils.toContainerId(containerIdString)
   }
+
+  /**
+   * Obtains token for the Hive metastore, using the current user as the principal.
+   * Some exceptions are caught and downgraded to a log message.
+   * @param conf hadoop configuration; the Hive configuration will be based on this
+   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
+   *         in the config), or if a binding exception was caught and downgraded.
+   */
+  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
+    try {
+      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
+    } catch {
+      case e: NoSuchMethodException =>
+        logInfo("Hive Method not found", e)
+        None
+      case e: ClassNotFoundException =>
--- End diff --

+1. I'd left it in there as it may have had a valid reason for being there, but I do think it's correct. Detecting config problems is something to throw up. Note that `Client.obtainTokenForHBase()` has similar behaviour; this patch doesn't address it. When someone sits down to do it, the policy for how to react to failures could be converted into a wrapper around a closure which executes the token retrieval (here `obtainTokenForHiveMetastoreInner`), so there'd be no divergence.
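A minimal sketch of the wrapper idea mentioned above. The name `obtainTokenWithPolicy` is hypothetical, the catch clauses mirror the policy in the diff, and `logInfo` is stubbed here (in Spark it would come from the `Logging` trait):

```scala
// Stub standing in for the Logging trait's logInfo.
def logInfo(msg: String, e: Throwable): Unit = println(s"$msg: $e")

// Hypothetical shared policy: run a token-retrieval closure and downgrade
// reflection-related failures to log messages. Reusing this for Hive and
// HBase would keep the two code paths from diverging.
def obtainTokenWithPolicy[T](service: String)(retrieve: => T): Option[T] = {
  try {
    Some(retrieve)
  } catch {
    case e: NoSuchMethodException =>
      logInfo(s"$service method not found", e); None
    case e: ClassNotFoundException =>
      logInfo(s"$service class not found", e); None
  }
}

// Usage, with names from the diff (conf and principal supplied by the caller):
//   obtainTokenWithPolicy("Hive metastore") {
//     obtainTokenForHiveMetastoreInner(conf, principal)
//   }
```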
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150905267 **[Test build #44317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/consoleFull)** for PR 9180 at commit [`59383fd`](https://github.com/apache/spark/commit/59383fd41f1d6b96274c564eb2fb7c96f5ab07e0).
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42942985

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(the same `genCode` hunk quoted in full above, ending at:)
+    val (evalCode, callFunc) = children.size match {
--- End diff --

Maybe I'm missing something, but I think you can just write a loop instead of having branches?

```
...
val funcClassName = callFunc.getClass.getName
...
val evals = children.map(_.gen(ctx))
...
// generate callFunc
```
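Spelling the suggestion out a little further, the arity branches could collapse into something like the fragment below. This is a sketch meant to sit inside `genCode`: `genCodeForConverter` comes from the diff quoted earlier, while `funcTerm` is an assumed mutable-state field holding the boxed user function, not a name from the PR:

```scala
// Sketch only: one code path for any arity instead of one branch per arity.
val evals = children.map(_.gen(ctx))            // evaluate each argument
val evalCode = evals.map(_.code).mkString("\n")

// One Catalyst-to-Scala converter per argument, as in the diff above.
val converterTerms = children.indices.map(genCodeForConverter(ctx, _))

// Java source for "converter0.apply(arg0), converter1.apply(arg1), ...".
val funcArgs = converterTerms.zip(evals).map { case (converter, eval) =>
  s"$converter.apply(${eval.isNull} ? null : (Object) ${eval.value})"
}.mkString(", ")

// funcTerm is assumed to reference the boxed scala.FunctionN instance.
val callFunc = s"$funcTerm.apply($funcArgs)"
```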
[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-150905272 Is that the case? I thought we load them one by one (or a small batch at a time) and then apply the filter directly on them?
[GitHub] spark pull request: [SPARK-11297] Add new code tags
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9265#issuecomment-150905287 Can you post a before and after screenshot?
[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-150905206 Because we can pre-filter the data? Without pushdown, all of the data will be loaded into memory and only filtered afterwards.
[GitHub] spark pull request: [SPARK-11279] [PYSPARK] Add DataFrame#toDF in ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9248#discussion_r42943011

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1266,6 +1266,17 @@ def drop(self, col):
             raise TypeError("col should be a string or a Column")
         return DataFrame(jdf, self.sql_ctx)
 
+    def toDF(self, *cols):
--- End diff --

I think you need to add the ignore utf8 prefix annotation
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943320

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
+    val (evalCode, callFunc) = children.size match {
--- End diff --

What about function?
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user Lewuathe commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150917384 @dbtsai Sorry for bothering you so many times, but could you check again, please?
[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r42944947

--- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
@@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
     System.clearProperty("SPARK_YARN_MODE")
   }
 }
+
+  test("Obtain tokens For HiveMetastore") {
+    val hadoopConf = new Configuration()
+    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
+    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
+    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
+    val util = new YarnSparkHadoopUtil
+    val e = intercept[InvocationTargetException] {
+      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
+      fail(s"Expected an exception, got the token $token")
--- End diff --

I wanted to include any token returned in the assertion failure, on the basis that if something came back, it would be useful to find out what went wrong. `intercept`, just like JUnit's `@Test(expected=)` feature, picks up on the failure to raise the specific exception, but doesn't appear to say much else.
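For readers unfamiliar with the `intercept` pattern discussed above, a minimal, self-contained ScalaTest sketch (assuming ScalaTest 3.x; `EndpointException` and `obtainToken` are made up for illustration):
```
import org.scalatest.funsuite.AnyFunSuite

final class EndpointException(msg: String) extends Exception(msg)

class InterceptPatternSuite extends AnyFunSuite {
  // Stand-in for the real call under test; always throws in this sketch.
  def obtainToken(): String = throw new EndpointException("endpoint unreachable")

  test("token acquisition fails against an unreachable endpoint") {
    val e = intercept[EndpointException] {
      val token = obtainToken()
      // Reached only if no exception was thrown; fail() raises a
      // TestFailedException, which is not an EndpointException, so intercept
      // reports it -- and the unexpected token -- instead of passing.
      fail(s"Expected an exception, got the token $token")
    }
    assert(e.getMessage.contains("endpoint unreachable"))
  }
}
```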
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150904892 Merged build triggered.
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150904896 Merged build started.
[GitHub] spark pull request: [SPARK-6428][SQL] Removed unnecessary typecast...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9262#issuecomment-150905300 Thanks - I've merged this.
[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-150905344 Hmm, I am not sure about that. I assumed that the Parquet relation reads all of the data first if no pushdown filters are applied, and that Spark's `Filter` operation is applied afterwards. Maybe @liancheng can answer this?
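For context, the difference being debated is observable in the physical plan (a hedged illustration, assuming `sqlContext` from a 1.5-era spark-shell; the path is made up and the exact plan text varies by Spark version):
```
// If the IN predicate is pushed down, the Parquet scan can use row-group
// statistics to skip data; otherwise every row is materialized and then
// evaluated by Spark's own Filter operator above the scan.
val df = sqlContext.read.parquet("/tmp/events")
df.filter(df("id").isin(1, 2, 3)).explain()
```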
[GitHub] spark pull request: [SPARK-11292][SQL] Python API for text data so...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9259#issuecomment-150905342 cc @davies
[GitHub] spark pull request: [SPARK-6428][SQL] Removed unnecessary typecast...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9262
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943091

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
+    val (evalCode, callFunc) = children.size match {
--- End diff --

For that one, can you just do callFunc.getClass.getName?
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943201

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
+    val (evalCode, callFunc) = children.size match {
--- End diff --

Ah sorry. Can we just use scalaUDFClassName? If not, we can create a static array in the beginning that covers all 22 versions of this.
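A sketch of the static-array fallback mentioned above (illustrative, not the actual ScalaUDF code): precompute the `scala.FunctionN` class names once and index by arity instead of branching per arity.
```
object FunctionClassNames {
  // scala.Function0 through scala.Function22
  val byArity: IndexedSeq[String] = (0 to 22).map(i => s"scala.Function$i")
}

// Assumed use at codegen time:
//   val funcClassName = FunctionClassNames.byArity(children.size)
```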
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150911125 **[Test build #44317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/consoleFull)** for PR 9180 at commit [`59383fd`](https://github.com/apache/spark/commit/59383fd41f1d6b96274c564eb2fb7c96f5ab07e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9270#issuecomment-150911095 **[Test build #44315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44315/consoleFull)** for PR 9270 at commit [`5e8efea`](https://github.com/apache/spark/commit/5e8efeacdf35df7281224338866a9b18207fd27f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943394

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
+    val (evalCode, callFunc) = children.size match {
--- End diff --

Why wouldn't it work? Isn't that better because we can even specialize it?
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9270#issuecomment-150911128 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44315/
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/9270#discussion_r42943415

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
+    val (evalCode, callFunc) = children.size match {
--- End diff --

Hmm, you are right. I should give it a try.
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150911153 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9180#issuecomment-150911155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/
[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9270#issuecomment-150911127 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...
Github user felixb commented on the pull request: https://github.com/apache/spark/pull/8639#issuecomment-150916898 Is there anything else I can do?
[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r42944952

--- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
+    val e = intercept[InvocationTargetException] {
+      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
+      fail(s"Expected an exception, got the token $token")
+    }
+    val inner = e.getCause
+    if (inner == null) {
+      fail("No inner cause", e)
--- End diff --

good point
[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9226#issuecomment-150980439 @liancheng My concern is that if we do not lowercase those partition names, Hive will not be able to read those partition dirs.
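To make the concern concrete (a hedged example; the column and path names are made up, and a DataFrame `df` is assumed): a mixed-case partition column yields directory names that Hive's lower-cased metadata will not match.
```
// Spark writes partition directories using the column name's original case:
//   /warehouse/events/eventYear=2015/part-...
// Hive normalizes identifiers to lower case, so it looks for
// eventyear=2015/ and sees no partitions unless the names are lower-cased.
df.write.partitionBy("eventYear").parquet("/warehouse/events")
```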
[GitHub] spark pull request: [SPARK-10622] [core] [yarn] Differentiate dead...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8887#issuecomment-150992179 Merged build triggered.
[GitHub] spark pull request: [SPARK-10622] [core] [yarn] Differentiate dead...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8887#issuecomment-150992191 Merged build started.
[GitHub] spark pull request: [SPARK-10342] [SQL] [WIP] Cooperative memory m...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/9241#discussion_r42953420

--- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -227,62 +238,147 @@ public BytesToBytesMap(
    */
   public int numElements() { return numElements; }
 
-  public static final class BytesToBytesMapIterator implements Iterator<Location> {
+  public final class BytesToBytesMapIterator implements Iterator<Location> {
 
-    private final int numRecords;
-    private final Iterator<MemoryBlock> dataPagesIterator;
+    private int numRecords;
     private final Location loc;
 
     private MemoryBlock currentPage = null;
-    private int currentRecordNumber = 0;
+    private int recordsInPage = 0;
     private Object pageBaseObject;
     private long offsetInPage;
 
     // If this iterator destructive or not. When it is true, it frees each page as it moves onto
     // next one.
     private boolean destructive = false;
-    private BytesToBytesMap bmap;
 
-    private BytesToBytesMapIterator(
-        int numRecords, Iterator<MemoryBlock> dataPagesIterator, Location loc,
-        boolean destructive, BytesToBytesMap bmap) {
+    private LinkedList<UnsafeSorterSpillWriter> spillWriters =
+      new LinkedList<>();
+    private UnsafeSorterSpillReader reader = null;
+
+    private BytesToBytesMapIterator(int numRecords, Location loc, boolean destructive) {
       this.numRecords = numRecords;
-      this.dataPagesIterator = dataPagesIterator;
       this.loc = loc;
       this.destructive = destructive;
-      this.bmap = bmap;
-      if (dataPagesIterator.hasNext()) {
-        advanceToNextPage();
-      }
+      destructiveIterator = this;
     }
 
     private void advanceToNextPage() {
-      if (destructive && currentPage != null) {
-        dataPagesIterator.remove();
-        this.bmap.taskMemoryManager.freePage(currentPage);
-        this.bmap.shuffleMemoryManager.release(currentPage.size());
+      synchronized (this) {
+        int nextIdx = dataPages.indexOf(currentPage) + 1;
+        if (destructive && currentPage != null) {
+          dataPages.remove(currentPage);
+          taskMemoryManager.freePage(currentPage);
+          shuffleMemoryManager.release(currentPage.size());
+          nextIdx --;
+        }
+        if (dataPages.size() > nextIdx) {
+          currentPage = dataPages.get(nextIdx);
+          pageBaseObject = currentPage.getBaseObject();
+          offsetInPage = currentPage.getBaseOffset();
+          recordsInPage = Platform.getInt(pageBaseObject, offsetInPage);
+          offsetInPage += 4;
+        } else {
+          currentPage = null;
+          try {
+            reader = spillWriters.removeFirst().getReader(blockManager);
+            recordsInPage = -1;
+          } catch (IOException e) {
+            // Scala iterator does not handle exception
+            Platform.throwException(e);
+          }
+        }
       }
-      currentPage = dataPagesIterator.next();
-      pageBaseObject = currentPage.getBaseObject();
-      offsetInPage = currentPage.getBaseOffset();
     }
 
     @Override
     public boolean hasNext() {
-      return currentRecordNumber != numRecords;
+      return numRecords > 0;
    }
 
     @Override
     public Location next() {
-      int totalLength = Platform.getInt(pageBaseObject, offsetInPage);
-      if (totalLength == END_OF_PAGE_MARKER) {
+      if (recordsInPage == 0) {
         advanceToNextPage();
-        totalLength = Platform.getInt(pageBaseObject, offsetInPage);
       }
-      loc.with(currentPage, offsetInPage);
-      offsetInPage += 4 + totalLength;
-      currentRecordNumber++;
-      return loc;
+      numRecords --;
+      if (currentPage != null) {
+        int totalLength = Platform.getInt(pageBaseObject, offsetInPage);
+        loc.with(currentPage, offsetInPage);
+        offsetInPage += 4 + totalLength;
+        recordsInPage --;
+        return loc;
+      } else {
+        assert(reader != null);
+        if (!reader.hasNext()) {
+          advanceToNextPage();
+        }
+        try {
+          reader.loadNext();
+        } catch (IOException e) {
+          // Scala iterator does not handle exception
+          Platform.throwException(e);
+        }
+        loc.with(reader.getBaseObject(), reader.getBaseOffset(), reader.getRecordLength());
+        return loc;
+      }
+    }
+
+    public long spill(long numBytes) throws IOException {
+      synchronized (this) {
+        if (!destructive || dataPages.size() == 1) {
[GitHub] spark pull request: [SPARK-11272][Core][UI][WIP] Support importing...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9238#issuecomment-150995352 Merged build triggered.
[GitHub] spark pull request: [SPARK-11272][Core][UI][WIP] Support importing...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9238#issuecomment-150995368 Merged build started.
[GitHub] spark pull request: [SPARK-11306] Fix hang when JVM exits.
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/9273#issuecomment-150995282 LGTM.
[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8652#discussion_r42954743

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -268,6 +268,27 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   object CartesianProduct extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+      // Unlike the equi-join, BroadcastNestedLoopJoin doesn't support a condition
+      // for the cartesian join: records that satisfy the condition may exist in
+      // another partition of the large table, so we may not be able to
+      // eliminate the duplicates.
--- End diff --

Yes, the comment is stale. If we restrict the outer join condition to `None` here, then it's more like a `CartesianProduct`; that's why I put the rule in `CartesianProduct`, and, more importantly, we'd like those two rules to take higher priority than the rule at line 292. I totally agree with you about combining `CartesianProduct` and `BroadcastNestedLoopJoin`, as the latter is just a special case of the former. Will update the code soon.
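A small, hedged illustration of the planning behavior under discussion (Spark 1.5-era API, assuming `sqlContext` in scope; the exact plan text varies by version): an inner join with a non-equi condition cannot use a hash join, so it falls through to the strategy shown in the diff above.
```
// No equality predicate, so EquiJoinSelection cannot plan a hash join.
val left = sqlContext.range(0, 100).toDF("a")
val right = sqlContext.range(0, 100).toDF("b")
left.join(right, left("a") < right("b")).explain()
// Expected shape: Filter (a < b) over CartesianProduct, or a
// BroadcastNestedLoopJoin when one side is small enough to broadcast.
```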
[GitHub] spark pull request: [SPARK-11178] Improving naming around task fai...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9164#issuecomment-151007358 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44322/
[GitHub] spark pull request: [SPARK-11178] Improving naming around task fai...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9164#issuecomment-151007357 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11272][Core][UI] Support importing and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9238#issuecomment-151009628 **[Test build #44324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44324/consoleFull)** for PR 9238 at commit [`af8b3cb`](https://github.com/apache/spark/commit/af8b3cb03f89d2b05e2566371d44405c5f8237d3).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11272][Core][UI] Support importing and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9238#issuecomment-151009668 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44324/
[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8652#issuecomment-151009666 Merged build triggered.
[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8652#issuecomment-151009673 Merged build started.
[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8652#discussion_r42957158

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -295,8 +295,21 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   }
 
-  object BroadcastNestedLoopJoin extends Strategy {
+  object CartesianProduct extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+      case logical.Join(
+        CanBroadcast(left), right, joinType, condition) if joinType != LeftSemiJoin =>
+        execution.joins.BroadcastNestedLoopJoin(
+          planLater(left), planLater(right), joins.BuildLeft, joinType, condition) :: Nil
+      case logical.Join(
+        left, CanBroadcast(right), joinType, condition) if joinType != LeftSemiJoin =>
+        execution.joins.BroadcastNestedLoopJoin(
+          planLater(left), planLater(right), joins.BuildRight, joinType, condition) :: Nil
+      case logical.Join(left, right, _, None) =>
+        execution.joins.CartesianProduct(planLater(left), planLater(right)) :: Nil
+      case logical.Join(left, right, Inner, Some(condition)) =>
+        execution.Filter(condition,
+          execution.joins.CartesianProduct(planLater(left), planLater(right))) :: Nil
       case logical.Join(left, right, joinType, condition) =>
         val buildSide = if (right.statistics.sizeInBytes <= left.statistics.sizeInBytes) {
--- End diff --

Yes, I think so.
[GitHub] spark pull request: [SPARK-11272][Core][UI] Support importing and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9238#issuecomment-151009667 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8652#discussion_r42957141

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanner.scala ---
@@ -44,8 +44,7 @@ class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies {
       EquiJoinSelection ::
       InMemoryScans ::
       BasicOperators ::
-      CartesianProduct ::
-      BroadcastNestedLoopJoin :: Nil)
+      CartesianProduct :: Nil)
--- End diff --

Any suggestions for the name?
[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/8652#discussion_r42958162

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanner.scala ---
+      CartesianProduct :: Nil)
--- End diff --

How about `NonEquiJoinSelection`, to mirror `EquiJoinSelection`?
[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9226#issuecomment-151017266 Had a discussion with @liancheng and @cloud-fan. We think this fix is better than #9251. When a data source table is partitioned, we will not save data in a Hive-compatible way. So, let's make sure we can read data back. Later, when we want to save partitioned data in a Hive-compatible way, we can discuss the right approach in that JIRA (we may want to add a flag for it, since the saved metadata may be quite different).
[GitHub] spark pull request: [SPARK-10891][STREAMING][KINESIS] Add MessageH...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/8954#issuecomment-151019368 Great. I will merge this.
[GitHub] spark pull request: [SPARK-10984] Simplify *MemoryManager class st...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/9127#issuecomment-151020332 Woohoo, this passes tests! There are still a few minor follow-up tasks that I'd like to do for this, but I'm going to defer them to separate patches: this patch is fairly large and has conflicts with several other memory-management-related patches that are in-flight or which will be opened soon. @andrewor14, if you have any post-hoc review comments, I'll handle them in a followup. @davies, this should unblock your open patch. Merging to master.
[GitHub] spark pull request: [SPARK-10891][STREAMING][KINESIS] Add MessageH...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8954
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/9258#discussion_r42951707

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -258,6 +258,16 @@ abstract class RDD[T: ClassTag](
    * subclasses of RDD.
    */
   final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
+    if (!isCheckpointedAndMaterialized) {
--- End diff --

Just out of curiosity, any reason not to do an `if`/`else` here?
```
if (!isCheckpointedAndMaterialized &&
    checkpointData.exists(_.isInstanceOf[ReliableRDDCheckpointData[T]])) {
  SparkEnv.get.checkpointManager.getOrCompute(
    this, checkpointData.get.asInstanceOf[ReliableRDDCheckpointData[T]], split, context)
} else {
  computeOrReadCache(split, context)
}
```
[GitHub] spark pull request: [SPARK-11306] Fix hang when JVM exits.
GitHub user kayousterhout opened a pull request:

    https://github.com/apache/spark/pull/9273

    [SPARK-11306] Fix hang when JVM exits.

    This commit fixes a bug where, in Standalone mode, if a task fails and crashes the JVM, the failure is considered a "normal failure" (meaning it's considered unrelated to the task), so the failure isn't counted against the task's maximum number of failures: https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138. As a result, if a task fails in a way that results in it crashing the JVM, it will continuously be re-launched, resulting in a hang. This commit fixes that problem.

    This bug was introduced by #8007; @andrewor14 @mccheah @vanzin can you take a look at this?

    This error is hard to trigger because we handle executor losses through 2 code paths (the second is via Akka, where Akka notices that the executor endpoint is disconnected). In my setup, the Akka code path completes first, and doesn't have this bug, so things work fine (see my recent email to the dev list about this). If I manually disable the Akka code path, I can see the hang (and this commit fixes the issue).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kayousterhout/spark-1 SPARK-11306

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9273.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9273

commit 42a1defca0b2f0c9558b6ad8d24c6b1eb389ea10
Author: Kay Ousterhout
Date: 2015-10-25T23:46:20Z

    [SPARK-11306] Fix hang when JVM exits.

    This commit fixes a bug where, in Standalone mode, if a task fails and crashes the JVM, the failure is considered a "normal failure" (meaning it's considered unrelated to the task), so the failure isn't counted against the task's maximum number of failures: https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138. As a result, if a task fails in a way that results in it crashing the JVM, it will continuously be re-launched, resulting in a hang. This commit fixes that problem.
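The failure-accounting behavior described in this PR can be sketched abstractly (all names below are illustrative stand-ins, not Spark's actual classes):
```
object FailureAccountingSketch {
  sealed trait ExecutorExitReason
  case object NormalExit extends ExecutorExitReason    // e.g. executor asked to shut down
  case object CrashedByTask extends ExecutorExitReason // e.g. the task killed the JVM

  // The bug: JVM crashes caused by a task were classified as NormalExit, so the
  // lost task was retried without its failure count increasing -- a task that
  // reliably crashes the JVM is relaunched forever and the job hangs.
  def countsAgainstTaskFailures(reason: ExecutorExitReason): Boolean = reason match {
    case NormalExit    => false
    case CrashedByTask => true // the fix: count toward spark.task.maxFailures
  }
}
```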