[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2824#issuecomment-59686863 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21906/consoleFull) for PR 2824 at commit [`be0533a`](https://github.com/apache/spark/commit/be0533a88f6b624629ac66cfeb9989337c002cfd). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19069043

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LeastSquaresError.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.tree.loss
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+
+/**
+ * Class for least squares error loss calculation.
+ */
+object LeastSquaresError extends Loss {
--- End diff --

I am making an attempt to add a mathematical statement. Let me know if we need to be more descriptive. I plan to be more formal in the actual documentation.
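For context, the kind of mathematical statement a least squares loss usually carries can be written as below. This is only a sketch of the standard definition, not the wording from the pull request; the symbols $y_i$ (label), $F(x_i)$ (model prediction) and $N$ (number of instances) are assumed notation.

```latex
L(y, F) = \sum_{i=1}^{N} \bigl(y_i - F(x_i)\bigr)^2,
\qquad
\frac{\partial L}{\partial F(x_i)} = -2\,\bigl(y_i - F(x_i)\bigr)
```

In gradient boosting the next tree is typically fit to the negative of this gradient, which for squared error is proportional to the residual $y_i - F(x_i)$.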
[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/2852

[WIP] Add WebUITableBuilder to simplify table-building code

This work-in-progress commit illustrates a weekend hack project that I came up with for significantly simplifying the web UI's table rendering code. See the huge block comment in `WebUITableBuilder` for more details. This isn't ready to merge yet; I wanted to get some feedback before converting the rest of the table construction code to use this (I know that I should open a JIRA for this, too; I'll do it tomorrow).

Essentially, this commit adds a small builder class for constructing objects that know how to render web UI tables. This builder helps us to avoid several sources of errors / maintenance headaches, such as duplicate/boilerplate markup, inconsistent formatting of columns in different tables (e.g. durations or memory being displayed differently), separation of column names from column data values, etc.

This is best illustrated by some sample code; this new framework lets you write

```scala
private val appTable: UITable[ApplicationInfo] = {
  val builder = new UITableBuilder[ApplicationInfo]()
  import builder._
  customCol("ID") { app =>
    <a href={"app?appId=" + app.id}>{app.id}</a>
  }
  col("Name") { _.id }
  intCol("Cores") { _.coresGranted }
  memCol("Memory per Node") { _.desc.memoryPerSlave }
  dateCol("Submitted Time") { _.submitDate }
  col("User") { _.desc.user }
  col("State") { _.state.toString }
  durationCol("Duration") { _.duration }
  build
}
```

to render the applications table in the standalone Master UI. I find this significantly easier to understand and maintain than the old code. For example, this makes it trivial to re-order columns.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark webui-table-builder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2852.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2852

commit c0aca09d676ce750496451f3691c5f9e861103bd
Author: Josh Rosen joshro...@databricks.com
Date: 2014-10-20T06:02:29Z

    Add WebUITableBuilder to clean up table building code.

    This significantly simplifies / abstracts the web UI's table construction code, which seems to account for the majority of the UI code. I haven't converted all tables to use this yet; this commit just provides the basic framework and a few example usages in the master web UI.
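The builder idea described in this pull request can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the PR's `WebUITableBuilder`: the names `TableBuilderSketch`, `Builder`, `Table`, `Column` and `App` are made up for the example, and cells are rendered as plain strings rather than HTML so that the block has no dependencies.

```scala
// Hypothetical, simplified sketch; NOT the PR's WebUITableBuilder.
// It renders plain-text cells instead of HTML to stay dependency-free.
object TableBuilderSketch {
  // A column pairs a header with a function that renders one row's cell.
  final case class Column[T](name: String, render: T => String)

  final class Table[T](columns: Seq[Column[T]]) {
    def headers: Seq[String] = columns.map(_.name)
    def renderRow(row: T): Seq[String] = columns.map(_.render(row))
  }

  final class Builder[T] {
    private val cols = scala.collection.mutable.ArrayBuffer.empty[Column[T]]

    // Generic column: the caller supplies the string for each cell.
    def col(name: String)(f: T => String): Builder[T] = {
      cols += Column(name, f)
      this
    }

    // Integer column: the formatting is decided in one place.
    def intCol(name: String)(f: T => Int): Builder[T] =
      col(name)(row => f(row).toString)

    // Memory column: every table shows megabytes the same way.
    def memCol(name: String)(f: T => Long): Builder[T] =
      col(name)(row => s"${f(row)} MB")

    def build: Table[T] = new Table(cols.toList)
  }

  // Hypothetical row type used only for this example.
  final case class App(id: String, cores: Int, memoryMb: Long)

  val appTable: Table[App] =
    new Builder[App]()
      .col("ID")(_.id)
      .intCol("Cores")(_.cores)
      .memCol("Memory per Node")(_.memoryMb)
      .build
}
```

Because each column keeps its header next to its value function, re-ordering columns means moving one line, and typed helpers such as `memCol` keep the formatting consistent across tables.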
[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2852#issuecomment-59687664 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21907/consoleFull) for PR 2852 at commit [`c0aca09`](https://github.com/apache/spark/commit/c0aca09d676ce750496451f3691c5f9e861103bd). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2852#issuecomment-59687734 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21907/ Test FAILed.
[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2852#issuecomment-59687732 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21907/consoleFull) for PR 2852 at commit [`c0aca09`](https://github.com/apache/spark/commit/c0aca09d676ce750496451f3691c5f9e861103bd).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2824#issuecomment-59688035 Hi @JoshRosen, I just set `transferToEnabled` to false by default; unless users explicitly set it to true, `transferTo` will not be enabled. Currently, only `ExternalSorter` uses this API for file-to-file copying, and this is controlled by the configuration `spark.file.transferTo`. Other uses of `copyStream` in the Spark code are not file-to-file copies, so this parameter has no effect there. For future uses of `copyStream`, callers have to get `transferToEnabled` from the configuration; I added some usage notes here. Users can still bypass `spark.file.transferTo` and set this parameter to true directly, but then they are responsible for using it correctly. The reason I didn't take `SparkConf` as a parameter to control the behavior is that it would require modifying a lot of the current code to obtain a `SparkConf` wherever `copyStream` is called. So what is your opinion? Thanks a lot.
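The behavior described in this comment can be sketched as a copy helper that only takes the NIO `transferTo` path for file-to-file copies and only when the caller opts in. This is a minimal sketch under stated assumptions: the object name `CopySketch` and the exact parameter list are hypothetical and are not taken from Spark's source; only `transferToEnabled` and the false default come from the comment above.

```scala
import java.io.{FileInputStream, FileOutputStream, InputStream, OutputStream}

// Hypothetical sketch of a copy helper with an opt-in transferTo path.
// The name CopySketch and the exact signature are assumptions, not Spark's code.
object CopySketch {
  def copyStream(
      in: InputStream,
      out: OutputStream,
      transferToEnabled: Boolean = false): Long = {
    (in, out) match {
      // Only file-to-file copies can use the NIO fast path, and only on request.
      case (fin: FileInputStream, fout: FileOutputStream) if transferToEnabled =>
        val inChannel = fin.getChannel
        val outChannel = fout.getChannel
        val size = inChannel.size()
        var copied = 0L
        // transferTo may move fewer bytes than requested, so loop until done.
        while (copied < size) {
          copied += inChannel.transferTo(copied, size - copied, outChannel)
        }
        copied
      // Every other combination falls back to a plain buffered copy.
      case _ =>
        val buf = new Array[Byte](8192)
        var total = 0L
        var n = in.read(buf)
        while (n != -1) {
          out.write(buf, 0, n)
          total += n
          n = in.read(buf)
        }
        total
    }
  }
}
```

With the flag defaulting to false, existing call sites keep the buffered copy unless they read the setting and pass it in explicitly, which matches the intent described above without threading a `SparkConf` through every caller.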
[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2824#issuecomment-59688343 Hi @jerryshao, Changing the default is exactly what I had in mind. This looks good to me! (Going to bed now; I'll merge this tomorrow and backport to `branch-1.1`)
[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2824#issuecomment-59688531 Thanks a lot :).
[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2842#issuecomment-59688527 @JoshRosen @pwendell I know the reason for this problem. In IDEA, I should right-click the project and click Maven reimport.
[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/2842
[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59688582 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21905/ Test FAILed.
[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59688580 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21905/consoleFull)** for PR 2839 at commit [`d6fdb2a`](https://github.com/apache/spark/commit/d6fdb2a40d8fbdcfadf3b27bc82e0bbdbdc808fe) after a configured wait of `120m`.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19069469

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs.
+ */
+@Experimental
+class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])]) {
+
+  /**
+   * Compute the average precision of all the queries, truncated at ranking position k.
+   * If for a query, the ranking algorithm returns n (n < k) results,
+   * the precision value will be computed as #(relevant items retrived) / k.
+   * See the following paper for detail:
+   *
+   * IR evaluation methods for retrieving highly relevant documents.
+   *    K. Jarvelin and J. Kekalainen
+   *
+   * @param k the position to compute the truncated precision
+   * @return the average precision at the first k ranking positions
+   */
+  def precisionAt(k: Int): Double = predictionAndLabels.map { case (pred, lab) =>
+    val labSet = lab.toSet
+    val n = math.min(pred.length, k)
+    var i = 0
+    var cnt = 0
+
+    while (i < n) {
+      if (labSet.contains(pred(i))) {
+        cnt += 1
+      }
+      i += 1
+    }
+    cnt.toDouble / k
+  }.mean
+
+  /**
+   * Returns the mean average precision (MAP) of all the queries
+   */
+  lazy val meanAveragePrecision: Double = predictionAndLabels.map { case (pred, lab) =>
+    val labSet = lab.toSet
+    var i = 0
+    var cnt = 0
+    var precSum = 0.0
+    val n = pred.length
+
+    while (i < n) {
+      if (labSet.contains(pred(i))) {
+        cnt += 1
+        precSum += cnt.toDouble / (i + 1)
+      }
+      i += 1
+    }
+    precSum / labSet.size
+  }.mean
+
+  /**
+   * Compute the average NDCG value of all the queries, truncated at ranking position k.
+   * If for a query, the ranking algorithm returns n (n < k) results, the NDCG value at
+   * at position n will be used. See the following paper for detail:
+   *
+   * IR evaluation methods for retrieving highly relevant documents.
+   *    K. Jarvelin and J. Kekalainen
+   *
+   * @param k the position to compute the truncated ndcg
+   * @return the average ndcg at the first k ranking positions
+   */
+  def ndcgAt(k: Int): Double = predictionAndLabels.map { case (pred, lab) =>
+    val labSet = lab.toSet
+    val labSetSize = labSet.size
+    val n = math.min(math.max(pred.length, labSetSize), k)
+    var maxDcg = 0.0
+    var dcg = 0.0
+    var i = 0
+
+    while (i < n) {
+      // Calculate 1/log2(i + 2)
+      val gain = math.log(2) / math.log(i + 2)
--- End diff --

`math.log(2)` is there by definition, but it is not necessary because of the normalization.
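The point about normalization can be checked with a small local sketch: since NDCG is a ratio of two sums that use the same discount, a constant factor such as `math.log(2)` cancels out. The code below is a hypothetical, non-RDD illustration; the object name `NdcgSketch`, the curried `gain` parameter, and the loop body past the line quoted in the diff are assumptions, not the PR's code.

```scala
// Hypothetical local (non-RDD) sketch; names are illustrative only.
object NdcgSketch {
  // NDCG@k with binary relevance; gain(i) is the discount at 0-based rank i.
  def ndcgAt[T](pred: Array[T], lab: Set[T], k: Int)(gain: Int => Double): Double = {
    val n = math.min(math.max(pred.length, lab.size), k)
    var dcg = 0.0
    var maxDcg = 0.0
    var i = 0
    while (i < n) {
      val g = gain(i)
      // Actual gain: only counted when the predicted item is relevant.
      if (i < pred.length && lab.contains(pred(i))) dcg += g
      // Ideal gain: all relevant items ranked first.
      if (i < lab.size) maxDcg += g
      i += 1
    }
    if (maxDcg == 0.0) 0.0 else dcg / maxDcg
  }

  // 1 / log2(i + 2), as in the comment in the diff.
  val log2Gain: Int => Double = i => math.log(2) / math.log(i + 2)
  // Same discount without the constant math.log(2) factor.
  val lnGain: Int => Double = i => 1.0 / math.log(i + 2)
}
```

Calling `ndcgAt` with `log2Gain` and with `lnGain` on the same input gives the same value up to floating-point rounding, because the factor appears in both `dcg` and `maxDcg`.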
[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59682641 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21905/consoleFull) for PR 2839 at commit [`d6fdb2a`](https://github.com/apache/spark/commit/d6fdb2a40d8fbdcfadf3b27bc82e0bbdbdc808fe). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59681217 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21900/consoleFull) for PR 2743 at commit [`329a30d`](https://github.com/apache/spark/commit/329a30debca49f0b4329944bc5ad152dd218689f).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-59690664 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21908/consoleFull) for PR 2607 at commit [`2ae97b7`](https://github.com/apache/spark/commit/2ae97b74ccc0e7fc3f34d435264768a1403a7a0c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2824#issuecomment-59691398 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21906/ Test PASSed.
[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2824#issuecomment-59691389 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21906/consoleFull) for PR 2824 at commit [`be0533a`](https://github.com/apache/spark/commit/be0533a88f6b624629ac66cfeb9989337c002cfd).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/2853

[SPARK-4005][CORE] handle message replies in receive instead of in the individual private methods

In BlockManagerMasterActor, when handling the UpdateBlockInfo message type, the reply is currently sent from an individual private method; it should be sent from Akka's receive instead.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark akkaRecv

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2853.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2853

commit d4b929b49b7962131e514783ab1ca1024244b48e
Author: Zhang, Liye liye.zh...@intel.com
Date: 2014-10-20T07:30:46Z

    [SPARK-4005][CORE] handle message replies in receive instead of in the individual private methods
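The refactoring proposed here can be illustrated without Akka. In the hypothetical sketch below, a `reply` callback stands in for Akka's `sender ! ...`, and the names `ReceiveSketch`, `MasterSketch` and `GetSize` are invented for the example; only `UpdateBlockInfo` and the idea of replying from `receive` come from the pull request description.

```scala
// Hypothetical sketch without Akka: `reply` stands in for `sender ! ...`.
object ReceiveSketch {
  sealed trait Message
  final case class UpdateBlockInfo(blockId: String, size: Long) extends Message
  final case class GetSize(blockId: String) extends Message

  final class MasterSketch {
    private val sizes = scala.collection.mutable.Map.empty[String, Long]

    // Private helpers only compute results; they never send replies themselves.
    private def updateBlockInfo(blockId: String, size: Long): Boolean = {
      sizes(blockId) = size
      true
    }

    private def getSize(blockId: String): Option[Long] = sizes.get(blockId)

    // All replies are issued here, so each message's response is visible in one place.
    def receive(reply: Any => Unit): PartialFunction[Message, Unit] = {
      case UpdateBlockInfo(id, size) => reply(updateBlockInfo(id, size))
      case GetSize(id) => reply(getSize(id))
    }
  }
}
```

Keeping the reply in `receive` makes it easy to see that every message type is answered exactly once, and leaves the helper methods as plain functions that can be tested without a message sender.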
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2853#issuecomment-59697011 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21909/consoleFull) for PR 2853 at commit [`d4b929b`](https://github.com/apache/spark/commit/d4b929b49b7962131e514783ab1ca1024244b48e). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-59697993 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21908/consoleFull) for PR 2607 at commit [`2ae97b7`](https://github.com/apache/spark/commit/2ae97b74ccc0e7fc3f34d435264768a1403a7a0c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class GradientBoosting (`
  * `case class BoostingStrategy(`
  * `trait Loss extends Serializable `
  * `class GradientBoostingModel(trees: Array[DecisionTreeModel], strategy: BoostingStrategy)`
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-59697998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21908/ Test FAILed.
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2853#issuecomment-59706895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21909/ Test PASSed.
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-59706916 test this please.
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2853#issuecomment-59706889 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21909/consoleFull) for PR 2853 at commit [`d4b929b`](https://github.com/apache/spark/commit/d4b929b49b7962131e514783ab1ca1024244b48e).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-59707432 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21911/consoleFull) for PR 1567 at commit [`88b939e`](https://github.com/apache/spark/commit/88b939e3deb15f4ed16a727b33af879fa103c913). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-59708199 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21910/ Test FAILed.
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-59714668 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21911/ Test FAILed.
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-59714664 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21911/consoleFull) for PR 1567 at commit [`88b939e`](https://github.com/apache/spark/commit/88b939e3deb15f4ed16a727b33af879fa103c913). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class GroupingSet(bitmasks: Seq[Int], ` * `case class Cube(groupByExprs: Seq[Expression],` * `case class Rollup(groupByExprs: Seq[Expression],` * `case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)` * `case class GroupingSetExpansion(` * `case class GroupingSetExpansion(`
[GitHub] spark pull request: Block Manager - Double Register Crash
GitHub user tsliwowicz opened a pull request: https://github.com/apache/spark/pull/2854 Block Manager - Double Register Crash In long-running contexts, we encountered a double register without a remove in between. The cause is unknown; we assume a temporary network issue. However, since the second register uses a BlockManagerId on a different port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor returns Some. This inconsistency is caught in a conditional statement that calls System.exit(1), which is a huge robustness issue for us. The fix: remove the old id from both maps during register when this happens. We mimic the behavior of expireDeadHosts() by doing local cleanup of the maps before trying to add new entries. We also added some logging for register and unregister. https://issues.apache.org/jira/browse/SPARK-4006 You can merge this pull request into a Git repository by running: $ git pull https://github.com/taboola/spark branch-0.9.2-block-mgr-removal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2854.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2854 commit efd93f2026ddc427e84fa03e8a595ded2b1a81ce Author: Tal Sliwowicz ta...@taboola.com Date: 2014-10-12T08:35:20Z In long running contexts, we encountered the situation of double register without a remove in between. The cause for that is unknown, and assumed a temp network issue. However, since the second register is with a BlockManagerId on a different port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor returns Some. This inconsistency is caught in a conditional statement that does System.exit(1), which is a huge robustness issue for us. The fix - simply remove the old id from both maps during register when this happens. We are mimicking the behavior of expireDeadHosts(), by doing local cleanup of the maps before trying to add new ones. Also - added some logging for register and unregister. commit 81d69f088e421b19e47495d06e8b187a0ec29075 Author: Tal Sliwowicz ta...@taboola.com Date: 2014-10-12T08:41:53Z fixed comment
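The cleanup described in this pull request can be illustrated with a small Python sketch. This is not the Spark code (which is Scala and is not shown in this message); the class `BlockManagerRegistry`, its method names, and the tuple-based ids are hypothetical, and only the two map names are borrowed from the description. It shows the idea of dropping the stale id from both maps when the same executor registers again under a different id.

```python
class BlockManagerRegistry:
    """Toy stand-in for the master's bookkeeping; hypothetical, not Spark code."""

    def __init__(self):
        # Maps a block manager id (here a (host, port) tuple) to its info record.
        self.block_manager_info = {}
        # Maps an executor id to the block manager id it last registered.
        self.block_manager_id_by_executor = {}

    def register(self, executor_id, manager_id, info):
        old_id = self.block_manager_id_by_executor.get(executor_id)
        if old_id is not None and old_id != manager_id:
            # Same executor, different id (for example a new port): drop the
            # stale entry from both maps instead of treating it as fatal.
            self.block_manager_info.pop(old_id, None)
            del self.block_manager_id_by_executor[executor_id]
        self.block_manager_id_by_executor[executor_id] = manager_id
        self.block_manager_info[manager_id] = info


registry = BlockManagerRegistry()
registry.register("exec-1", ("host-a", 4001), {"max_mem": 512})
# Second register for the same executor on a different port, no remove in between.
registry.register("exec-1", ("host-a", 4002), {"max_mem": 512})
```

After the second call both maps agree again: the executor points at the new id and the old id is gone, so a later consistency check has nothing to trip over.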
[GitHub] spark pull request: [SPARK-3967] Ensure that files are fetched ato...
GitHub user preaudc opened a pull request: https://github.com/apache/spark/pull/2855 [SPARK-3967] Ensure that files are fetched atomically tempFile is created in the same directory as targetFile, so that the move from tempFile to targetFile is always atomic You can merge this pull request into a Git repository by running: $ git pull https://github.com/preaudc/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2855.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2855 commit 8ea871f8130b2490f1bad7374a819bf56f0ccbbd Author: Christophe Préaud christophe.pre...@kelkoo.com Date: 2014-10-20T09:58:56Z Ensure that files are fetched atomically tempFile is created in the same directory as targetFile, so that the move from tempFile to targetFile is always atomic
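The reasoning behind this change can be sketched in Python, as an illustration of the general technique rather than of the patch itself (which is not shown in this message). The function name `fetch_atomically` and the temp-file prefix are hypothetical. The point is that a rename is only atomic when source and destination are on the same filesystem, which creating the temp file in the target's directory guarantees.

```python
import os
import tempfile


def fetch_atomically(data: bytes, target_path: str) -> None:
    """Write data to target_path so readers never see a partial file."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so both share one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=target_dir, prefix="fetchFileTemp")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(data)
        # os.replace renames the file; on POSIX this is atomic when source
        # and destination are on the same filesystem.
        os.replace(temp_path, target_path)
    except BaseException:
        # Do not leave a stray temp file behind if anything went wrong.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Had the temp file been created in a system-wide temp directory instead, the final move could cross filesystems and degrade to copy-then-delete, during which a concurrent reader might observe a half-written target.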
[GitHub] spark pull request: Block Manager - Double Register Crash
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2854#issuecomment-59722949 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3967] Ensure that files are fetched ato...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2855#issuecomment-59722947 Can one of the admins verify this patch?
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-59723719 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21912/consoleFull) for PR 2342 at commit [`35fb0f6`](https://github.com/apache/spark/commit/35fb0f67be7f2f7223e010eca300a7c1ad295c18). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-59723812 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21912/consoleFull) for PR 2342 at commit [`35fb0f6`](https://github.com/apache/spark/commit/35fb0f67be7f2f7223e010eca300a7c1ad295c18). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-59723813 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21912/ Test FAILed.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59724233 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21913/consoleFull) for PR 2816 at commit [`a580dd4`](https://github.com/apache/spark/commit/a580dd436d007265ed6cdba9666f9d27e3025f57). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-59724241 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21914/consoleFull) for PR 2342 at commit [`afffb05`](https://github.com/apache/spark/commit/afffb05ba1b83564c875ac3bb2aad64339991587). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-59724343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21914/ Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-59724340 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21914/consoleFull) for PR 2342 at commit [`afffb05`](https://github.com/apache/spark/commit/afffb05ba1b83564c875ac3bb2aad64339991587). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: mesos executor ids now consist of the slave id...
Github user tsliwowicz commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-59724452 @mateiz - @KashiErez and I went a different route. The killer issue was that there is a System.exit(1) in BlockManagerMasterActor, which was a huge robustness issue for us. At @taboola we run some pretty large clusters (processing many terabytes of data per day) which do real-time calculations and are mission critical. So we fixed the issue, and the fix has been running successfully in our production for a while now. I opened a new ticket - https://issues.apache.org/jira/browse/SPARK-4006 And a pull request - https://github.com/apache/spark/pull/2854 What do you think about our fix?
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-59726357 This PR also fixes : https://issues.apache.org/jira/browse/SPARK-1847
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59732528 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21913/consoleFull) for PR 2816 at commit [`a580dd4`](https://github.com/apache/spark/commit/a580dd436d007265ed6cdba9666f9d27e3025f57). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59732530 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21913/ Test PASSed.
[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/2856 [SPARK-4008] Fix kryo with fold in KryoSerializerSuite `zeroValue` will be serialized by `spark.closure.serializer`, but `spark.closure.serializer` only supports the default Java serializer. So it must not be a `ClassWithoutNoArgConstructor`, which cannot be serialized by the Java serializer. This PR changes `zeroValue` to null and updates the test to make it work correctly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-4008 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2856.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2856 commit 51da6558754f34097d4aa9ee8b15fa04ae01b9bf Author: zsxwing zsxw...@gmail.com Date: 2014-10-20T11:35:12Z [SPARK-4008] Fix kryo with fold in KryoSerializerSuite
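The constraint described here can be mimicked in plain Python as a rough analogy, with `pickle` standing in for the closure serializer. Nothing below is Spark or Kryo code: `NotSerializable`, `fold`, and `add_ignoring_none` are hypothetical names, and the actual test change is not shown in this message. The sketch only illustrates why a zero value that the shipping serializer rejects breaks `fold`, and how a null-like zero that the combine function treats as the identity avoids the problem.

```python
import pickle
from functools import reduce


class NotSerializable:
    """Hypothetical stand-in for a class the closure serializer rejects."""

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        raise pickle.PicklingError("NotSerializable cannot be pickled")


def fold(partitions, zero_value, op):
    # The zero value is shipped with the task, so it must survive a round
    # trip through the serializer (pickle here) before anything is combined.
    shipped_zero = pickle.loads(pickle.dumps(zero_value))
    partials = [reduce(op, part, shipped_zero) for part in partitions]
    return reduce(op, partials, shipped_zero)


def add_ignoring_none(a, b):
    # Treat None as the identity element so it can serve as the zero value.
    if a is None:
        return b
    if b is None:
        return a
    return NotSerializable(a.value + b.value)
```

Calling `fold(parts, None, add_ignoring_none)` works because `None` serializes trivially, whereas passing a `NotSerializable` instance as the zero value fails at the serialization step before any element is combined.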
[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2856#issuecomment-59734858 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21915/consoleFull) for PR 2856 at commit [`51da655`](https://github.com/apache/spark/commit/51da6558754f34097d4aa9ee8b15fa04ae01b9bf). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-59736352 Here is a gist of the dependency tree for the artifacts published by this patch: https://gist.github.com/ScrapCodes/a5857e57d828b4b787ff
[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2856#issuecomment-59745195 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21915/consoleFull) for PR 2856 at commit [`51da655`](https://github.com/apache/spark/commit/51da6558754f34097d4aa9ee8b15fa04ae01b9bf). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2856#issuecomment-59745203 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21915/ Test PASSed.
[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...
GitHub user YanTangZhai opened a pull request: https://github.com/apache/spark/pull/2857 [SPARK-4009][SQL]HiveTableScan should use makeRDDForTable instead of makeRDDForPartitionedTable for partitioned table when partitionPruningPred is None HiveTableScan should use makeRDDForTable instead of makeRDDForPartitionedTable for a partitioned table when partitionPruningPred is None. If a table has many partitions (for example, more than 20 thousand) but little data (for example, less than 512MB), some SQL queries on the table will produce more than 2 RDDs. Job submission then fails with a Java stack overflow exception. You can merge this pull request into a Git repository by running: $ git pull https://github.com/YanTangZhai/spark SPARK-4009 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2857.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2857 commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-06T13:07:08Z Merge pull request #1 from apache/master update commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-08-20T13:14:08Z Merge pull request #3 from apache/master Update commit 8a0010691b669495b4c327cf83124cabb7da1405 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-12T06:54:58Z Merge pull request #6 from apache/master Update commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3 Author: YanTangZhai hakeemz...@tencent.com Date: 2014-09-16T12:03:22Z Merge pull request #7 from apache/master Update commit 76d40277d51f709247df1d3734093bf2c047737d Author: YanTangZhai hakeemz...@tencent.com Date: 2014-10-20T12:52:22Z Merge pull request #8 from apache/master update commit be7882ce16911d018571fa46c1a175d063bdfd03 Author: yantangzhai tyz0...@163.com Date: 2014-10-20T13:05:44Z [SPARK-4009][SQL]HiveTableScan should use makeRDDForTable instead of makeRDDForPartitionedTable for partitioned table when partitionPruningPred is None
[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2857#issuecomment-59751379 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21916/consoleFull) for PR 2857 at commit [`be7882c`](https://github.com/apache/spark/commit/be7882ce16911d018571fa46c1a175d063bdfd03). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2857#issuecomment-59751519 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21916/consoleFull) for PR 2857 at commit [`be7882c`](https://github.com/apache/spark/commit/be7882ce16911d018571fa46c1a175d063bdfd03). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2857#issuecomment-59751523 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21916/ Test FAILed.
[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2857#issuecomment-59755633 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21917/consoleFull) for PR 2857 at commit [`db0ce73`](https://github.com/apache/spark/commit/db0ce732e51d5813609f80722c20147b7c33bd23). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3875] Add TEMP DIRECTORY configuration
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2729#issuecomment-59756273 Yes, as @mridulm pointed out. This should not be settable by the users on yarn. It should automatically use the yarn approved directories. We have logic in there for setting the java.io.tmpdir in ClientBase. If this is added we would need to do something similar and not let the user override it.
[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2857#issuecomment-59758705 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21917/ Test FAILed.
[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2857#issuecomment-59758697 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21917/consoleFull) for PR 2857 at commit [`db0ce73`](https://github.com/apache/spark/commit/db0ce732e51d5813609f80722c20147b7c33bd23). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class JavaFutureActionWrapper[S, T](futureAction: FutureAction[S], converter: S => T)`
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-59761949 Hey @pwendell, I have updated this patch to include the effective pom changes so that you can try it out. Also, I think this is ready for review!
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2615#discussion_r19086365

--- Diff: core/pom.xml ---

    @@ -264,6 +284,10 @@
           <scope>test</scope>
         </dependency>
         <dependency>
    +      <groupId>com.twitter</groupId>
    +      <artifactId>chill-java</artifactId>
    +    </dependency>

--- End diff --

Note to self: remove it.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-59763811 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21918/consoleFull) for PR 2615 at commit [`812db5b`](https://github.com/apache/spark/commit/812db5bb3b70c2b20cd1ec1d05f376003e554b41). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59769268

@MLnick It doesn't look like `pairRDDToPython` does the trick. I tried

```{python}
def userFeatures(self):
    juf = self._java_model.userFeatures()
    juf = sc._jvm.SerDeUtil.pairRDDToPython(juf, 1)
    return juf
```

but what comes out when I try to print the result of taking the first element of the RDD is just `[[B@176fa1a5` rather than any kind of nicely formatted python object.
[GitHub] spark pull request: [SPARK-3657] yarn alpha YarnRMClientImpl throw...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2728#issuecomment-59771521 So one issue is that the scheme was added to properly handle the case where YARN is using https (SPARK-3286). If client mode isn't passing the scheme, then that is probably broken. If it were passing the scheme, you wouldn't hit this issue. I think changing the YarnClientSchedulerBackend.start routine where it sets the spark.driver.appUIAddress would be the equivalent change. And then we would need to test. With the above change it would have the scheme included and wouldn't hit the null. If we want to add the check in anyway to handle the case where it is null, just in case something else comes up, that's fine, but I'm not really fond of pattern matching here. How about just checking URI.getScheme, and if it is null we pass the address in as is, otherwise we do the getAuthority()?
[GitHub] spark pull request: [SPARK-3925][SQL] Do not consider the ordering...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2783#issuecomment-59772318 Other comments?
[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/2858 [SPARK-4010][Web UI]Spark UI returns 500 in yarn-client mode The problem is caused by #1966. CC @YanTangZhai @andrewor14 You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-4010 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2858.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2858 commit 9866fbfacf90f319dc1e318077f7d433e1bcb222 Author: GuoQiang Li wi...@qq.com Date: 2014-10-20T15:04:09Z Spark UI returns 500 in yarn-client mode
[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2858#issuecomment-59781548 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21919/consoleFull) for PR 2858 at commit [`9866fbf`](https://github.com/apache/spark/commit/9866fbfacf90f319dc1e318077f7d433e1bcb222). * This patch merges cleanly.
[GitHub] spark pull request: fix broken links in README.md
GitHub user ryan-williams opened a pull request: https://github.com/apache/spark/pull/2859 fix broken links in README.md seems like `building-spark.html` was renamed to `building-with-maven.html`? Is Maven the blessed build tool these days, or SBT? I couldn't find a building-with-sbt page so I went with the Maven one here. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ryan-williams/spark broken-links-readme Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2859.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2859 commit 154e096fa6b6663f40da20fefca6cf947a394a15 Author: Ryan Williams ryan.blake.willi...@gmail.com Date: 2014-10-19T17:41:33Z fix broken links in README.md seems like building-spark.html was renamed to building-with-maven.html
[GitHub] spark pull request: fix broken links in README.md
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2859#issuecomment-59782880 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59784360 @davies Your idea of adding something like `fromTupleRDD` to `PythonMLLibAPI` seems to be the way to go. I'm just doing some cleanup and will push `userFeatures` and `productFeatures` in just a bit.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-59785816 @preaudc see last commit, I applied this change to the `case _` as well, per your suggestion!
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-59791098 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21918/ Test FAILed.
[GitHub] spark pull request: Initial time estimator with new column for rem...
Github user devldevelopment commented on the pull request: https://github.com/apache/spark/pull/2837#issuecomment-59791166 OK, thanks for the feedback, guys. If this feature is no longer wanted or needed, maybe we can close it (the JIRA 576)? Generally I'm getting to grips with Scala and Spark contribution, so I wanted an easy first task to implement.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-59791090 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21918/consoleFull)** for PR 2615 at commit [`812db5b`](https://github.com/apache/spark/commit/812db5bb3b70c2b20cd1ec1d05f376003e554b41) after a configured wait of `120m`.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-59793960 Jenkins, add to whitelist.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-59794008 test this please
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-59794882 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21920/consoleFull) for PR 2667 at commit [`d64c120`](https://github.com/apache/spark/commit/d64c1201e439d2894a76196659c59b9abb03be5e). * This patch merges cleanly.
[GitHub] spark pull request: [WIP]SPARK-3957: show broadcast variable resou...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r19096569

--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---

    @@ -30,7 +30,8 @@ import org.apache.spark.util.ActorLogReceive
     private[spark] case class Heartbeat(
         executorId: String,
         taskMetrics: Array[(Long, TaskMetrics)], // taskId -> TaskMetrics
    -    blockManagerId: BlockManagerId)
    +    blockManagerId: BlockManagerId,
    +    broadcastBlocks: Map[BlockId, Option[BlockStatus]])

--- End diff --

Would this send a BlockStatus for each broadcast variable on a heartbeat? If we have hundreds or thousands of broadcast variables I wonder if the message size will become huge. Could we send deltas somehow?
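One way to picture the "deltas" idea raised in this comment is the sketch below. It is only an illustration under stated assumptions, not code from the PR: `BlockId` and `BlockStatus` here are simplified stand-ins for Spark's classes, the `delta` helper is hypothetical, and the convention that `None` marks a removed block is a guess at how `Option[BlockStatus]` might be used.

```scala
object HeartbeatDelta {
  // Simplified stand-ins for Spark's BlockId / BlockStatus (assumptions,
  // kept minimal so the sketch is self-contained).
  final case class BlockId(name: String)
  final case class BlockStatus(memSize: Long, diskSize: Long)

  // Hypothetical helper: compares the statuses reported at the previous
  // heartbeat with the current ones. New or changed blocks map to
  // Some(status); blocks that disappeared map to None. Unchanged blocks
  // are left out, so the message grows with the number of changes rather
  // than with the number of broadcast variables.
  def delta(
      previous: Map[BlockId, BlockStatus],
      current: Map[BlockId, BlockStatus]): Map[BlockId, Option[BlockStatus]] = {
    val changed: Map[BlockId, Option[BlockStatus]] = current.collect {
      case (id, status) if !previous.get(id).contains(status) => id -> Option(status)
    }
    val removed: Map[BlockId, Option[BlockStatus]] =
      (previous.keySet -- current.keySet).iterator
        .map(id => id -> Option.empty[BlockStatus])
        .toMap
    changed ++ removed
  }
}
```

With this shape, an executor whose broadcast blocks did not change between two heartbeats would send an empty map.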
[GitHub] spark pull request: fix broken links in README.md
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2859#issuecomment-59795665 No, the reverse actually. The site has not been rebuilt though to expose the new page. This has been asked several times so I hope the site can be refreshed.
[GitHub] spark pull request: fix broken links in README.md
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2859#issuecomment-59796389

I see. Do you want to:

* leave the broken links,
* add some basic building with sbt commands to the README, or
* point them at the building-with-maven page per the change here?
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-59796463 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21921/consoleFull) for PR 2667 at commit [`be6645e`](https://github.com/apache/spark/commit/be6645eb4a6814f0a8d9983625444630e04e723e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2858#issuecomment-59797312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21919/ Test PASSed.
[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2858#issuecomment-59797300 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21919/consoleFull) for PR 2858 at commit [`9866fbf`](https://github.com/apache/spark/commit/9866fbfacf90f319dc1e318077f7d433e1bcb222). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: fix broken links in README.md
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2859#issuecomment-59797568 The issue is that the docs in `master` are all consistent, and I think they contain the correct current state of instructions. But on GitHub you see `master`'s `README.md`, while the site is of course built from branch 1.1. I think the intent is to move build instructions out of `README.md` more than the reverse, since it duplicates the main doc page. It may be too inconvenient to back-port the doc changes to 1.1 and rebuild the site. Maybe an interim solution is to just have both links in `README.md`. Or slip in a redirect page from the new URL to the old one right now. Or hey, it gets fixed in a month or two with 1.2 anyway.
[GitHub] spark pull request: fix broken links in README.md
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2859#issuecomment-59799405 Interesting. What do you mean by both links? The current links to `building-spark.html`, as well as the `building-with-maven.html` links I've submitted here? The former currently 404, so keeping them in the README if we are going to the trouble of changing it doesn't make sense to me. I see now that `README.md` is not up-to-date, but that was not at all apparent when I was getting set up with Spark over the weekend :-\ Seems like the README should be kept consistent with the source tree that it is committed with, and that can be decoupled from coarser per-release website refreshes. Could we add a couple commands explaining that `sbt` is blessed now, and showing how to use it? Otherwise maybe the README should just be removed.
[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59803881 @JoshRosen agreed with @ash211, this is really good. Are there any actual comments on the PR, or can it be merged? =)
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59803868 LGTM now, thanks!
[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2824
[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2824#issuecomment-59805036 I've merged this into `master` and `branch-1.1`. Thanks a lot :). Thank YOU (and @mridulm) for helping to diagnose this really subtle bug!
[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59805377 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21922/consoleFull) for PR 1658 at commit [`92bda0d`](https://github.com/apache/spark/commit/92bda0daf2fffeea0f1de9199fc71fe978a165c7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59805569 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21922/ Test FAILed.
[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59805565 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21922/consoleFull) for PR 1658 at commit [`92bda0d`](https://github.com/apache/spark/commit/92bda0daf2fffeea0f1de9199fc71fe978a165c7). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2860 [SPARK-3537][SQL] Refines in-memory columnar table statistics

This PR refines in-memory columnar table statistics:

1. adds 3 more statistics for in-memory table columns: `count`, `nullCount` and `sizeInBytes`, and filter pushdown support for `IS NULL` and `IS NOT NULL`.
2. caches and propagates statistics in `InMemoryRelation` once the underlying cached RDD is materialized. Statistics are collected to driver side with an accumulator.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark propagates-in-mem-stats Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2860.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2860 commit 7dc6a34166ad915e07438795ce6b6ea67b3fdee6 Author: Cheng Lian l...@databricks.com Date: 2014-10-20T17:13:59Z Adds more in-memory table statistics and propagates them properly
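To illustrate how `count` and `nullCount` could support pushdown of `IS NULL` and `IS NOT NULL`, here is a minimal sketch. It does not reproduce the PR's implementation: `ColumnBatchStats`, `NullPredicate` and `mightMatch` are hypothetical names, and the sketch assumes `count` includes null rows.

```scala
object NullPruning {
  // Hypothetical per-batch statistics for a single column. The field names
  // follow the PR description; the class itself is illustrative only.
  // Assumption: `count` is the total number of rows, nulls included.
  final case class ColumnBatchStats(count: Int, nullCount: Int)

  sealed trait NullPredicate
  case object IsNull extends NullPredicate
  case object IsNotNull extends NullPredicate

  // Returns true when the batch may contain matching rows and therefore
  // has to be scanned; false means the whole batch can be skipped.
  def mightMatch(stats: ColumnBatchStats, predicate: NullPredicate): Boolean =
    predicate match {
      case IsNull    => stats.nullCount > 0
      case IsNotNull => stats.count - stats.nullCount > 0
    }
}
```

Under these assumptions, a batch with `nullCount == 0` never needs to be read for `IS NULL`, and a batch consisting entirely of nulls never needs to be read for `IS NOT NULL`.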
[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2860#discussion_r19099520

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala ---

    @@ -24,11 +24,13 @@ import org.apache.spark.sql.catalyst.expressions.{AttributeMap, Attribute, Attri
     import org.apache.spark.sql.catalyst.types._

     private[sql] class ColumnStatisticsSchema(a: Attribute) extends Serializable {
    -  val upperBound = AttributeReference(a.name + ".upperBound", a.dataType, nullable = false)()
    -  val lowerBound = AttributeReference(a.name + ".lowerBound", a.dataType, nullable = false)()
    -  val nullCount = AttributeReference(a.name + ".nullCount", IntegerType, nullable = false)()
    +  val upperBound = AttributeReference(a.name + ".upperBound", a.dataType, nullable = true)()
    +  val lowerBound = AttributeReference(a.name + ".lowerBound", a.dataType, nullable = true)()

--- End diff --

Upper/lower bound can be null for types like string.
[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2860#discussion_r19099771 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala --- @@ -185,15 +196,16 @@ private[sql] class StringColumnStats extends ColumnStats { } else { nullCount += 1 } +count += 1 +sizeInBytes += STRING.actualSize(row, ordinal) --- End diff -- This can potentially slow down the caching process for string columns, because the `.getBytes("utf-8")` call within `actualSize` traverses the whole string.
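The cost mentioned above can be sketched as follows. This is a hedged illustration only: `Utf8Size` and its two methods are hypothetical names, and the code is not the body of `actualSize`. It shows that measuring a string's UTF-8 size means visiting every character, with or without the temporary byte array.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration; not part of Spark.
object Utf8Size {
  // Encodes the whole string to learn its UTF-8 length:
  // linear in the string length, plus a temporary byte array.
  def encodedSize(s: String): Int =
    s.getBytes(StandardCharsets.UTF_8).length

  // Counts UTF-8 bytes without allocating. Still a full traversal.
  // Assumes well-formed UTF-16 input (no unpaired surrogates).
  def countedSize(s: String): Int = {
    var i = 0
    var bytes = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      bytes += (if (cp < 0x80) 1 else if (cp < 0x800) 2 else if (cp < 0x10000) 3 else 4)
      i += Character.charCount(cp)
    }
    bytes
  }
}
```

Either variant costs time proportional to the string length for each row, which is the overhead the comment is concerned about when caching string columns.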
[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2860#issuecomment-59806948 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21923/consoleFull) for PR 2860 at commit [`7dc6a34`](https://github.com/apache/spark/commit/7dc6a34166ad915e07438795ce6b6ea67b3fdee6). * This patch merges cleanly.
[GitHub] spark pull request: Initial commit to provide pluggable strategy t...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2849#issuecomment-59807087 Hey @olegz is there an associated JIRA for this? If so could you include it in the title?
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-59807140 Jenkins, test this please. Does that work if I am not an admin? @pwendell agreed, the logic is a little tricky but I couldn't find a simpler way to express it; in the meantime, I factored it out since it was repeated in two `case`s
[GitHub] spark pull request: [SPARK-3537][SPARK-3914][SQL] Refines in-memor...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2860#discussion_r1919 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -76,4 +76,24 @@ class PlannerSuite extends FunSuite { setConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD, origThreshold.toString) } + + test("InMemoryRelation statistics propagation") { --- End diff -- Test case for SPARK-3914.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-59807368 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...
Github user olegz commented on the pull request: https://github.com/apache/spark/pull/2849#issuecomment-59807570 @andrewor14 done.