[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49126648 QA results for PR 1313:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16707/consoleFull --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1244#issuecomment-49126716 We only want to do this if the driver shares the same directory structure as the executors. This is an assumption that is incorrect in many deployment settings. Really, we should have something like `spark.executor.home` that is not the same as `SPARK_HOME`. I am not 100% sure if we can just rip this functionality out actually. I am under the impression that Mesos still depends on something like this, so we should double check before we remove it.
[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-49126877 That's a good point. Changing it for YARN seems like the right thing, and 80% sounds reasonable to me. Another thing is the wait time. Previously it was 6 seconds, but now spark.scheduler.maxRegisteredExecutorsWaitingTime defaults to 5 times that. 30 seconds seems a little excessive to me in general - at least for jobs without caching, after a couple seconds the wait outweighs scheduling some non-local tasks. What do you think about decreasing this to 6 seconds in general? Or at least for YARN.
[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1431 [SPARK-2517] Removed some compiler type erasure warnings. Also took the chance to rename some variables to avoid unintentional shadowing. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark deprecate-warning Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1431.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1431 commit 44abdcce6f3409a0fa528252e88aaba5cd615559 Author: Reynold Xin r...@apache.org Date: 2014-07-16T06:03:10Z [SPARK-2517] Removed some compiler type erasure warnings.
[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1431#issuecomment-49127255 QA tests have started for PR 1431. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16716/consoleFull
[GitHub] spark pull request: [SQL] Cleaned up ConstantFolding slightly.
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1430#issuecomment-49127540 The code looks good to me. I am wondering if we could merge ConstantFolding and NullPropagation and make it `transformExpressionsUp`, since they rely on each other, and people may get confused about where to place a Null or Constant optimization.
[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1432 [SPARK-2518][SQL] Fix foldability of Substring expression. This is a follow-up of #1428. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2518 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1432.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1432 commit 37d1ace86d909bc4204bbb9f55c56a76aae8c106 Author: Takuya UESHIN ues...@happy-camper.st Date: 2014-07-16T06:22:17Z Fix foldability of Substring expression.
[GitHub] spark pull request: [SQL] Cleaned up ConstantFolding slightly.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1430#issuecomment-49128331 See the discussion at #1428.
[GitHub] spark pull request: [SPARK-2317] Improve task logging.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1259#issuecomment-49128371 QA results for PR 1259:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class TaskRunner(

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16710/consoleFull
[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1432#issuecomment-49128436 LGTM
[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1407#issuecomment-49130099 QA results for PR 1407:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16712/consoleFull
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49131064 Besides breaking the API, I'm also worried about two things:
1. The increase in storage. We had some discussion before v1.0 about whether we should switch to long or not. ALS is not computation heavy for small k but communication heavy. I posted some screenshots on the JIRA page, where ALS shuffles ~200GB of data in each iteration. With Long ids, this number may become ~300GB and hence ALS may slow down by 50%. Instead of upgrading the id type to Long, I'm actually thinking about downgrading the rating type to Float.
2. Are collisions really bad? ALS needs a somewhat dense matrix to compute good recommendations. If there are 3 billion users but each user only gives 1 or 2 ratings, ALS is very likely to overfit. In this case, making a random projection on the user side would certainly help, and hashing is one of the commonly used techniques for random projection. There will be bad recommendations no matter whether there exist hash collisions or not. So I'm really interested in some measurements on the downside of hash collisions.
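The hashing approach under discussion can be sketched as follows. This is a hypothetical illustration, not code from this PR or from MLlib; the object name and the bit-mixing choice are assumptions, and the collision rate such a scheme yields is exactly what the comment asks to measure.

```scala
// Hypothetical sketch: folding 64-bit user ids into ALS's Int id space.
// XOR-ing the high and low halves is one illustrative mixing choice, so
// that ids differing only in the upper 32 bits can still map apart.
object HashIds {
  def hashToInt(id: Long): Int =
    (id ^ (id >>> 32)).toInt

  def main(args: Array[String]): Unit = {
    println(hashToInt(42L))       // small ids map to themselves: 42
    println(hashToInt(1L << 33))  // a large id gets folded down
  }
}
```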
[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support
GitHub user cfregly opened a pull request: https://github.com/apache/spark/pull/1434 [SPARK-1981] Add AWS Kinesis streaming support You can merge this pull request into a Git repository by running: $ git pull https://github.com/cfregly/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1434.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1434 commit b3b0ff118cac3c0a5a10f9912b383bb0665c9a1b Author: Chris Fregly ch...@fregly.com Date: 2014-07-16T07:03:04Z [SPARK-1981] Add AWS Kinesis streaming support
[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-49132892 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1435 SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical... ... aggregation code You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-2519 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1435.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1435 commit 640706a19f96fd242e8619188c82e39cb6386fd3 Author: Sandy Ryza sa...@cloudera.com Date: 2014-07-16T07:12:46Z SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical aggregation code
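The change the PR title describes can be illustrated with a toy sketch (this is not the actual patch; the object and method names are made up for illustration): in hot aggregation loops, reading `_1`/`_2` directly skips the extractor machinery that a `case (k, v)` pattern match goes through on each element.

```scala
// Toy illustration of the two styles. Both sum the values of key/value
// pairs; the second avoids pattern-matching on Tuple2 in the inner loop,
// which is the concern in performance-critical aggregation code.
object TupleAccess {
  def sumByMatch(pairs: Array[(String, Int)]): Int =
    pairs.map { case (_, v) => v }.sum   // destructures each Tuple2

  def sumDirect(pairs: Array[(String, Int)]): Int = {
    var total = 0
    var i = 0
    while (i < pairs.length) {
      total += pairs(i)._2               // direct field access, no match
      i += 1
    }
    total
  }
}
```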
[GitHub] spark pull request: SPARK-1127 Add spark-hbase.
Github user javadba commented on the pull request: https://github.com/apache/spark/pull/194#issuecomment-49133737 Hi, the referenced PR Spark-1416 includes the following comment by @MLnick: But looking at the HBase PR you referenced, I don't see the value of having that live in Spark. And why is it not simply using an OutputFormat instead of custom config and writing code? (I might be missing something here, but it seems to add complexity and maintenance burden unnecessarily) Patrick: would you mind telling us whether that comment is going to affect this PR? We are going to be providing a significant chunk of HBase functionality and would like to know whether to build off of this PR or not. Thanks.
[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49133777 QA tests have started for PR 1435. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16719/consoleFull
[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49133829 Nice. LGTM.
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49133859 @tgravescs updated according to your comments and rebased to current HEAD of master branch. Thanks for following up on this PR.
[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1431#issuecomment-49134307 QA results for PR 1431:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16716/consoleFull
[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1432#issuecomment-49134887 LGTM
[GitHub] spark pull request: Add HiveDecimal HiveVarchar support in unwra...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/1436 Add HiveDecimal & HiveVarchar support in unwrapping data You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark unwrapdata Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1436.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1436 commit 39d6475d8cf357488fd1ec736b4d910f8237fc5b Author: Cheng Hao hao.ch...@intel.com Date: 2014-07-16T07:59:50Z Add HiveDecimal & HiveVarchar support in unwrap data commit afc39da00f53f15edb466768c24cdd73ec5bc119 Author: Cheng Hao hao.ch...@intel.com Date: 2014-07-16T08:21:25Z Polish the code
[GitHub] spark pull request: Add HiveDecimal HiveVarchar support in unwra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1436#issuecomment-49135679 QA tests have started for PR 1436. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16720/consoleFull
[GitHub] spark pull request: Tightening visibility for various Broadcast re...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1438#issuecomment-49137498 QA tests have started for PR 1438. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16723/consoleFull
[GitHub] spark pull request: [SQL] Add HiveDecimal HiveVarchar support in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1436#issuecomment-49137154 QA tests have started for PR 1436. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16722/consoleFull
[GitHub] spark pull request: [SPARK-2523] [SQL] Hadoop table scan
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/1439 [SPARK-2523] [SQL] Hadoop table scan In HiveTableScan.scala, ObjectInspector was created for all of the partition based records, which probably causes ClassCastException if the object inspector is not identical among table partitions. This is the follow up with: https://github.com/apache/spark/pull/1408 https://github.com/apache/spark/pull/1390 You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark hadoop_table_scan Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1439.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1439 commit 39d6475d8cf357488fd1ec736b4d910f8237fc5b Author: Cheng Hao hao.ch...@intel.com Date: 2014-07-16T07:59:50Z Add HiveDecimal HiveVarchar support in unwrap data commit afc39da00f53f15edb466768c24cdd73ec5bc119 Author: Cheng Hao hao.ch...@intel.com Date: 2014-07-16T08:21:25Z Polish the code commit d66835b420e22f98e10556783102e7dc356e6e6a Author: Cheng Hao hao.ch...@intel.com Date: 2014-07-16T08:24:30Z Fix Bug in TableScan while Paritition SerDe is not compatiable with each other
[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49137261 @dbtsai The assertions with `===` were all tested to work, but I agree it is more robust to allow numerical errors. One downside of this change is that `===` reports the values being compared when something is wrong, but `almostEquals` only returns true/false. It would be great if we could make the implementation similar to `===`. Btw, ScalaTest 2.x has this tolerance feature, where you can use `+-` to indicate a range. We are not using ScalaTest 2.x but it is a useful feature.
[GitHub] spark pull request: [SPARK-2523] [SQL] Hadoop table scan bug fixin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49138252 QA tests have started for PR 1439. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16724/consoleFull
[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1408#issuecomment-49138529 @yhuai @concretevitamin @rxin I've created another PR as a follow-up; we can discuss this more at: https://github.com/apache/spark/pull/1439
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49138992 `ObjectInspector` is not required by `Row` in Catalyst any more (unlike in Shark), and it is tightly coupled with the Deserializer for the raw data, so I moved the `ObjectInspector` into `TableReader`.
[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1399#discussion_r14987086
--- Diff: sbin/start-thriftserver.sh ---
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Figure out where Spark is installed
+FWDIR=$(cd `dirname $0`/..; pwd)
+
+CLASS=org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
+$FWDIR/bin/spark-class $CLASS "$@"
--- End diff --
Thanks, I've noticed the discussion. Marked this PR as WIP, will update soon.
[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49141155 QA results for PR 1435:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16719/consoleFull
[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r14988046
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala ---
@@ -20,8 +20,20 @@ package org.apache.spark.mllib.evaluation
 import org.scalatest.FunSuite
 import org.apache.spark.mllib.util.LocalSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
 class BinaryClassificationMetricsSuite extends FunSuite with LocalSparkContext {
+
+  implicit class SeqDoubleWithAlmostEquals(val x: Seq[Double]) {
+    def almostEquals(y: Seq[Double], eps: Double = 1E-6): Boolean =
--- End diff --
1.0e-6 is way bigger than an ulp for a double; 1.0e-12 is more like it. I understand a complex calculation might legitimately vary by significantly more than an ulp depending on the implementation. As @mengxr says, where you mean to allow significantly more than machine precision worth of noise, that's probably good to do with an explicitly larger epsilon. But this is certainly a good step forward already.
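A minimal version of the tolerance comparison being discussed might look like the following sketch. The object and method names, the absolute-epsilon strategy, and the `1e-12` default are illustrative assumptions, not the PR's exact code (the review itself debates which epsilon is appropriate).

```scala
// Sketch of tolerance-based double comparison, assuming an absolute
// epsilon. 1e-12 follows the suggestion above that 1e-6 is far larger
// than an ulp for a double.
object AlmostEquals {
  def almostEquals(x: Double, y: Double, eps: Double = 1e-12): Boolean =
    math.abs(x - y) < eps

  // Element-wise comparison for sequences, all with the same epsilon.
  def seqAlmostEquals(xs: Seq[Double], ys: Seq[Double], eps: Double): Boolean =
    xs.length == ys.length &&
      xs.zip(ys).forall { case (a, b) => almostEquals(a, b, eps) }
}
```

Note that such a boolean helper has exactly the drawback @mengxr points out: on failure it reports nothing about the values, unlike `===` or ScalaTest 2.x's `+-` tolerance matcher.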
[GitHub] spark pull request: [SQL] Add HiveDecimal HiveVarchar support in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1436#issuecomment-49143482 QA results for PR 1436:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16720/consoleFull
[GitHub] spark pull request: fix compile error of streaming project
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/153#issuecomment-49144416 Seems harmless as it only makes the return type of the method explicit. I can't see why it would be specific to building with one version of Hadoop though. Maybe it isn't?
[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/1441 SPARK-2452, create a new valid for each instead of using lineId. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-2452/multi-statement Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1441.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1441 commit fb8b5e7d9fc22db0103d578c964dd1c7b1503ee0 Author: Prashant Sharma prash...@apache.org Date: 2014-07-16T10:15:57Z SPARK-2452, create a new valid for each instead of using lineId, because Line ids can be same sometimes.
[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49146776 QA tests have started for PR 1441. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16726/consoleFull
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49147325 QA results for PR 1439:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class HadoopTableReader(@transient attributes: Seq[Attribute],

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16724/consoleFull
[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49148347 QA tests have started for PR 1441. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16727/consoleFull
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49149396 Let me close this PR for now. I will fork or wrap as necessary. Keep it in mind, and maybe in a 2.x release this can be revisited. (Matei, I ran into more problems with the `Rating` class retrofit anyway.) Yes, storage is the downside. Your comments on JIRA about the effects of serialization in compressing away the difference are promising. I completely agree with using `Float` for ratings and even feature vectors. Yes, I understand why random projections are helpful. It doesn't help accuracy, but may only trivially hurt it in return for some performance gain. If I have just 1 rating, it doesn't make my recs better to arbitrarily add your ratings to mine. Sure, that's denser, and maybe you're getting less overfitting, but it's fitting the wrong input for both of us. A collision here and there is probably acceptable. One in a million customers? OK. 1%? Maybe a problem. I agree, you'd have to quantify this to decide. If I'm an end user of MLlib bringing even millions of things to my model, I have to decide. And if it's a problem, I have to maintain a lookup table to use it. It seemed simplest to moot the problem with a much bigger key space and engineer around the storage issue. A bit more memory is cheap; accuracy and engineer time are expensive.
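The "one in a million vs. 1%" framing above can be roughed out with the standard birthday approximation. A back-of-the-envelope sketch in plain Java (the method name and arithmetic are mine, not from this PR or MLlib):

```java
// Birthday-style estimate: the chance that a given one of n distinct
// 64-bit IDs shares its hash with at least one other ID after hashing
// into a `bits`-bit space. Illustrative only.
public class CollisionEstimate {
    static double collisionFraction(long n, int bits) {
        double space = Math.pow(2.0, bits);
        // P(collision for one ID) = 1 - P(none of the other n-1 IDs hit it)
        return 1.0 - Math.exp(-(double) (n - 1) / space);
    }
}
```

With a 32-bit key space, a million users collide at roughly the 0.02% level, while a hundred million users collide at roughly 2%, which is the scale at which srowen's concern kicks in.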
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49153027 @mridulm @lirui-intel I separated the noPref tasks and those with unavailable preferences; this will treat the tasks with unavailable preferences as non-local ones before their preferences become available.
[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1244#issuecomment-49153988 @andrewor14 yeah, I agree with you. I just thought that somewhere (in the documentation for earlier versions? I cannot find it now) the user was told to set this env variable, so I suggested prioritizing the worker-side SPARK_HOME; if that is not set, Spark will fall back to the application's SPARK_HOME setting (which may generate errors if the directory structure is not the same). I also noticed this JIRA: https://issues.apache.org/jira/browse/SPARK-2454 (left some comments there)
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49154451 QA results for PR 1440:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16725/consoleFull
[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...
Github user chutium commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-49154812 hi, this is really very useful for us. I tried this implementation from @willb in spark-shell, but I still got java.lang.UnsupportedOperationException from the query plan, so I made some changes in SqlParser: https://github.com/chutium/spark/commit/1de83a7560f85cd347bca6dde256d551da63a144
[GitHub] spark pull request: SPARK-2407: Added Parse of SQL SUBSTR()
GitHub user chutium opened a pull request: https://github.com/apache/spark/pull/1442 SPARK-2407: Added Parse of SQL SUBSTR() follow-up of #1359

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chutium/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1442.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1442

commit 1de83a7560f85cd347bca6dde256d551da63a144
Author: chutium teng@gmail.com
Date: 2014-07-16T11:44:09Z

    SPARK-2407: Added Parse of SQL SUBSTR()
[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49155058 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-49155240 Awesome! Please submit a pull request with that addition.
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49155576 Hmm, just realized `Timestamp.toString` normalizes date and time according to the current timezone, which makes almost all timestamp-related tests timezone-sensitive. (Wouldn't have noticed this if I were in the US...) Guess we have to blacklist them for now, and this will revert part of #1396.
[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49155771 QA results for PR 1441:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16726/consoleFull
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r14993202

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -249,3 +263,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
     if (evaluated == null) null else cast(evaluated)
   }
 }
+
+object Cast {
+  private[sql] val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
--- End diff --

Hi, `SimpleDateFormat` is not thread-safe, so `def` should be used instead of `val`.
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49156553 Confirmed that the following test cases are timezone-sensitive and blacklisted them (by first removing all timestamp-related golden answers, running the tests in my local timezone to generate new golden answers, then manually changing my timezone settings and rerunning them):
- `timestamp_1`
- `timestamp_2`
- `timestamp_3` *
- `timestamp_udf` *

[*] Reverted from #1396.
[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49156902 QA results for PR 1441:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16727/consoleFull
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r14993659

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -249,3 +263,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
     if (evaluated == null) null else cast(evaluated)
   }
 }
+
+object Cast {
+  private[sql] val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
--- End diff --

Just checked `Timestamp.java`, it's indeed handled with a thread-local variable. Thanks for pointing this out!
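The two fixes discussed in this thread can be sketched in plain Java (the class and method names here are mine, purely illustrative): either construct a fresh formatter on every use, which is what replacing `val` with `def` achieves, or keep one instance per thread, which is the approach `java.sql.Timestamp` takes internally.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// SimpleDateFormat is mutable and not thread-safe, so a shared instance
// must never be used from multiple threads. Two safe alternatives:
public class DateFormats {
    // Option 1: a fresh formatter per call -- correct, but allocates each time.
    static SimpleDateFormat freshFormat() {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    }

    // Option 2: one formatter per thread, reused safely across calls
    // (the pattern Timestamp.java uses internally).
    private static final ThreadLocal<SimpleDateFormat> THREAD_LOCAL =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    static String format(Date d) {
        return THREAD_LOCAL.get().format(d);
    }
}
```

Option 2 trades a little memory per thread for avoiding an allocation on every formatting call, which matters on hot paths like column scans.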
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-49157266 Documentation for the newly introduced spark.deploy.retainedDrivers is missing?
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49157381 QA tests have started for PR 1440. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16728/consoleFull
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49158865 QA tests have started for PR 1440. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16729/consoleFull
[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...
Github user chutium commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-49159677 PR submitted #1442
[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49159849 That was my thought as well, @egraldlo. Thanks for submitting this, @chutium!
[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()
Github user egraldlo commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49160379 thx @willb, maybe `protected val SUBSTRING = Keyword("SUBSTRING")` as well, but this will cause some code redundancy.
[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49160982 @egraldlo, couldn't it be `(SUBSTR | SUBSTRING) ~ // ... ` in that case?
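A hand-rolled stand-in for the alternation idea, in plain Java (SqlParser itself uses Scala parser combinators; the class and method names here are mine). One detail worth noting: in a character-level matcher like this sketch, the longer keyword must be tried first, or `SUBSTR` would match the prefix of `SUBSTRING` and leave `ING` unconsumed. SqlParser's lexer tokenizes keywords first, so ordering is less of an issue there.

```java
import java.util.Optional;

// Minimal illustration of keyword alternation: accept either spelling of
// the function name and report which one matched.
public class KeywordMatcher {
    // Longest alternative first, so SUBSTRING is not half-matched as SUBSTR.
    private static final String[] ALTERNATIVES = {"SUBSTRING", "SUBSTR"};

    /** Returns the matched keyword, or empty if the input starts with neither. */
    static Optional<String> matchKeyword(String input) {
        String upper = input.toUpperCase();
        for (String k : ALTERNATIVES) {
            if (upper.startsWith(k)) {
                return Optional.of(k);
            }
        }
        return Optional.empty();
    }
}
```

This keeps a single grammar production for both spellings, which is exactly the redundancy-avoidance @egraldlo was after.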
[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()
Github user egraldlo commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49161743 fine, that's great!
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49168910 QA results for PR 1440:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16728/consoleFull
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49171137 QA results for PR 1440:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16729/consoleFull
[GitHub] spark pull request: discarded exceeded completedDrivers
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-49173609 @CodingCat thanks, I have created a JIRA issue: https://issues.apache.org/jira/browse/SPARK-2524
[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49174657 Hmmm... not sure that I would go so far as to call it nice. This does make the code slightly more difficult to read and understand, so can we hope that you've got some relative performance numbers that justify this compromise, @sryza ?
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49175346 Could you elaborate on when we will see an exception?
[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1443 [SPARK-2524] missing document about spark.deploy.retainedDrivers

https://issues.apache.org/jira/browse/SPARK-2524 The configuration spark.deploy.retainedDrivers is undocumented but actually used: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L60

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lianhuiwang/spark SPARK-2524

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1443.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1443

commit f2b597022b4fc4023c238e5b5a9824946f84f84e
Author: lianhuiwang lianhuiwan...@gmail.com
Date: 2014-05-23T14:02:57Z

    bugfix worker DriverStateChanged state should match DriverState.FAILED

commit 480ce949a83c0d854078b38f5665f3369cf759eb
Author: lianhuiwang lianhuiwan...@gmail.com
Date: 2014-05-24T15:24:37Z

    address aarondav comments

commit 8bbfe76dd8c8af815fa8404eb9a7922e58f938f7
Author: lianhuiwang lianhuiwan...@gmail.com
Date: 2014-06-10T16:01:36Z

    Merge remote-tracking branch 'upstream/master'

commit eacf9339a8c062cf3f28343a4f8157d214d25b00
Author: lianhuiwang lianhuiwan...@gmail.com
Date: 2014-07-13T14:13:03Z

    Merge remote-tracking branch 'upstream/master'

commit 44a3f50c689849228c42d072bdd355781dbacec6
Author: unknown administra...@taguswang-pc1.tencent.com
Date: 2014-07-16T14:22:18Z

    Merge remote-tracking branch 'upstream/master'

commit 5f6bbb7119ecd188af4967ac15f3ff1986ad400d
Author: Wang Lianhui lianhuiwan...@gmail.com
Date: 2014-07-16T14:40:03Z

    missing document about spark.deploy.retainedDrivers. The configuration spark.deploy.retainedDrivers is undocumented but actually used: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L60
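Spark's configuration docs are written as HTML table rows, so the missing entry would look something like the sketch below. Both the default value (200, which is what the linked Master.scala line appears to use, worth verifying against the merged code) and the wording are my own, not from this PR:

```html
<tr>
  <td><code>spark.deploy.retainedDrivers</code></td>
  <td>200</td>
  <td>
    Maximum number of completed drivers the standalone Master retains in memory
    and displays in the web UI; older entries are dropped beyond this limit.
  </td>
</tr>
```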
[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1443#issuecomment-49176249 QA tests have started for PR 1443. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16730/consoleFull
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r15001961

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala ---
@@ -344,21 +344,52 @@ private[sql] class StringColumnStats extends BasicColumnStats(STRING) {
   }
   override def contains(row: Row, ordinal: Int) = {
-    !(upperBound eq null) {
+    (upperBound ne null) {
--- End diff --

Nit: Spark style would probably prefer `!=` here.
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r15002400

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala ---
@@ -93,6 +93,10 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
     partitions_json,

     // Timezone specific test answers.
+    timestamp_1,
--- End diff --

Is there some way we could fix the timezone in the test harness instead of turning all of these off?
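"Fixing the timezone in the test harness" usually means pinning the JVM default timezone around the test body and restoring it afterwards. A sketch in plain Java (the fixture name is mine; this is not the actual HiveCompatibilitySuite change):

```java
import java.util.TimeZone;
import java.util.function.Supplier;

// Pin the JVM default timezone while a test body runs, then restore it,
// so golden answers compare identically regardless of the host's locale.
public class TimeZoneFixture {
    static <T> T withTimeZone(String id, Supplier<T> body) {
        TimeZone saved = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone(id));
        try {
            return body.get();
        } finally {
            TimeZone.setDefault(saved);
        }
    }
}
```

The try/finally restore matters: `TimeZone.setDefault` is process-global, so a test that forgets to reset it would silently poison every later test in the same JVM.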
[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49184593 Another example of this problem is the PageRank example bundled with Spark. For now, since the Java serializer problem still exists, `checkpoint()` has to be called on the RDD to avoid a StackOverflowError after too many iterations.
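The mechanism behind that StackOverflowError can be illustrated with a toy model (plain Java, no Spark; the class and method names are mine): each iteration of a job like PageRank adds one link to the RDD's lineage chain, recursively serializing a very long chain overflows the stack, and `checkpoint()` truncates the chain.

```java
// Toy model of RDD lineage growth in an iterative job. Each iteration
// deepens the dependency chain by one; "checkpointing" resets the depth
// to zero, which is what bounds the recursion during serialization.
public class LineageModel {
    /** Chain depth after `iterations` steps, truncating every `checkpointEvery` steps. */
    static int depthAfter(int iterations, int checkpointEvery) {
        int depth = 0;
        for (int i = 1; i <= iterations; i++) {
            if (i % checkpointEvery == 0) {
                depth = 0; // checkpoint: cut the lineage chain
            } else {
                depth += 1;
            }
        }
        return depth;
    }
}
```

Without checkpointing, depth grows linearly with the iteration count, which is why long PageRank runs eventually blow the stack under the Java serializer.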
[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49188774 You are right that this rule does more than null propagation now. I'm not sure what a better name would be. `DegenerateExpressionSimplification`? Regarding moving null propagation into the expressions, you could do it... but what would it look like? You specify which of the children make the entire expression null if they are null? Seems like a lot of refactoring for little benefit...
[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1443#issuecomment-49191235 QA results for PR 1443:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16730/consoleFull
[GitHub] spark pull request: [SPARK-2033] Automatically cleanup checkpoint
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/855#issuecomment-49192571 QA tests have started for PR 855. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16731/consoleFull
[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1431#issuecomment-49193038 How about we close this one and merge #1444?
[GitHub] spark pull request: [SPARK-2525][SQL] Remove as many compilation w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1444#issuecomment-49193205 QA tests have started for PR 1444. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16732/consoleFull
[GitHub] spark pull request: Improve ALS algorithm resource usage
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-49193225 QA tests have started for PR 929. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16733/consoleFull
[GitHub] spark pull request: [SPARK-2119][SQL] Improved Parquet performance...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1370#issuecomment-49193554 Thanks! I've merged this into master.
[GitHub] spark pull request: [SPARK-2119][SQL] Improved Parquet performance...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1370
[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49194701 I'm going off of @mateiz's report on SPARK-2048 that we found [this] to be much slower than accessing fields directly.
[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49195389 Got it. Thanks. That also helps to put some bound (for now) on where we will make such performance optimizations.
[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/1022
[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...
GitHub user witgo reopened a pull request: https://github.com/apache/spark/pull/1022

SPARK-1719: spark.*.extraLibraryPath isn't applied on yarn

Fix: spark.executor.extraLibraryPath isn't applied on yarn

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark SPARK-1719

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1022.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1022

commit b23e9c3e4085c0a7faf2c51fd350ad1233aa7a40
Author: Prashant Sharma prashan...@imaginea.com
Date: 2014-07-11T18:52:35Z

[SPARK-2437] Rename MAVEN_PROFILES to SBT_MAVEN_PROFILES and add SBT_MAVEN_PROPERTIES

NOTE: It is not possible to use both the env variable `SBT_MAVEN_PROFILES` and the `-P` flag at the same time; `-P`, if specified, takes precedence.

Author: Prashant Sharma prashan...@imaginea.com

Closes #1374 from ScrapCodes/SPARK-2437/rename-MAVEN_PROFILES and squashes the following commits:

8694bde [Prashant Sharma] [SPARK-2437] Rename MAVEN_PROFILES to SBT_MAVEN_PROFILES and add SBT_MAVEN_PROPERTIES

commit cbff18774b0a2f346901ddf2f566be50561a57c7
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date: 2014-07-12T04:10:26Z

[SPARK-2457] Inconsistent description in README about build option

Now we should use -Pyarn instead of SPARK_YARN when building, but the README says as follows.
For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, also set `SPARK_YARN=true`:

    # Apache Hadoop 2.0.5-alpha
    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly

    # Cloudera CDH 4.2.0 with MapReduce v2
    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly

    # Apache Hadoop 2.2.X and newer
    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly

Author: Kousuke Saruta saru...@oss.nttdata.co.jp

Closes #1382 from sarutak/SPARK-2457 and squashes the following commits:

e7b2d64 [Kousuke Saruta] Replaced SPARK_YARN=true with -Pyarn in README

commit 55960869358d4f8aa5b2e3b17d87b0b02ba9acdd
Author: DB Tsai dbt...@dbtsai.com
Date: 2014-07-12T06:04:43Z

[SPARK-1969][MLlib] Online summarizer APIs for mean, variance, min, and max

It basically moved the private ColumnStatisticsAggregator class from RowMatrix to a publicly available DeveloperApi, with documentation and unit tests. Changes:
1) Moved the private implementation from org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
2) When creating an OnlineSummarizer object, the number of columns is not needed in the constructor; it's determined when users add the first sample.
3) Added the API documentation for MultivariateOnlineSummarizer.
4) Added the unit tests for MultivariateOnlineSummarizer.

Author: DB Tsai dbt...@dbtsai.com

Closes #955 from dbtsai/dbtsai-summarizer and squashes the following commits:

b13ac90 [DB Tsai] dbtsai-summarizer

commit d38887b8a0d00a11d7cf9393e7cb0918c3ec7a22
Author: Li Pu l...@twitter.com
Date: 2014-07-12T06:26:47Z

use specialized axpy in RowMatrix for SVD

After running some more tests on a large matrix, I found that the BV axpy (breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy (breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8s vs. 2s for each multiplication. The BV axpy operates on an iterator while the BSV axpy directly operates on the underlying array.
I think the overhead comes from creating the iterator (with a zip) and advancing the pointers.

Author: Li Pu l...@twitter.com
Author: Xiangrui Meng m...@databricks.com
Author: Li Pu li...@outlook.com

Closes #1378 from vrilleup/master and squashes the following commits:

6fb01a3 [Li Pu] use specialized axpy in RowMatrix
5255f2a [Li Pu] Merge remote-tracking branch 'upstream/master'
7312ec1 [Li Pu] very minor comment fix
4c618e9 [Li Pu] Merge pull request #1 from mengxr/vrilleup-master
a461082 [Xiangrui Meng] make superscript show up correctly in doc
861ec48 [Xiangrui Meng] simplify axpy
62969fa [Xiangrui Meng] use BDV directly in symmetricEigs; change the computation mode to local-svd, local-eigs, and dist-eigs; update tests and docs
c273771 [Li Pu] automatically determine SVD compute mode and parameters
7148426 [Li Pu] improve RowMatrix multiply
5543cce [Li Pu] improve svd api
819824b [Li Pu] add flag for dense svd or sparse svd
eb15100 [Li Pu] fix binary compatibility
4c7aec3 [Li Pu] improve comments
e7850ed [Li Pu] use aggregate and axpy
827411b [Li Pu] fix EOF new line
9c80515 [Li Pu] use non-sparse implementation
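The performance gap Li Pu describes comes from iterating element by element (with a zip) instead of walking the sparse vector's backing arrays directly. Below is a standalone sketch of the array-based axpy idea; this is not the Breeze code, and the index/value representation of the sparse vector is an assumption made for illustration:

```scala
object AxpySketch {
  // y += a * x, where x is sparse (parallel index/value arrays) and y is dense.
  // A tight while loop over the backing arrays avoids per-element iterator
  // allocation and pointer advancement, which is where the overhead was observed.
  def sparseAxpy(a: Double, xIndices: Array[Int], xValues: Array[Double],
                 y: Array[Double]): Unit = {
    var i = 0
    while (i < xIndices.length) {
      y(xIndices(i)) += a * xValues(i)
      i += 1
    }
  }
}
```

The same result could be computed by zipping an iterator over (index, value) pairs, but each step then pays for tuple and iterator machinery, which is the 8s vs. 2s difference reported above.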
[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1022#issuecomment-49196995 QA tests have started for PR 1022. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16735/consoleFull
[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15013544

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala ---
@@ -81,9 +82,8 @@ class LogisticRegressionSuite extends FunSuite with LocalSparkContext with Match

    val model = lr.run(testRDD)

    // Test the weights
-   val weight0 = model.weights(0)
-   assert(weight0 >= -1.60 && weight0 <= -1.40, weight0 + " not in [-1.6, -1.4]")
-   assert(model.intercept >= 1.9 && model.intercept <= 2.1, model.intercept + " not in [1.9, 2.1]")
+   assert(model.weights(0).almostEquals(-1.5244128696247), "weight0 should be -1.5244128696247")
--- End diff --

We can have a higher relative error here instead. If the implementation is changed, it's also nice to have a test which can catch the slightly different behavior. Also, updating those numbers will not take too much time compared with the implementation work.
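The relative-error comparison dbtsai suggests can be sketched as follows. This is a minimal illustration, not the actual Spark TestingUtils code; the object name and the absolute fallback near zero are assumptions:

```scala
object TestingUtilsSketch {
  // True when x and y agree within relative tolerance eps; near zero a relative
  // check is meaningless, so fall back to an absolute comparison there.
  def almostEquals(x: Double, y: Double, eps: Double = 1e-6): Boolean = {
    val diff = math.abs(x - y)
    if (x == y) true
    else if (math.abs(x) < eps && math.abs(y) < eps) diff < eps
    else diff / math.max(math.abs(x), math.abs(y)) < eps
  }
}
```

With this, a test can assert against a truncated expected value like -1.5244128 and still pass when the implementation produces the full-precision -1.5244128696247, while a genuinely different result fails.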
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user concretevitamin commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15013611

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -241,4 +252,37 @@ private[hive] object HadoopTableReader {
    val bufferSize = System.getProperty("spark.buffer.size", "65536")
    jobConf.set("io.file.buffer.size", bufferSize)
  }
+
+  /**
+   * Transform the raw data (Writable objects) into Row objects for an iterable input.
+   * @param iter Iterable input, represented as Writable objects
+   * @param deserializer Deserializer associated with the input Writable objects
+   * @param attrs The row attribute names and their zero-based positions in the MutableRow
+   * @param row Reusable MutableRow object
+   *
+   * @return Iterable Row objects transformed from the given iterable input.
+   */
+  def fillObject(iter: Iterator[Writable], deserializer: Deserializer,
+      attrs: Seq[(Attribute, Int)], row: GenericMutableRow): Iterator[Row] = {
+    val soi = deserializer.getObjectInspector().asInstanceOf[StructObjectInspector]
+    // get the field references according to the attributes (output of the reader) required
+    val fieldRefs = attrs.map { case (attr, idx) => (soi.getStructFieldRef(attr.name), idx) }
+
+    // Map each tuple to a row object
+    iter.map { value =>
+      val raw = deserializer.deserialize(value)
+      var idx = 0;
+      while(idx < fieldRefs.length) {
--- End diff --

nit: space after while
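The core pattern in the fillObject diff is reusing one mutable row buffer across the whole iterator instead of allocating a fresh row per record. That can be sketched without the Hive and Spark dependencies; everything below is a stand-in for illustration, not a Spark API:

```scala
object RowFillSketch {
  // Stand-in for GenericMutableRow: a fixed-width, reusable buffer.
  final class MutableRowSketch(n: Int) {
    val values = new Array[Any](n)
    def update(i: Int, v: Any): Unit = values(i) = v
  }

  // Stand-in for fillObject: for each raw record, copy the requested fields
  // (by position) into the shared row, then yield that same row.
  def fillRows(iter: Iterator[Array[String]],
               fieldIdx: Array[Int],
               row: MutableRowSketch): Iterator[MutableRowSketch] = {
    iter.map { raw =>
      var idx = 0
      while (idx < fieldIdx.length) { // space after `while`, per the review nit
        row.update(idx, raw(fieldIdx(idx)))
        idx += 1
      }
      row // the same instance every time; callers must copy if they retain rows
    }
  }
}
```

The trade-off is the usual one with buffer reuse: it eliminates per-record allocation, but any consumer that holds on to a row past the next iteration sees it overwritten.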
[GitHub] spark pull request: SPARK-2098: All Spark processes should support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1256#issuecomment-49197007 QA tests have started for PR 1256. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16734/consoleFull
[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15013786

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala ---
@@ -20,8 +20,20 @@ package org.apache.spark.mllib.evaluation

import org.scalatest.FunSuite

import org.apache.spark.mllib.util.LocalSparkContext
+import org.apache.spark.mllib.util.TestingUtils._

class BinaryClassificationMetricsSuite extends FunSuite with LocalSparkContext {
+
+  implicit class SeqDoubleWithAlmostEquals(val x: Seq[Double]) {
+    def almostEquals(y: Seq[Double], eps: Double = 1E-6): Boolean =
--- End diff --

Yeah, for one ulp, it might be 10e-15. A lot of the time, I manually type the numbers or just copy the first couple of digits to save line space, which is why I chose 1.0e-6; that way I can just type around 7 digits. I agree with you that in this case, we may want to explicitly specify a larger epsilon.
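The implicit-class trick in the diff lets a plain `Seq[Double]` gain an `almostEquals` method at the call site. A hedged sketch of the pattern, with illustrative names and a simple absolute tolerance rather than the actual TestingUtils implementation:

```scala
object SeqCompareSketch {
  // Enriches Seq[Double] with an elementwise comparison; eps can be overridden
  // per call when the test needs a larger tolerance, as discussed above.
  implicit class SeqDoubleAlmostEquals(val x: Seq[Double]) {
    def almostEquals(y: Seq[Double], eps: Double = 1e-6): Boolean =
      x.length == y.length &&
        x.zip(y).forall { case (a, b) => math.abs(a - b) < eps }
  }
}
```

Usage after `import SeqCompareSketch._`: `Seq(1.0, 2.0).almostEquals(Seq(1.0, 2.0000001))` succeeds, and `.almostEquals(expected, eps = 1e-3)` relaxes the tolerance explicitly.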
[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1212
[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/1338#issuecomment-49198901 Great - I will review in more detail after that. Would be great to get this merged before the 1.1 freeze so PySpark I/O for inputformat and outputformat is in for the next release! On Tue, Jul 15, 2014 at 1:07 AM, kanzhang notificati...@github.com wrote: @MLnick https://github.com/MLnick I'll see if I can add a couple of output converter examples as well. Thx.
[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1022#issuecomment-49199676 QA tests have started for PR 1022. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16737/consoleFull
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49200080

I just noticed that pendingTasksWithNotReadyPrefs is not being used now? It is getting updated but never actually queried from... Do we need to maintain it?

The way I initially thought about this problem was:
1) When a task has no preferred location by definition: schedule it on any node when there are no NODE_LOCAL tasks available for that executor.
2) When a task has a preferred location defined, but none available right now, treat it as an ANY task, so that other PROCESS/NODE/RACK local tasks have precedence over it. If/when a node- or rack-local host pops in, it becomes eligible for a better schedule preference.

@CodingCat, @kayousterhout @lirui-intel any thoughts? I might be missing something here!
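mridulm's two rules can be sketched as a small locality-degradation function. This is illustrative only; the real TaskSetManager logic is considerably more involved and these names are not Spark's:

```scala
object LocalitySketch {
  sealed trait Locality
  case object NodeLocal extends Locality // a preferred host is currently alive
  case object NoPref    extends Locality // rule 1: task declares no preference
  case object AnyHost   extends Locality // rule 2: prefs exist but none are live

  // Decide how a task should be treated given its preferred hosts and the
  // hosts currently alive in the cluster.
  def effectiveLocality(preferredHosts: Seq[String],
                        liveHosts: Set[String]): Locality =
    if (preferredHosts.isEmpty) NoPref
    else if (preferredHosts.exists(liveHosts)) NodeLocal
    else AnyHost
}
```

The point of the distinction is ordering: a NoPref task should run before falling all the way to ANY, while a task whose preferred hosts are all absent should yield to PROCESS/NODE/RACK-local work until a matching host appears.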
[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49200501 Thanks @lirui-intel merged finally :-)
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49200595 @mridulm isn't this exactly what the PR is doing here? And yes, it seems that pendingTasksWithNotReadyPrefs is redundant.
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49201615 If pendingTasksWithNotReady is never used, why was it added?
[GitHub] spark pull request: Tightening visibility for various Broadcast re...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1438#issuecomment-4920 Merging this in master.
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49201718 oh, just a mistake, I'm removing it
[GitHub] spark pull request: SPARK-2526: Simplify options in make-distribut...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/1445 SPARK-2526: Simplify options in make-distribution.sh Right now we have a bunch of parallel logic in make-distribution.sh that's just extra work to maintain. We should just pass through Maven profiles in this case and keep the script simple. See the JIRA for more details. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark make-distribution.sh Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1445.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1445 commit f1294ea1f1af2479f15d471dcb7bccd29be6169a Author: Patrick Wendell pwend...@gmail.com Date: 2014-07-13T20:28:19Z Simplify options in make-distribution.sh. Right now we have a bunch of parallel logic in make-distribution.sh that's just extra work to maintain. We should just pass through Maven profiles in this case and keep the script simple.
[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49201924 Sean, I'd still be okay with adding a LongALS class if you see benefit for it in some use cases. Let's just see how it works in comparison.
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15015383 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -351,6 +354,14 @@ private[spark] class TaskSetManager( for (index - findTaskFromList(execId, getPendingTasksForHost(host))) { return Some((index, TaskLocality.NODE_LOCAL, false)) } + // Look for no-pref tasks after rack-local tasks since they can run anywhere. --- End diff -- This comment is no longer correct