date:20141020

[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2824#issuecomment-59686863
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21906/consoleFull)
 for   PR 2824 at commit 
[`be0533a`](https://github.com/apache/spark/commit/be0533a88f6b624629ac66cfeb9989337c002cfd).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-20 Thread manishamde

Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/2607#discussion_r19069043
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LeastSquaresError.scala 
---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.tree.loss
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+
+/**
+ * Class for least squares error loss calculation.
+ */
+object LeastSquaresError extends Loss {
--- End diff --

I am making an attempt to add a mathematical statement. Let me know if we 
need to be more descriptive. I plan to be more formal in the actual 
documentation.
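
For readers following the thread, the kind of mathematical statement under discussion can be sketched as below. This is the textbook squared-error loss, written under the assumption that the PR uses the unscaled form; the quoted diff shows only the object declaration, so the exact scaling (for example a factor of 1/2) and the symbols `F`, `x_i`, `y_i` and `n` are illustrative rather than taken from the patch.

```latex
% Squared-error loss for one labeled point (x_i, y_i) under the current model F
L\bigl(y_i, F(x_i)\bigr) = \bigl(y_i - F(x_i)\bigr)^2

% Derivative with respect to the prediction; gradient boosting fits the next
% tree to the negative of this quantity
\frac{\partial L}{\partial F(x_i)} = -2\,\bigl(y_i - F(x_i)\bigr)

% Mean loss over n points
\frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - F(x_i)\bigr)^2
```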



[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...

2014-10-20 Thread JoshRosen

GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/2852

[WIP] Add WebUITableBuilder to simplify table-building code

This work-in-progress commit illustrates a weekend hack project that I came 
up with for significantly simplifying the web UI's table rendering code.  See 
the huge block comment in `WebUITableBuilder` for more details.  This isn't 
ready to merge yet; I wanted to get some feedback before converting the rest of 
the table construction code to use this (I know that I should open a JIRA for 
this, too; I'll do it tomorrow).

Essentially, this commit adds a small builder class for constructing 
objects that know how to render web UI tables.  This builder helps us to avoid 
several sources of errors / maintenance headaches, such as 
duplicate/boilerplate markup, inconsistent formatting of columns in different 
tables (e.g. durations or memory being displayed differently), separation of 
column names from column data values, etc.  This is best illustrated by some 
sample code; this new framework lets you write

```scala
  private val appTable: UITable[ApplicationInfo] = {
    val builder = new UITableBuilder[ApplicationInfo]()
    import builder._
    customCol("ID") { app =>
      <a href={"app?appId=" + app.id}>{app.id}</a>
    }
    col("Name") { _.id }
    intCol("Cores") { _.coresGranted }
    memCol("Memory per Node") { _.desc.memoryPerSlave }
    dateCol("Submitted Time") { _.submitDate }
    col("User") { _.desc.user }
    col("State") { _.state.toString }
    durationCol("Duration") { _.duration }
    build
  }
```

to render the applications table in the standalone Master UI.  I find 
this significantly easier to understand and maintain than the old code.  For 
example, this makes it trivial to re-order columns.
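
The builder idea described above can be sketched without any Spark dependency. The following is only an illustration of the pattern, not the PR's `WebUITableBuilder`: the names `TableBuilderSketch`, `Column`, `Table`, `Builder` and `App` are hypothetical, cells are rendered as plain strings rather than XML nodes, and no HTML escaping is done.

```scala
import scala.collection.mutable.ArrayBuffer

object TableBuilderSketch {
  // A column pairs a header with a function that renders one cell from a row.
  final case class Column[T](name: String, render: T => String)

  // An immutable table that knows how to render any sequence of rows.
  final class Table[T](cols: Seq[Column[T]]) {
    def render(rows: Seq[T]): String = {
      val header = cols.map(c => s"<th>${c.name}</th>").mkString
      val body = rows.map { row =>
        cols.map(c => s"<td>${c.render(row)}</td>").mkString("<tr>", "", "</tr>")
      }.mkString
      s"<table><thead><tr>$header</tr></thead><tbody>$body</tbody></table>"
    }
  }

  // Hypothetical builder: every typed helper funnels through `col`, so a given
  // kind of value (here: memory) is formatted the same way in every table.
  final class Builder[T] {
    private val cols = ArrayBuffer.empty[Column[T]]
    def col(name: String)(f: T => String): Unit = cols += Column(name, f)
    def intCol(name: String)(f: T => Int): Unit = col(name)(row => f(row).toString)
    def memCol(name: String)(f: T => Long): Unit = col(name)(row => s"${f(row)} MB")
    def build: Table[T] = new Table(cols.toList)
  }

  // Illustrative row type, standing in for ApplicationInfo.
  final case class App(id: String, cores: Int, memoryMb: Long)

  val appTable: Table[App] = {
    val builder = new Builder[App]
    import builder._
    col("ID") { _.id }
    intCol("Cores") { _.cores }
    memCol("Memory per Node") { _.memoryMb }
    build
  }
}
```

Because a column's header and its value function are declared together, re-ordering columns in this sketch means moving one line, which is the property the description highlights.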

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark webui-table-builder

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2852


commit c0aca09d676ce750496451f3691c5f9e861103bd
Author: Josh Rosen joshro...@databricks.com
Date:   2014-10-20T06:02:29Z

Add WebUITableBuilder to clean up table building code.

This significantly simplifies / abstracts the web UI's table construction
code, which seems to account for the majority of the UI code.  I haven't
converted all tables to use this yet; this commit just provides the basic
framework and a few example usages in the master web UI.





[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2852#issuecomment-59687664
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21907/consoleFull)
 for   PR 2852 at commit 
[`c0aca09`](https://github.com/apache/spark/commit/c0aca09d676ce750496451f3691c5f9e861103bd).
 * This patch merges cleanly.



[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2852#issuecomment-59687734
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21907/
Test FAILed.



[GitHub] spark pull request: [WIP] Add WebUITableBuilder to simplify table-...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2852#issuecomment-59687732
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21907/consoleFull)
 for   PR 2852 at commit 
[`c0aca09`](https://github.com/apache/spark/commit/c0aca09d676ce750496451f3691c5f9e861103bd).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/2824#issuecomment-59688035
  
Hi @JoshRosen, I just set `transferToEnabled` to false by default; unless
users explicitly set it to true, `transferTo` will not be enabled.

Currently, only `ExternalSorter` uses this API for file-to-file copying, and
this is controlled by the configuration `spark.file.transferTo`. The other uses
of `copyStream` in Spark are not file-to-file copies, so this parameter has no
effect on them.

For future uses of `copyStream`, callers have to get `transferToEnabled` from
the configuration; I added some usage notes here. A caller can still bypass
`spark.file.transferTo` and set this parameter to true directly, but is then
responsible for the correctness of that usage.

The reason I didn't take `SparkConf` as a parameter to control the behavior
is that it would require modifying a lot of the existing code that calls
`copyStream` so that it can obtain a `SparkConf`.

So what is your opinion? Thanks a lot.
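
The opt-in behavior described in this comment can be sketched as follows. This is a simplified illustration and not the code from the PR: the object name `CopyStreamSketch`, the buffer size, and the exact signature are assumptions. It only shows the shape of the idea, where the `transferTo` path is taken when the caller passes `transferToEnabled = true` and both ends are file streams, and every other case falls back to a buffered copy.

```scala
import java.io.{FileInputStream, FileOutputStream, InputStream, OutputStream}

object CopyStreamSketch {
  /**
   * Copy `in` to `out` and return the number of bytes copied.
   * The channel-based path is used only when the caller opts in AND both
   * streams are file streams; otherwise a plain buffered copy is used.
   */
  def copyStream(
      in: InputStream,
      out: OutputStream,
      transferToEnabled: Boolean = false): Long = {
    (in, out) match {
      case (fin: FileInputStream, fout: FileOutputStream) if transferToEnabled =>
        val inChannel = fin.getChannel
        val outChannel = fout.getChannel
        // Start from the stream's current position, not from the file start.
        val start = inChannel.position()
        val size = inChannel.size() - start
        var count = 0L
        // transferTo may move fewer bytes than requested, so loop until done.
        while (count < size) {
          count += inChannel.transferTo(start + count, size - count, outChannel)
        }
        count
      case _ =>
        val buf = new Array[Byte](8192)
        var count = 0L
        var n = in.read(buf)
        while (n != -1) {
          out.write(buf, 0, n)
          count += n
          n = in.read(buf)
        }
        count
    }
  }
}
```

In this sketch a caller would read `spark.file.transferTo` from its own configuration and pass the result as `transferToEnabled`, which keeps `copyStream` itself free of any `SparkConf` dependency.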





[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2824#issuecomment-59688343
  
Hi @jerryshao,

Changing the default is exactly what I had in mind.  This looks good to me! 
 (Going to bed now; I'll merge this tomorrow and backport to `branch-1.1`)



[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/2824#issuecomment-59688531
  
Thanks a lot :).



[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...

2014-10-20 Thread baishuo

Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2842#issuecomment-59688527
  
@JoshRosen @pwendell I know the reason for this problem. In IDEA, I should 
right-click the project and click maven-reimport.



[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...

2014-10-20 Thread baishuo

Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/2842



[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2839#issuecomment-59688582
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21905/
Test FAILed.



[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2839#issuecomment-59688580
  
**[Tests timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21905/consoleFull)**
 for PR 2839 at commit 
[`d6fdb2a`](https://github.com/apache/spark/commit/d6fdb2a40d8fbdcfadf3b27bc82e0bbdbdc808fe)
 after a configured wait of `120m`.



[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r19069469
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs.
+ */
+@Experimental
+class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])]) {
+
+  /**
+   * Compute the average precision of all the queries, truncated at ranking position k.
+   * If for a query, the ranking algorithm returns n (n < k) results,
+   * the precision value will be computed as #(relevant items retrived) / k.
+   * See the following paper for detail:
+   *
+   * IR evaluation methods for retrieving highly relevant documents.
+   *K. Jarvelin and J. Kekalainen
+   *
+   * @param k the position to compute the truncated precision
+   * @return the average precision at the first k ranking positions
+   */
+  def precisionAt(k: Int): Double = predictionAndLabels.map { case (pred, lab) =>
+    val labSet = lab.toSet
+    val n = math.min(pred.length, k)
+    var i = 0
+    var cnt = 0
+
+    while (i < n) {
+      if (labSet.contains(pred(i))) {
+        cnt += 1
+      }
+      i += 1
+    }
+    cnt.toDouble / k
+  }.mean
+
+  /**
+   * Returns the mean average precision (MAP) of all the queries
+   */
+  lazy val meanAveragePrecision: Double = predictionAndLabels.map { case (pred, lab) =>
+    val labSet = lab.toSet
+    var i = 0
+    var cnt = 0
+    var precSum = 0.0
+    val n = pred.length
+
+    while (i < n) {
+      if (labSet.contains(pred(i))) {
+        cnt += 1
+        precSum += cnt.toDouble / (i + 1)
+      }
+      i += 1
+    }
+    precSum / labSet.size
+  }.mean
+
+  /**
+   * Compute the average NDCG value of all the queries, truncated at ranking position k.
+   * If for a query, the ranking algorithm returns n (n < k) results, the NDCG value at
+   * at position n will be used. See the following paper for detail:
+   *
+   * IR evaluation methods for retrieving highly relevant documents.
+   *    K. Jarvelin and J. Kekalainen
+   *
+   * @param k the position to compute the truncated ndcg
+   * @return the average ndcg at the first k ranking positions
+   */
+  def ndcgAt(k: Int): Double = predictionAndLabels.map { case (pred, lab) =>
+    val labSet = lab.toSet
+    val labSetSize = labSet.size
+    val n = math.min(math.max(pred.length, labSetSize), k)
+    var maxDcg = 0.0
+    var dcg = 0.0
+    var i = 0
+
+    while (i < n) {
+      // Calculate 1/log2(i + 2)
+      val gain = math.log(2) / math.log(i + 2)
--- End diff --

`math.log(2)` is correct by definition, but it is not necessary: the constant 
factor cancels in the normalization.
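
The cancellation can be checked numerically. The sketch below is a simplified binary-relevance NDCG written for illustration only; it is not the `ndcgAt` method from the diff, and the names `NdcgScaleSketch`, `withLog2` and `naturalLog` are hypothetical. Both the DCG and the ideal DCG are sums of the same discount function, so multiplying every discount by `math.log(2)` scales the numerator and the denominator equally.

```scala
object NdcgScaleSketch {
  // DCG over binary relevance: sum of the discounts at the relevant positions.
  private def dcg(rel: Seq[Boolean], discount: Int => Double): Double =
    rel.zipWithIndex.collect { case (true, i) => discount(i) }.sum

  // NDCG = DCG of the predicted order / DCG of the ideal order
  // (all relevant items first). Yields NaN when nothing is relevant.
  def ndcg(rel: Seq[Boolean], discount: Int => Double): Double = {
    val ideal = rel.sortBy(r => !r)
    dcg(rel, discount) / dcg(ideal, discount)
  }

  // 1 / log2(i + 2), written with natural logs as in the diff.
  val withLog2: Int => Double = i => math.log(2) / math.log(i + 2)

  // Same discount with the constant factor log(2) dropped.
  val naturalLog: Int => Double = i => 1.0 / math.log(i + 2)
}
```

Calling `NdcgScaleSketch.ndcg` with either discount on the same relevance sequence gives the same value up to floating-point rounding.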



[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2839#issuecomment-59682641
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21905/consoleFull)
 for   PR 2839 at commit 
[`d6fdb2a`](https://github.com/apache/spark/commit/d6fdb2a40d8fbdcfadf3b27bc82e0bbdbdc808fe).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2743#issuecomment-59681217
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21900/consoleFull)
 for   PR 2743 at commit 
[`329a30d`](https://github.com/apache/spark/commit/329a30debca49f0b4329944bc5ad152dd218689f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2607#issuecomment-59690664
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21908/consoleFull)
 for   PR 2607 at commit 
[`2ae97b7`](https://github.com/apache/spark/commit/2ae97b74ccc0e7fc3f34d435264768a1403a7a0c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2824#issuecomment-59691398
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21906/
Test PASSed.



[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2824#issuecomment-59691389
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21906/consoleFull)
 for   PR 2824 at commit 
[`be0533a`](https://github.com/apache/spark/commit/be0533a88f6b624629ac66cfeb9989337c002cfd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...

2014-10-20 Thread liyezhang556520

GitHub user liyezhang556520 opened a pull request:

https://github.com/apache/spark/pull/2853

[SPARK-4005][CORE] handle message replies in receive instead of in the 
individual private methods

In `BlockManagerMasterActor`, when handling a message of type 
`UpdateBlockInfo`, the reply is currently sent from inside an individual 
private method; it should be sent from Akka's `receive` instead.
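
The refactoring can be illustrated without an Akka dependency. In the sketch below, which is not the code from the PR, the `reply` callback stands in for sending a message back to the sender, and all names (`ReceiveReplySketch`, `MasterSketch`, `GetBlockCount`, the fields of `UpdateBlockInfo`) are hypothetical. The private helper only computes a result, and every reply is issued from `receive`.

```scala
object ReceiveReplySketch {
  sealed trait Message
  final case class UpdateBlockInfo(blockId: String, size: Long) extends Message
  case object GetBlockCount extends Message

  // Stand-in for an actor: `reply` plays the role of answering the sender.
  final class MasterSketch(reply: Any => Unit) {
    private var blocks = Map.empty[String, Long]

    // The helper returns its result instead of replying itself.
    private def updateBlockInfo(blockId: String, size: Long): Boolean = {
      if (size < 0) {
        false
      } else {
        blocks += (blockId -> size)
        true
      }
    }

    // All replies are sent from one place, so each case shows at a glance
    // what the sender gets back.
    def receive: PartialFunction[Any, Unit] = {
      case UpdateBlockInfo(blockId, size) => reply(updateBlockInfo(blockId, size))
      case GetBlockCount => reply(blocks.size)
    }
  }
}
```

Keeping the reply in `receive` also makes the helper easier to test, since it can be called and checked without a sender.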

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liyezhang556520/spark akkaRecv

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2853.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2853


commit d4b929b49b7962131e514783ab1ca1024244b48e
Author: Zhang, Liye liye.zh...@intel.com
Date:   2014-10-20T07:30:46Z

[SPARK-4005][CORE] handle message replies in receive instead of in the 
individual private methods





[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2853#issuecomment-59697011
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21909/consoleFull)
 for   PR 2853 at commit 
[`d4b929b`](https://github.com/apache/spark/commit/d4b929b49b7962131e514783ab1ca1024244b48e).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2607#issuecomment-59697993
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21908/consoleFull)
 for   PR 2607 at commit 
[`2ae97b7`](https://github.com/apache/spark/commit/2ae97b74ccc0e7fc3f34d435264768a1403a7a0c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class GradientBoosting (`
  * `case class BoostingStrategy(`
  * `trait Loss extends Serializable `
  * `class GradientBoostingModel(trees: Array[DecisionTreeModel], strategy: 
BoostingStrategy)`




[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2607#issuecomment-59697998
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21908/
Test FAILed.



[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2853#issuecomment-59706895
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21909/
Test PASSed.



[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-20 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-59706916
  
test this please.



[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2853#issuecomment-59706889
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21909/consoleFull)
 for   PR 2853 at commit 
[`d4b929b`](https://github.com/apache/spark/commit/d4b929b49b7962131e514783ab1ca1024244b48e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-59707432
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21911/consoleFull)
 for   PR 1567 at commit 
[`88b939e`](https://github.com/apache/spark/commit/88b939e3deb15f4ed16a727b33af879fa103c913).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-59708199
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21910/
Test FAILed.



[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-59714668
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21911/
Test FAILed.



[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-59714664
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21911/consoleFull)
 for   PR 1567 at commit 
[`88b939e`](https://github.com/apache/spark/commit/88b939e3deb15f4ed16a727b33af879fa103c913).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class GroupingSet(bitmasks: Seq[Int], `
  * `case class Cube(groupByExprs: Seq[Expression],`
  * `case class Rollup(groupByExprs: Seq[Expression],`
  * `case class VirtualColumn(name: String, dataType: DataType = 
StringType, nullable: Boolean = false)`
  * `case class GroupingSetExpansion(`
  * `case class GroupingSetExpansion(`




[GitHub] spark pull request: Block Manager - Double Register Crash

2014-10-20 Thread tsliwowicz

GitHub user tsliwowicz opened a pull request:

https://github.com/apache/spark/pull/2854

Block Manager - Double Register Crash

In long-running contexts, we encountered a double register without a 
remove in between. The cause is unknown; we assume a temporary network issue.

However, since the second register uses a BlockManagerId on a 
different port, blockManagerInfo.contains() returns false, while 
blockManagerIdByExecutor returns Some. This inconsistency is caught in a 
conditional statement that calls System.exit(1), which is a huge robustness 
issue for us.

The fix: simply remove the old id from both maps during register when 
this happens. We mimic the behavior of expireDeadHosts() by doing 
local cleanup of the maps before trying to add new ones.

Also added some logging for register and unregister.

https://issues.apache.org/jira/browse/SPARK-4006
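
As a rough illustration of the fix described above (not the code in the patch), the sketch below models the two maps and drops a stale id from both before registering the new one. `BlockManagerRegistry`, the simplified `BlockManagerId` and `BlockManagerInfo` case classes, and the `nowMs` parameter are hypothetical stand-ins; only the map names come from the description.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's own classes.
final case class BlockManagerId(executorId: String, host: String, port: Int)
final case class BlockManagerInfo(registeredAtMs: Long)

// Simplified registry; the two map names mirror the ones in the PR text.
class BlockManagerRegistry {
  val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, BlockManagerInfo]
  val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]

  def register(id: BlockManagerId, nowMs: Long): Unit = {
    if (!blockManagerInfo.contains(id)) {
      // The same executor may already be known under another id
      // (for example a different port). Clean up both maps rather
      // than treating the inconsistency as fatal.
      blockManagerIdByExecutor.get(id.executorId).foreach { oldId =>
        println(s"Replacing stale block manager $oldId with $id")
        blockManagerIdByExecutor.remove(id.executorId)
        blockManagerInfo.remove(oldId)
      }
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo(id) = BlockManagerInfo(nowMs)
    }
  }
}
```

Registering `BlockManagerId("exec-1", "host-a", 5000)` and then `BlockManagerId("exec-1", "host-a", 5001)` on the same registry leaves a single entry in each map, keyed by the newer id.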



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/taboola/spark branch-0.9.2-block-mgr-removal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2854.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2854


commit efd93f2026ddc427e84fa03e8a595ded2b1a81ce
Author: Tal Sliwowicz ta...@taboola.com
Date:   2014-10-12T08:35:20Z

In long-running contexts, we encountered a double register without a 
remove in between. The cause is unknown; we assume a temporary network 
issue.

However, since the second register uses a BlockManagerId on a different 
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor 
returns Some. This inconsistency is caught in a conditional statement that calls 
System.exit(1), which is a huge robustness issue for us.

The fix: simply remove the old id from both maps during register when this 
happens. We mimic the behavior of expireDeadHosts() by doing local 
cleanup of the maps before trying to add new ones.

Also added some logging for register and unregister.

commit 81d69f088e421b19e47495d06e8b187a0ec29075
Author: Tal Sliwowicz ta...@taboola.com
Date:   2014-10-12T08:41:53Z

fixed comment





[GitHub] spark pull request: [SPARK-3967] Ensure that files are fetched ato...

2014-10-20 Thread preaudc

GitHub user preaudc opened a pull request:

https://github.com/apache/spark/pull/2855

[SPARK-3967] Ensure that files are fetched atomically

tempFile is created in the same directory as targetFile, so that the
move from tempFile to targetFile is always atomic.
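
The idea can be sketched as follows. This is only an illustration using `java.nio.file`, which is an assumption here and not necessarily what the patch uses; `AtomicFetch`, `writeAtomically`, and the temp-file prefix are hypothetical names. Creating the temp file next to the target keeps both on the same filesystem, which is what allows the final rename to be atomic.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object AtomicFetch {
  // Write `bytes` to `target` via a temp file created in target's own directory.
  def writeAtomically(target: Path, bytes: Array[Byte]): Unit = {
    val dir = target.toAbsolutePath.getParent
    val temp = Files.createTempFile(dir, "fetchFileTemp", ".tmp")
    try {
      Files.write(temp, bytes)
      // Same directory means same filesystem, so the move is a rename.
      // ATOMIC_MOVE throws AtomicMoveNotSupportedException rather than
      // silently falling back to copy-and-delete.
      Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE)
    } finally {
      // No-op after a successful move; cleans up if the write or move failed.
      Files.deleteIfExists(temp)
    }
  }
}
```

A reader of `target` then sees either no file or the complete file, never a partially written one. Whether an existing `target` is replaced by an atomic move is platform-specific.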

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/preaudc/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2855


commit 8ea871f8130b2490f1bad7374a819bf56f0ccbbd
Author: Christophe Préaud christophe.pre...@kelkoo.com
Date:   2014-10-20T09:58:56Z

Ensure that files are fetched atomically

tempFile is created in the same directory as targetFile, so that the
move from tempFile to targetFile is always atomic.





[GitHub] spark pull request: Block Manager - Double Register Crash

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-59722949
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-3967] Ensure that files are fetched ato...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2855#issuecomment-59722947
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-59723719
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21912/consoleFull)
 for   PR 2342 at commit 
[`35fb0f6`](https://github.com/apache/spark/commit/35fb0f67be7f2f7223e010eca300a7c1ad295c18).
 * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-59723812
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21912/consoleFull)
 for   PR 2342 at commit 
[`35fb0f6`](https://github.com/apache/spark/commit/35fb0f67be7f2f7223e010eca300a7c1ad295c18).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-59723813
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21912/
Test FAILed.



[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59724233
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21913/consoleFull)
 for   PR 2816 at commit 
[`a580dd4`](https://github.com/apache/spark/commit/a580dd436d007265ed6cdba9666f9d27e3025f57).
 * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-59724241
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21914/consoleFull)
 for   PR 2342 at commit 
[`afffb05`](https://github.com/apache/spark/commit/afffb05ba1b83564c875ac3bb2aad64339991587).
 * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-59724343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21914/
Test FAILed.



[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-59724340
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21914/consoleFull)
 for   PR 2342 at commit 
[`afffb05`](https://github.com/apache/spark/commit/afffb05ba1b83564c875ac3bb2aad64339991587).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-10-20 Thread tsliwowicz

Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/1358#issuecomment-59724452
  
@mateiz - @KashiErez and I took a different route. The killer issue was 
that there is a System.exit(1) in BlockManagerMasterActor, which was a huge 
robustness issue for us. At @taboola we run some pretty large clusters 
(processing many terabytes of data per day) that do real-time calculations and 
are mission critical. So we fixed the issue, and it has been running 
successfully in our production for a while now. 

I opened a new ticket - https://issues.apache.org/jira/browse/SPARK-4006
And a pull request - https://github.com/apache/spark/pull/2854

What do you think about our fix? 




[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-20 Thread saucam

Github user saucam commented on the pull request:

https://github.com/apache/spark/pull/2841#issuecomment-59726357
  
This PR also fixes:

https://issues.apache.org/jira/browse/SPARK-1847



[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59732528
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21913/consoleFull)
 for   PR 2816 at commit 
[`a580dd4`](https://github.com/apache/spark/commit/a580dd436d007265ed6cdba9666f9d27e3025f57).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2816#issuecomment-59732530
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21913/
Test PASSed.



[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...

2014-10-20 Thread zsxwing

GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/2856

[SPARK-4008] Fix kryo with fold in KryoSerializerSuite

`zeroValue` will be serialized by `spark.closure.serializer`, but 
`spark.closure.serializer` only supports the default Java serializer. So it 
must not be a `ClassWithoutNoArgConstructor`, which cannot be serialized by the 
Java serializer. 

This PR changed `zeroValue` to null and updated the test to make it work 
correctly.
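
To illustrate the constraint described above, here is a small standalone check, independent of Spark, that shows default Java serialization rejecting a class that does not implement `java.io.Serializable` while accepting `null`. The class below only imitates the shape of the test class named in the description, and `JavaSerializationCheck.javaSerializable` is a hypothetical helper.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Imitates the test class: a constructor argument and no java.io.Serializable.
class ClassWithoutNoArgConstructor(val x: Int)

object JavaSerializationCheck {
  // Returns true if `value` can be written with default Java serialization.
  def javaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }
}
```

Here `javaSerializable(new ClassWithoutNoArgConstructor(1))` returns `false`, whereas `javaSerializable(null)` returns `true`, which is consistent with using `null` as the `zeroValue` in the test.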

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-4008

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2856


commit 51da6558754f34097d4aa9ee8b15fa04ae01b9bf
Author: zsxwing zsxw...@gmail.com
Date:   2014-10-20T11:35:12Z

[SPARK-4008] Fix kryo with fold in KryoSerializerSuite





[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2856#issuecomment-59734858
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21915/consoleFull)
 for   PR 2856 at commit 
[`51da655`](https://github.com/apache/spark/commit/51da6558754f34097d4aa9ee8b15fa04ae01b9bf).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-20 Thread ScrapCodes

Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-59736352
  
This is a gist of the dependency tree for the artifacts published by this patch. 
https://gist.github.com/ScrapCodes/a5857e57d828b4b787ff 



[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2856#issuecomment-59745195
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21915/consoleFull)
 for   PR 2856 at commit 
[`51da655`](https://github.com/apache/spark/commit/51da6558754f34097d4aa9ee8b15fa04ae01b9bf).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4008] Fix kryo with fold in KryoSeria...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2856#issuecomment-59745203
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21915/
Test PASSed.



[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...

2014-10-20 Thread YanTangZhai

GitHub user YanTangZhai opened a pull request:

https://github.com/apache/spark/pull/2857

[SPARK-4009][SQL]HiveTableScan should use makeRDDForTable instead of 
makeRDDForPartitionedTable for partitioned table when partitionPruningPred is 
None

HiveTableScan should use makeRDDForTable instead of 
makeRDDForPartitionedTable for partitioned table when partitionPruningPred is 
None.
If a table has many partitions (for example, more than 20 thousand) but 
little data (for example, less than 512 MB), some SQL queries on the table will 
produce more than 2 RDDs. Job submission then fails with a Java stack 
overflow exception.
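
The proposed selection rule can be sketched in isolation as below. This is a simplified model rather than the patch itself: `ScanPlanning`, `choosePlan`, and the plan types are hypothetical, the predicate is modeled as an `Option[String]`, and the two plan variants merely stand in for `makeRDDForTable` and `makeRDDForPartitionedTable`.

```scala
object ScanPlanning {
  sealed trait ScanPlan
  // Stands in for makeRDDForTable: one scan over the whole table.
  case object WholeTableScan extends ScanPlan
  // Stands in for makeRDDForPartitionedTable: work that grows with
  // the number of candidate partitions.
  final case class PerPartitionScan(candidatePartitions: Int) extends ScanPlan

  def choosePlan(
      isPartitioned: Boolean,
      partitionPruningPred: Option[String],
      candidatePartitions: Int): ScanPlan =
    if (!isPartitioned || partitionPruningPred.isEmpty) {
      // With no pruning predicate every partition would be read anyway,
      // so a whole-table scan avoids the per-partition overhead.
      WholeTableScan
    } else {
      PerPartitionScan(candidatePartitions)
    }
}
```

Under this rule, `choosePlan(true, None, 20000)` yields `WholeTableScan`, while the same table with a pruning predicate still takes the per-partition path.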

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanTangZhai/spark SPARK-4009

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2857.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2857


commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-06T13:07:08Z

Merge pull request #1 from apache/master

update

commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-20T13:14:08Z

Merge pull request #3 from apache/master

Update

commit 8a0010691b669495b4c327cf83124cabb7da1405
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-12T06:54:58Z

Merge pull request #6 from apache/master

Update

commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-16T12:03:22Z

Merge pull request #7 from apache/master

Update

commit 76d40277d51f709247df1d3734093bf2c047737d
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-10-20T12:52:22Z

Merge pull request #8 from apache/master

update

commit be7882ce16911d018571fa46c1a175d063bdfd03
Author: yantangzhai tyz0...@163.com
Date:   2014-10-20T13:05:44Z

[SPARK-4009][SQL]HiveTableScan should use makeRDDForTable instead of 
makeRDDForPartitionedTable for partitioned table when partitionPruningPred is 
None





[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2857#issuecomment-59751379
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21916/consoleFull)
 for   PR 2857 at commit 
[`be7882c`](https://github.com/apache/spark/commit/be7882ce16911d018571fa46c1a175d063bdfd03).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2857#issuecomment-59751519
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21916/consoleFull)
 for   PR 2857 at commit 
[`be7882c`](https://github.com/apache/spark/commit/be7882ce16911d018571fa46c1a175d063bdfd03).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2857#issuecomment-59751523
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21916/
Test FAILed.



[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2857#issuecomment-59755633
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21917/consoleFull)
 for   PR 2857 at commit 
[`db0ce73`](https://github.com/apache/spark/commit/db0ce732e51d5813609f80722c20147b7c33bd23).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3875] Add TEMP DIRECTORY configuration

2014-10-20 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2729#issuecomment-59756273
  
Yes, as @mridulm pointed out, this should not be settable by users on 
YARN. It should automatically use the YARN-approved directories. We have logic 
in there for setting java.io.tmpdir in ClientBase. If this is added, we 
would need to do something similar and not let the user override it.



[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2857#issuecomment-59758705
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21917/
Test FAILed.



[GitHub] spark pull request: [SPARK-4009][SQL]HiveTableScan should use make...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2857#issuecomment-59758697
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21917/consoleFull)
 for   PR 2857 at commit 
[`db0ce73`](https://github.com/apache/spark/commit/db0ce732e51d5813609f80722c20147b7c33bd23).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class JavaFutureActionWrapper[S, T](futureAction: FutureAction[S], 
converter: S => T)`




[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-20 Thread ScrapCodes

Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-59761949
  
Hey @pwendell, I have updated this patch to include the effective pom changes 
so that you can try it out. Also, I think this is ready for review!



[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-20 Thread ScrapCodes

Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2615#discussion_r19086365
  
--- Diff: core/pom.xml ---
@@ -264,6 +284,10 @@
   <scope>test</scope>
 </dependency>
 <dependency>
+  <groupId>com.twitter</groupId>
+  <artifactId>chill-java</artifactId>
+</dependency>
--- End diff --

Note to self: remove it.



[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-59763811
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21918/consoleFull)
 for   PR 2615 at commit 
[`812db5b`](https://github.com/apache/spark/commit/812db5bb3b70c2b20cd1ec1d05f376003e554b41).
 * This patch merges cleanly.



[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-20 Thread mdagost

Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59769268
  
@MLnick It doesn't look like `pairRDDToPython` does the trick.  I tried

```{python}
def userFeatures(self):
    # Fetch the Java-side RDD of user features from the wrapped model.
    juf = self._java_model.userFeatures()
    # Try to convert the Java pair RDD into something Python can read.
    juf = sc._jvm.SerDeUtil.pairRDDToPython(juf, 1)
    return juf
```

but what comes out when I try to print the result of taking the first 
element of the RDD is just `[[B@176fa1a5` rather than any kind of nicely 
formatted Python object.



[GitHub] spark pull request: [SPARK-3657] yarn alpha YarnRMClientImpl throw...

2014-10-20 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2728#issuecomment-59771521
  
So one issue is that the scheme was added to properly handle the case where 
YARN is using https (SPARK-3286).  If client mode isn't passing the scheme, then that is 
probably broken.  If it were passing the scheme, then you wouldn't hit this 
issue. I think changing the YarnClientSchedulerBackend.start routine, where it 
sets spark.driver.appUIAddress, would be the equivalent change.  And then we 
would need to test.

With the above change it would have the scheme included and wouldn't hit 
the null.  If we want to add the check anyway, to handle the case where it 
is null just in case something else comes up, that's fine, but I'm not really fond 
of pattern matching here.  How about just checking URI.getScheme and, if it is 
null, passing the address in as is, and otherwise using getAuthority()?
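
As a rough illustration of the check suggested above, here is a minimal Python sketch (not the Scala code in the PR). It uses `urllib.parse.urlsplit` as a stand-in for `java.net.URI`; the helper name `authority_or_original` and the sample addresses are hypothetical. It tests whether an authority was parsed rather than testing the scheme directly, since a bare `host:port` string can be parsed ambiguously.

```python
from urllib.parse import urlsplit


def authority_or_original(address):
    """Return the host:port authority when the address carries a
    "scheme://" prefix; otherwise return the address unchanged.

    Hypothetical helper: a Python analogue of checking the scheme and
    then calling getAuthority(), not the actual Spark code.
    """
    parts = urlsplit(address)
    if not parts.netloc:
        # No "scheme://authority" form was recognised, so pass it through.
        return address
    return parts.netloc


# Addresses with and without a scheme end up in the same host:port form.
print(authority_or_original("http://myhost:4040"))
print(authority_or_original("myhost:4040"))
```

Both calls yield the same `host:port` string, which is the point of the suggestion: the caller does not need to know whether the scheme was included.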



[GitHub] spark pull request: [SPARK-3925][SQL] Do not consider the ordering...

2014-10-20 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/2783#issuecomment-59772318
  
Other comments?



[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...

2014-10-20 Thread witgo

GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/2858

[SPARK-4010][Web UI]Spark UI returns 500 in yarn-client mode

The problem is caused by #1966.
CC @YanTangZhai @andrewor14

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-4010

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2858.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2858


commit 9866fbfacf90f319dc1e318077f7d433e1bcb222
Author: GuoQiang Li wi...@qq.com
Date:   2014-10-20T15:04:09Z

Spark UI returns 500 in yarn-client mode





[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2858#issuecomment-59781548
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21919/consoleFull)
 for   PR 2858 at commit 
[`9866fbf`](https://github.com/apache/spark/commit/9866fbfacf90f319dc1e318077f7d433e1bcb222).
 * This patch merges cleanly.



[GitHub] spark pull request: fix broken links in README.md

2014-10-20 Thread ryan-williams

GitHub user ryan-williams opened a pull request:

https://github.com/apache/spark/pull/2859

fix broken links in README.md

seems like `building-spark.html` was renamed to `building-with-maven.html`?

Is Maven the blessed build tool these days, or SBT? I couldn't find a 
building-with-sbt page so I went with the Maven one here.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ryan-williams/spark broken-links-readme

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2859.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2859


commit 154e096fa6b6663f40da20fefca6cf947a394a15
Author: Ryan Williams ryan.blake.willi...@gmail.com
Date:   2014-10-19T17:41:33Z

fix broken links in README.md

seems like building-spark.html was renamed to building-with-maven.html





[GitHub] spark pull request: fix broken links in README.md

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2859#issuecomment-59782880
  
Can one of the admins verify this patch?



[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-20 Thread mdagost

Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59784360
  
@davies Your idea of adding something like `fromTupleRDD` to 
`PythonMLLibAPI` seems to be the way to go.  I'm just doing some cleanup and 
will push `userFeatures` and `productFeatures` in just a bit. 



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread ryan-williams

Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59785816
  
@preaudc see last commit, I applied this change to the `case _` as well, 
per your suggestion!



[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-59791098
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21918/
Test FAILed.



[GitHub] spark pull request: Initial time estimator with new column for rem...

2014-10-20 Thread devldevelopment

Github user devldevelopment commented on the pull request:

https://github.com/apache/spark/pull/2837#issuecomment-59791166
  
Ok, thanks for the feedback guys. If this feature is no longer wanted or 
needed, maybe we can close it (the JIRA 576)? Generally I'm getting to grips 
with Scala and contributing to Spark, so I wanted a first easy task to implement. 



[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-59791090
  
**[Tests timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21918/consoleFull)**
 for PR 2615 at commit 
[`812db5b`](https://github.com/apache/spark/commit/812db5bb3b70c2b20cd1ec1d05f376003e554b41)
 after a configured wait of `120m`.



[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2667#issuecomment-59793960
  
Jenkins, add to whitelist.



[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2667#issuecomment-59794008
  
test this please



[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2667#issuecomment-59794882
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21920/consoleFull)
 for   PR 2667 at commit 
[`d64c120`](https://github.com/apache/spark/commit/d64c1201e439d2894a76196659c59b9abb03be5e).
 * This patch merges cleanly.



[GitHub] spark pull request: [WIP]SPARK-3957: show broadcast variable resou...

2014-10-20 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r19096569
  
--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -30,7 +30,8 @@ import org.apache.spark.util.ActorLogReceive
 private[spark] case class Heartbeat(
 executorId: String,
 taskMetrics: Array[(Long, TaskMetrics)], // taskId -> TaskMetrics
-blockManagerId: BlockManagerId)
+blockManagerId: BlockManagerId,
+broadcastBlocks: Map[BlockId, Option[BlockStatus]])
--- End diff --

Would this send a BlockStatus for each broadcast variable on each heartbeat? 
If we have hundreds or thousands of broadcast variables, I wonder if the message 
size will become huge. Could we send deltas somehow?
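
To make the "deltas" idea concrete, here is a minimal Python sketch (the PR itself is Scala). It assumes the sender remembers the block-status map from its previous heartbeat and transmits only the entries that changed; the function names, the tuple-shaped statuses, and the use of `None` to mark a removed block are all hypothetical choices for illustration, not anything taken from the patch.

```python
def heartbeat_delta(previous, current):
    """Return only the entries of `current` that differ from `previous`.

    Blocks that are present in `previous` but missing from `current`
    are reported with a value of None (assumed "removed" marker).
    """
    delta = {}
    for block_id, status in current.items():
        if previous.get(block_id) != status:
            delta[block_id] = status
    for block_id in previous:
        if block_id not in current:
            delta[block_id] = None
    return delta


def apply_delta(known, delta):
    """Receiver side: merge a delta into the last known full state."""
    merged = dict(known)
    for block_id, status in delta.items():
        if status is None:
            merged.pop(block_id, None)
        else:
            merged[block_id] = status
    return merged


# Hypothetical statuses: (storage level, size in bytes).
prev = {"broadcast_0": ("MEMORY", 100), "broadcast_1": ("MEMORY", 200)}
curr = {"broadcast_0": ("MEMORY", 100), "broadcast_2": ("DISK", 50)}
delta = heartbeat_delta(prev, curr)
print(delta)
```

With this scheme an unchanged set of broadcast variables produces an empty delta, so the heartbeat size would depend on how much changed rather than on how many broadcast variables exist. The trade-off is that the sender has to keep the previous state and the receiver has to handle a lost or reordered heartbeat.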



[GitHub] spark pull request: fix broken links in README.md

2014-10-20 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2859#issuecomment-59795665
  
No, the reverse actually. The site has not been rebuilt though to expose 
the new page. This has been asked several times so I hope the site can be 
refreshed.



[GitHub] spark pull request: fix broken links in README.md

2014-10-20 Thread ryan-williams

Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2859#issuecomment-59796389
  
I see. Do you want to:
* leave the broken links,
* add some basic building with sbt commands to the README, or
* point them at the building-with-maven page per the change here?



[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2667#issuecomment-59796463
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21921/consoleFull)
 for   PR 2667 at commit 
[`be6645e`](https://github.com/apache/spark/commit/be6645eb4a6814f0a8d9983625444630e04e723e).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2858#issuecomment-59797312
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21919/
Test PASSed.



[GitHub] spark pull request: [SPARK-4010][Web UI]Spark UI returns 500 in ya...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2858#issuecomment-59797300
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21919/consoleFull)
 for   PR 2858 at commit 
[`9866fbf`](https://github.com/apache/spark/commit/9866fbfacf90f319dc1e318077f7d433e1bcb222).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: fix broken links in README.md

2014-10-20 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2859#issuecomment-59797568
  
The issue is that the docs in `master` are all consistent, and I think 
contain the correct current state of instructions. But on GitHub you see 
`master`'s `README.md`, while the site is of course built from branch 1.1. I think 
the intent is to move build instructions out of `README.md` more than the 
reverse, since it duplicates the main doc page. It may be too inconvenient to 
back-port the doc changes to 1.1 and rebuild the site. Maybe an interim 
solution is to just have both links in `README.md`. Or slip in a redirect page 
from the new URL to old right now. Or hey, it gets fixed in a month or two with 
1.2 anyway.



[GitHub] spark pull request: fix broken links in README.md

2014-10-20 Thread ryan-williams

Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2859#issuecomment-59799405
  
Interesting. What do you mean by both links? The current links to 
`building-spark.html`, as well as the `building-with-maven.html` links I've 
submitted here? The former currently 404, so keeping them in the README if we 
are going to the trouble of changing it doesn't make sense to me.

I see now that `README.md` is not up-to-date, but that was not at all 
apparent when I was getting set up with Spark over the weekend :-\ Seems like 
the README should be kept consistent with the source tree that it is committed 
with, and that can be decoupled from coarser per-release website refreshes. 
Could we add a couple commands explaining that `sbt` is blessed now, and 
showing how to use it? Otherwise maybe the README should just be removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-20 Thread mccheah

Github user mccheah commented on the pull request:

https://github.com/apache/spark/pull/2828#issuecomment-59803881
  
@JoshRosen agreed with @ash211, this is really good.

Are there any actual comments on the PR, or can it be merged? =)



[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...

2014-10-20 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2844#issuecomment-59803868
  
LGTM now, thanks!



[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2824



[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

2014-10-20 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2824#issuecomment-59805036
  
I've merged this into `master` and `branch-1.1`.

> Thanks a lot :).

Thank YOU (and @mridulm) for helping to diagnose this really subtle bug!



[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1658#issuecomment-59805377
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21922/consoleFull)
 for   PR 1658 at commit 
[`92bda0d`](https://github.com/apache/spark/commit/92bda0daf2fffeea0f1de9199fc71fe978a165c7).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1658#issuecomment-59805569
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21922/
Test FAILed.



[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1658#issuecomment-59805565
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21922/consoleFull)
for PR 1658 at commit
[`92bda0d`](https://github.com/apache/spark/commit/92bda0daf2fffeea0f1de9199fc71fe978a165c7).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...

2014-10-20 Thread liancheng

GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/2860

[SPARK-3537][SQL] Refines in-memory columnar table statistics

This PR refines in-memory columnar table statistics:

1. Adds 3 more statistics for in-memory table columns (`count`, `nullCount`,
   and `sizeInBytes`), plus filter pushdown support for `IS NULL` and `IS NOT NULL`.
2. Caches and propagates statistics in `InMemoryRelation` once the
   underlying cached RDD is materialized.

   Statistics are collected on the driver side with an accumulator.
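
As a rough illustration of how the new statistics can drive `IS NULL` / `IS NOT NULL` pushdown, here is a minimal sketch in plain Scala. It is not the code in this PR: `BatchStats`, `BatchPruning`, and both predicates are hypothetical names, and the sketch assumes `sizeInBytes` counts only the UTF-8 bytes of non-null values.

```scala
// Hypothetical per-batch statistics for one string column. The field names
// follow the PR description; everything else is illustrative.
final case class BatchStats(count: Int, nullCount: Int, sizeInBytes: Long)

object BatchPruning {
  // Gather statistics over one batch of optional string values.
  // Assumption: only non-null values contribute to sizeInBytes.
  def collect(values: Seq[Option[String]]): BatchStats =
    values.foldLeft(BatchStats(0, 0, 0L)) { (stats, v) =>
      v match {
        case Some(s) =>
          stats.copy(
            count = stats.count + 1,
            sizeInBytes = stats.sizeInBytes + s.getBytes("utf-8").length)
        case None =>
          stats.copy(count = stats.count + 1, nullCount = stats.nullCount + 1)
      }
    }

  // `col IS NULL` can only match a batch that holds at least one null.
  def mayContainNull(stats: BatchStats): Boolean = stats.nullCount > 0

  // `col IS NOT NULL` can only match a batch with at least one non-null value.
  def mayContainNonNull(stats: BatchStats): Boolean =
    stats.count - stats.nullCount > 0
}
```

A batch for which the relevant predicate returns `false` can be skipped without scanning its rows, which is the point of keeping `count` and `nullCount` per batch.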

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark propagates-in-mem-stats

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2860


commit 7dc6a34166ad915e07438795ce6b6ea67b3fdee6
Author: Cheng Lian l...@databricks.com
Date:   2014-10-20T17:13:59Z

Adds more in-memory table statistics and propagates them properly





[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...

2014-10-20 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2860#discussion_r19099520
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala ---
@@ -24,11 +24,13 @@ import 
org.apache.spark.sql.catalyst.expressions.{AttributeMap, Attribute, Attri
 import org.apache.spark.sql.catalyst.types._
 
 private[sql] class ColumnStatisticsSchema(a: Attribute) extends Serializable {
-  val upperBound = AttributeReference(a.name + ".upperBound", a.dataType, nullable = false)()
-  val lowerBound = AttributeReference(a.name + ".lowerBound", a.dataType, nullable = false)()
-  val nullCount =  AttributeReference(a.name + ".nullCount", IntegerType, nullable = false)()
+  val upperBound = AttributeReference(a.name + ".upperBound", a.dataType, nullable = true)()
+  val lowerBound = AttributeReference(a.name + ".lowerBound", a.dataType, nullable = true)()
--- End diff --

Upper/lower bound can be null for types like string.
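
One way to see why: a string column has no natural sentinel value to initialize the bounds with, so a tracker would typically start them as `null` and leave them that way if every value in the batch is null. The sketch below is only an illustration of that idea; `StringBounds` and `gather` are hypothetical names, not the `StringColumnStats` implementation.

```scala
// Hypothetical min/max tracker for a string column.
// Bounds stay null until the first non-null value is seen.
final class StringBounds {
  var lower: String = null
  var upper: String = null

  def gather(value: String): Unit = {
    if (value != null) {
      if (lower == null || value.compareTo(lower) < 0) lower = value
      if (upper == null || value.compareTo(upper) > 0) upper = value
    }
  }
}
```

With an all-null batch both bounds remain `null`, so the statistics attributes that carry them have to be declared nullable.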



[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...

2014-10-20 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2860#discussion_r19099771
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala ---
@@ -185,15 +196,16 @@ private[sql] class StringColumnStats extends 
ColumnStats {
 } else {
   nullCount += 1
 }
+count += 1
+sizeInBytes += STRING.actualSize(row, ordinal)
--- End diff --

This can potentially slow down the caching process for string columns, because
the `.getBytes("utf-8")` call within `actualSize` traverses the whole string.
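
To make the cost concrete, here is a small sketch comparing the two ways of getting a UTF-8 byte count. `Utf8Size`, `viaGetBytes`, and `viaScan` are hypothetical helpers for illustration only, and `viaScan` assumes well-formed strings (no unpaired surrogates).

```scala
object Utf8Size {
  // Encodes the whole string into a fresh byte array just to read its length:
  // O(n) time plus an O(n) temporary allocation per value.
  def viaGetBytes(s: String): Int = s.getBytes("utf-8").length

  // Hypothetical alternative: still O(n), but without the temporary array.
  // Assumes well-formed UTF-16 input.
  def viaScan(s: String): Int = {
    var bytes = 0
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      bytes +=
        (if (cp < 0x80) 1 else if (cp < 0x800) 2 else if (cp < 0x10000) 3 else 4)
      i += Character.charCount(cp)
    }
    bytes
  }
}
```

Either way the work is linear in the string length, so it is paid once per row while the column is being cached; the scan only avoids the extra allocation.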



[GitHub] spark pull request: [SPARK-3537][SQL] Refines in-memory columnar t...

2014-10-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2860#issuecomment-59806948
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21923/consoleFull)
for PR 2860 at commit
[`7dc6a34`](https://github.com/apache/spark/commit/7dc6a34166ad915e07438795ce6b6ea67b3fdee6).
 * This patch merges cleanly.



[GitHub] spark pull request: Initial commit to provide pluggable strategy t...

2014-10-20 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2849#issuecomment-59807087
  
Hey @olegz, is there an associated JIRA for this? If so, could you include it
in the title?



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread ryan-williams

Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59807140
  
Jenkins, test this please. Does that work if I am not an admin?

@pwendell agreed, the logic is a little tricky but I couldn't find a 
simpler way to express it; in the meantime, I factored it out since it was 
repeated in two `case`s



[GitHub] spark pull request: [SPARK-3537][SPARK-3914][SQL] Refines in-memor...

2014-10-20 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2860#discussion_r1919
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -76,4 +76,24 @@ class PlannerSuite extends FunSuite {
 
 setConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD, origThreshold.toString)
   }
+
+  test("InMemoryRelation statistics propagation") {
--- End diff --

Test case for SPARK-3914.



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59807368
  
Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-3561] Initial commit to provide pluggab...

2014-10-20 Thread olegz

Github user olegz commented on the pull request:

https://github.com/apache/spark/pull/2849#issuecomment-59807570
  
@andrewor14 done.


