[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20466
  
**[Test build #86927 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86927/testReport)**
 for PR 20466 at commit 
[`6e55d10`](https://github.com/apache/spark/commit/6e55d1000c62a86c14ad993d3699b0ed99f53cbb).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
I haven't worked on the logging stuff yet, so feel free to take it. Thanks!


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86923/


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20461
  
thanks, merging to master/2.3!


---




[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20469
  
**[Test build #4087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4087/testReport)**
 for PR 20469 at commit 
[`15d67ee`](https://github.com/apache/spark/commit/15d67eee9baa87a8fa08a265549000386fd476a6).


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20466
  
**[Test build #86927 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86927/testReport)**
 for PR 20466 at commit 
[`6e55d10`](https://github.com/apache/spark/commit/6e55d1000c62a86c14ad993d3699b0ed99f53cbb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165379876
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +599,15 @@ class ALSSuite
   (ex, act) =>
 ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
 } { (ex, act, _) =>
-  ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
-act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+  testTransformerByGlobalCheckFunc[Float](_: DataFrame, ex, "prediction") {
+case exRows: Seq[Row] =>
--- End diff --

Fixed.
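For readers following the ALS change: the removed lines compared only the first prediction of each model with `~== ... absTol 1e-6`, while the new `testTransformerByGlobalCheckFunc` call hands its check function all rows (`exRows: Seq[Row]`). The sketch below shows, in plain Python, what an absolute-tolerance comparison over every prediction could look like. It is an illustration under assumptions only: lists of floats stand in for the prediction rows, and `approx_equal_abs` and `predictions_match` are hypothetical helpers, not Spark's test utilities.

```python
from typing import Sequence


def approx_equal_abs(x: float, y: float, abs_tol: float) -> bool:
    """Absolute-tolerance check: True when |x - y| is below abs_tol."""
    return abs(x - y) < abs_tol


def predictions_match(expected: Sequence[float],
                      actual: Sequence[float],
                      abs_tol: float = 1e-6) -> bool:
    """Compare every prediction pairwise instead of only the first one."""
    if len(expected) != len(actual):
        return False
    return all(approx_equal_abs(e, a, abs_tol) for e, a in zip(expected, actual))


# Hypothetical prediction columns produced by two models.
print(predictions_match([0.5, 1.25, 3.0], [0.5000001, 1.25, 3.0]))
print(predictions_match([0.5, 1.25, 3.0], [0.5, 1.26, 3.0]))
```

Checking all rows catches a mismatch that a first-row-only comparison would miss, which is the practical benefit of passing the whole `Seq[Row]` to the check.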


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86918/


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86921 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86921/testReport)**
 for PR 20455 at commit 
[`5246fcc`](https://github.com/apache/spark/commit/5246fcc5bb5936d64991fe7eb6acdd4cbdc25e05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20461
  
**[Test build #86917 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86917/testReport)**
 for PR 20461 at commit 
[`fed6dc2`](https://github.com/apache/spark/commit/fed6dc25c6293cad08e6759bc0a1cf414b91dfd0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20472
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20471
  
Thanks! Merging to master/2.3.


---




[GitHub] spark pull request #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20475#discussion_r165369607
  
--- Diff: python/pyspark/ml/image.py ---
@@ -75,6 +76,23 @@ def ocvTypes(self):
 self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
 return self._ocvTypes
 
+@property
+def columnSchema(self):
+"""
+Returns the schema for the image column.
+
+:return: a :class:`StructType` for image column,
+``struct``.
+
+.. versionadded:: 2.3.0
--- End diff --

I am fine with 2.4.0. Let me know.


---




[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #86931 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86931/testReport)**
 for PR 17886 at commit 
[`dfb1ee5`](https://github.com/apache/spark/commit/dfb1ee5fbf7469895f5f91fe9f9d63dc202ca1b5).


---




[GitHub] spark pull request #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method...

2018-02-01 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/20475

[SPARK-23256][ML][PYTHON] Add columnSchema method to PySpark image reader

## What changes were proposed in this pull request?

This PR proposes to add `columnSchema` on the Python side too.

```python
>>> from pyspark.ml.image import ImageSchema
>>> ImageSchema.columnSchema.simpleString()

'struct'
```

## How was this patch tested?

Manually tested, and a unit test was added in `python/pyspark/ml/tests.py`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-23256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20475


commit e180ade8a86c51f7eac1fd63e0febc09f9889f7d
Author: hyukjinkwon 
Date:   2018-02-01T14:14:40Z

Add columnSchema method to PySpark image reader




---




[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20475
  
@MrBago, @BryanCutler, @imatiach-msft, and @MLnick, could you please take a look?


---




[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20476
  
**[Test build #86933 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86933/testReport)**
 for PR 20476 at commit 
[`353dd6b`](https://github.com/apache/spark/commit/353dd6bc60ce7123c392d7b51a496d45b1d7ab5c).


---




[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20476
  
cc @gatorsmile @rdblue. Most of the changes are tests.


---




[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20476#discussion_r165375489
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
@@ -81,35 +81,34 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
 
 // TODO: add more push down rules.
 
-// TODO: nested fields pruning
-def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: Seq[Attribute]): Unit = {
-  plan match {
-case Project(projectList, child) =>
-  val required = projectList.filter(requiredByParent.contains).flatMap(_.references)
-  pushDownRequiredColumns(child, required)
-
-case Filter(condition, child) =>
-  val required = requiredByParent ++ condition.references
-  pushDownRequiredColumns(child, required)
-
-case DataSourceV2Relation(fullOutput, reader) => reader match {
-  case r: SupportsPushDownRequiredColumns =>
-// Match original case of attributes.
-val attrMap = AttributeMap(fullOutput.zip(fullOutput))
-val requiredColumns = requiredByParent.map(attrMap)
-r.pruneColumns(requiredColumns.toStructType)
-  case _ =>
-}
+pushDownRequiredColumns(filterPushed, filterPushed.outputSet)
+// After column pruning, we may have redundant PROJECT nodes in the query plan, remove them.
+RemoveRedundantProject(filterPushed)
+  }
+
+  // TODO: nested fields pruning
+  private def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: AttributeSet): Unit = {
+plan match {
+  case Project(projectList, child) =>
+val required = projectList.flatMap(_.references)
+pushDownRequiredColumns(child, AttributeSet(required))
+
+  case Filter(condition, child) =>
+val required = requiredByParent ++ condition.references
+pushDownRequiredColumns(child, required)
 
-// TODO: there may be more operators can be used to calculate required columns, we can add
-// more and more in the future.
-case _ => plan.children.foreach(child => pushDownRequiredColumns(child, child.output))
+  case relation: DataSourceV2Relation => relation.reader match {
+case reader: SupportsPushDownRequiredColumns =>
+  val requiredColumns = relation.output.filter(requiredByParent.contains)
--- End diff --

This is a cleaner way to retain the original case of the attributes.
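The new `pushDownRequiredColumns` in the diff above walks the plan top-down: a `Project` resets the required set to what its project list references, a `Filter` adds the references of its condition, and the `DataSourceV2Relation` case keeps only those entries of `relation.output` that are required. Below is a minimal Python sketch of that propagation. The `Attr`, `Project`, `Filter`, and `Relation` classes are toy stand-ins invented for illustration (they are not Catalyst classes), and attribute identity is modeled by a hypothetical `expr_id`. The sketch also shows why filtering `relation.output` retains the original case, the situation the old `attrMap` comment guarded against: the relation's own attribute objects come back even when the query spells the names differently.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


class Attr:
    """Toy attribute: identity is expr_id; the name (and its case) is display-only."""

    def __init__(self, name: str, expr_id: int) -> None:
        self.name = name
        self.expr_id = expr_id

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Attr) and self.expr_id == other.expr_id

    def __hash__(self) -> int:
        return hash(self.expr_id)


@dataclass
class Relation:
    output: List[Attr]
    pruned: List[Attr] = field(default_factory=list)


@dataclass
class Filter:
    condition_refs: Set[Attr]
    child: "Plan"


@dataclass
class Project:
    project_list: List[Attr]
    child: "Plan"


Plan = Union[Relation, Filter, Project]


def push_down_required_columns(plan: Plan, required_by_parent: Set[Attr]) -> None:
    if isinstance(plan, Project):
        # A Project needs only what its own expressions reference.
        push_down_required_columns(plan.child, set(plan.project_list))
    elif isinstance(plan, Filter):
        # A Filter needs its parent's columns plus those read by its condition.
        push_down_required_columns(plan.child, required_by_parent | plan.condition_refs)
    elif isinstance(plan, Relation):
        # Filter the relation's own output, so the pruned list keeps the
        # relation's attribute objects (and therefore their original case).
        plan.pruned = [a for a in plan.output if a in required_by_parent]


# The query spells the columns in lower case; they resolve to the same expr_ids.
rel = Relation(output=[Attr("ID", 1), Attr("Name", 2), Attr("Age", 3)])
plan = Project([Attr("id", 1)], Filter({Attr("age", 3)}, rel))
push_down_required_columns(plan, set(plan.project_list))
print([a.name for a in rel.pruned])
```

With this plan the relation is pruned to its `ID` and `Age` attributes, spelled as the relation defines them rather than as the query wrote them.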


---




[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #86931 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86931/testReport)**
 for PR 17886 at commit 
[`dfb1ee5`](https://github.com/apache/spark/commit/dfb1ee5fbf7469895f5f91fe9f9d63dc202ca1b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86931/


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86935 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86935/testReport)**
 for PR 20455 at commit 
[`6d5f7ec`](https://github.com/apache/spark/commit/6d5f7ec3e6f25e683628370350cfb865aac29d65).


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
> My proposal is, pandas and pyarrow should be a hard requirement for our jenkins, to make sure the features are well tested.

If this is the goal, I think another simple way is to just set an environment variable in Jenkins and, in the future, throw an exception if either PyArrow or Pandas is not installed.
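The environment-variable suggestion above can be sketched in Python as follows. Everything here is an assumption for illustration: the variable name `SPARK_TESTING_REQUIRE_PANDAS_PYARROW` is hypothetical (the comment does not name one), and the check simply uses `importlib.util.find_spec` to see whether each module can be found without importing it.

```python
import importlib.util
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical variable name; Jenkins would set it to "1" to make the modules mandatory.
REQUIRE_ENV_VAR = "SPARK_TESTING_REQUIRE_PANDAS_PYARROW"


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_test_requirements(env: Optional[Mapping[str, str]] = None,
                            modules: Iterable[str] = ("pandas", "pyarrow")) -> List[str]:
    """Raise if the env var makes the modules mandatory and any is missing;
    otherwise return the missing ones so the caller can skip related tests."""
    env = os.environ if env is None else env
    missing = missing_modules(modules)
    if env.get(REQUIRE_ENV_VAR) == "1" and missing:
        raise RuntimeError(
            f"{REQUIRE_ENV_VAR}=1 but these modules are not installed: {', '.join(missing)}")
    return missing
```

On a developer machine without the variable set, missing modules only cause the related tests to be skipped; on Jenkins, where the variable would be set, the same situation fails loudly.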


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
@cloud-fan, I will try it. Thank you sincerely.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20468
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86925/


---




[GitHub] spark pull request #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check...

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20468


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20473
  
**[Test build #86930 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86930/testReport)**
 for PR 20473 at commit 
[`e7d752f`](https://github.com/apache/spark/commit/e7d752f22286e97f784f69744cf2d3aefbb6d28d).


---




[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20446#discussion_r165362148
  
--- Diff: docs/ml-statistics.md ---
@@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
 {% include_example python/ml/chi_square_test_example.py %}
 
 
+
+
+## Summarizer
+
+We provide vector column summary statistics for `Dataframe` through `Summarizer`.
+Available metrics contain the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
+
+
+
+[`Summarizer`](api/scala/index.html#org.apache.spark.ml.stat.Summarizer$)
--- End diff --

Perhaps "The following example demonstrates using `Summarizer`(...) to 
compute the mean and variance for the input dataframe, with and without a 
weight column"?
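To make the metrics in the quoted `Summarizer` paragraph concrete, here is a plain-Python sketch that computes a weighted and an unweighted mean and variance for the same two rows used in the proposed examples (`[2.0, 3.0, 5.0]` with weight 1.0 and `[4.0, 6.0, 7.0]` with weight 2.0). It assumes an unbiased, reliability-weighted variance whose denominator is `W - sum(w^2)/W`, which reduces to `n - 1` when every weight is 1. Treat it as a sketch for sanity-checking numbers, not as Spark's implementation.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_mean_variance(
    rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Column-wise weighted mean and (reliability-weighted, unbiased) variance."""
    if weights is None:
        weights = [1.0] * len(rows)
    dim = len(rows[0])
    w_sum = sum(weights)
    w_sq_sum = sum(w * w for w in weights)
    mean = [sum(w * r[i] for r, w in zip(rows, weights)) / w_sum for i in range(dim)]
    denom = w_sum - w_sq_sum / w_sum
    variance = []
    for i in range(dim):
        m2 = sum(w * (r[i] - mean[i]) ** 2 for r, w in zip(rows, weights))
        # With a single row the denominator is 0; report 0.0 instead of dividing.
        variance.append(m2 / denom if denom > 0 else 0.0)
    return mean, variance


rows = [[2.0, 3.0, 5.0], [4.0, 6.0, 7.0]]
print(weighted_mean_variance(rows, weights=[1.0, 2.0]))
print(weighted_mean_variance(rows))
```

For these inputs the weighted mean is about `[3.33, 5.0, 6.33]` and the unweighted mean is `[3.0, 4.5, 6.0]`; under the stated variance assumption both variances come out to about `[2.0, 4.5, 2.0]`.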


---




[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20446#discussion_r165362440
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaSummarizerExample.java ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import org.apache.spark.sql.*;
+
+// $example on$
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.spark.ml.linalg.Vector;
+import org.apache.spark.ml.linalg.Vectors;
+import org.apache.spark.ml.linalg.VectorUDT;
+import org.apache.spark.ml.stat.Summarizer;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+// $example off$
+
+public class JavaSummarizerExample {
+  public static void main(String[] args) {
+SparkSession spark = SparkSession
+  .builder()
+  .appName("JavaSummarizerExample")
+  .getOrCreate();
+
+// $example on$
+List data = Arrays.asList(
+  RowFactory.create(Vectors.dense(2.0, 3.0, 5.0), 1.0),
+  RowFactory.create(Vectors.dense(4.0, 6.0, 7.0), 2.0)
+);
+
+StructType schema = new StructType(new StructField[]{
+  new StructField("features", new VectorUDT(), false, Metadata.empty()),
+  new StructField("weight", DataTypes.DoubleType, false, Metadata.empty())
+});
+
+Dataset df = spark.createDataFrame(data, schema);
+
+Row result1 = df.select(Summarizer.metrics("mean", "variance")
+.summary(new Column("features"), new Column("weight")))
+.first().getStruct(0);
+System.out.println("with weight: mean = " + result1.getAs(0).toString() +
+  ", variance = " + result1.getAs(1).toString());
+
+Row result2 = df.select(
+  Summarizer.mean(new Column("features")),
+  Summarizer.variance(new Column("features"))
+).first();
+System.out.println("without weight: mean = " + result2.getAs(0).toString() +
--- End diff --

Why not just `df.select(...).show()`?


---




[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20446#discussion_r165360692
  
--- Diff: docs/ml-statistics.md ---
@@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
 {% include_example python/ml/chi_square_test_example.py %}
 
 
+
+
+## Summarizer
+
+We provide vector column summary statistics for `Dataframe` through `Summarizer`.
+Available metrics contain the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
--- End diff --

Perhaps "contain" -> "are" or "include"?


---




[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20446#discussion_r165362568
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.stat.Summarizer
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+object SummarizerExample {
+  def main(args: Array[String]): Unit = {
+val spark = SparkSession
+  .builder
+  .appName("SummarizerExample")
+  .getOrCreate()
+
+import spark.implicits._
+import Summarizer._
+
+// $example on$
+val data = Seq(
+  (Vectors.dense(2.0, 3.0, 5.0), 1.0),
+  (Vectors.dense(4.0, 6.0, 7.0), 2.0)
+)
+
+val df = data.toDF("features", "weight")
+
+val Tuple1((meanVal, varianceVal)) = df.select(metrics("mean", "variance")
+  .summary($"features", $"weight"))
+  .as[Tuple1[(Vector, Vector)]].first()
+
+println(s"with weight: mean = ${meanVal}, variance = ${varianceVal}")
+
+val (meanVal2, varianceVal2) = df.select(mean($"features"), variance($"features"))
+  .as[(Vector, Vector)].first()
+
+println(s"without weight: mean = ${meanVal2}, sum = ${varianceVal2}")
--- End diff --

Same applies here, why not just `df.select(...).show()`?


---




[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20446#discussion_r165362364
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaSummarizerExample.java ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import org.apache.spark.sql.*;
+
+// $example on$
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.spark.ml.linalg.Vector;
+import org.apache.spark.ml.linalg.Vectors;
+import org.apache.spark.ml.linalg.VectorUDT;
+import org.apache.spark.ml.stat.Summarizer;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+// $example off$
+
+public class JavaSummarizerExample {
+  public static void main(String[] args) {
+SparkSession spark = SparkSession
+  .builder()
+  .appName("JavaSummarizerExample")
+  .getOrCreate();
+
+// $example on$
+List data = Arrays.asList(
+  RowFactory.create(Vectors.dense(2.0, 3.0, 5.0), 1.0),
+  RowFactory.create(Vectors.dense(4.0, 6.0, 7.0), 2.0)
+);
+
+StructType schema = new StructType(new StructField[]{
+  new StructField("features", new VectorUDT(), false, Metadata.empty()),
+  new StructField("weight", DataTypes.DoubleType, false, Metadata.empty())
+});
+
+Dataset df = spark.createDataFrame(data, schema);
+
+Row result1 = df.select(Summarizer.metrics("mean", "variance")
+.summary(new Column("features"), new Column("weight")))
+.first().getStruct(0);
+System.out.println("with weight: mean = " + result1.getAs(0).toString() +
--- End diff --

Why not just `df.select(...).show()`?


---




[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20446#discussion_r165362533
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.stat.Summarizer
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+object SummarizerExample {
+  def main(args: Array[String]): Unit = {
+val spark = SparkSession
+  .builder
+  .appName("SummarizerExample")
+  .getOrCreate()
+
+import spark.implicits._
+import Summarizer._
+
+// $example on$
+val data = Seq(
+  (Vectors.dense(2.0, 3.0, 5.0), 1.0),
+  (Vectors.dense(4.0, 6.0, 7.0), 2.0)
+)
+
+val df = data.toDF("features", "weight")
+
+val Tuple1((meanVal, varianceVal)) = df.select(metrics("mean", "variance")
+  .summary($"features", $"weight"))
+  .as[Tuple1[(Vector, Vector)]].first()
+
+println(s"with weight: mean = ${meanVal}, variance = ${varianceVal}")
--- End diff --

Same applies here, why not just `df.select(...).show()`?


---




[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/20474
  
cc @squito @ajbozarth 


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20466
  
**[Test build #86934 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86934/testReport)**
 for PR 20466 at commit 
[`05253f0`](https://github.com/apache/spark/commit/05253f0f341d1444ad378bff286f15685953bdd5).


---




[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20469
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86920/
Test FAILed.


---




[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20469
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20455: [SPARK-23284][SQL] Document the behavior of sever...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20455#discussion_r165324640
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
 ---
@@ -1261,4 +1269,38 @@ class ColumnarBatchSuite extends SparkFunSuite {
 batch.close()
 allocator.close()
   }
+
+  testVector("getDecimal should return null for null slot", 4, DecimalType.IntDecimal) {
--- End diff --

Shall we make it a normal test case for the decimal type? We can follow the other tests, e.g. create a decimal array and check the value of the column vector at the same index.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20471
  
**[Test build #86922 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86922/testReport)**
 for PR 20471 at commit 
[`af757ef`](https://github.com/apache/spark/commit/af757ef04626df632b47b39c49ec91bdec177051).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86923 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86923/testReport)**
 for PR 20455 at commit 
[`35548e6`](https://github.com/apache/spark/commit/35548e6d30211cf155a366da2ad736d1281367bf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20455
  
**[Test build #86926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86926/testReport)**
 for PR 20455 at commit 
[`923d0fe`](https://github.com/apache/spark/commit/923d0fe042befe722905791fd8dfcb42003f5e15).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20473#discussion_r165350209
  
--- Diff: python/run-tests.py ---
@@ -151,6 +151,38 @@ def parse_opts():
     return opts
 
 
+def _check_dependencies(python_exec, modules_to_test):
+    if "COVERAGE_PROCESS_START" in os.environ:
+        # Make sure if coverage is installed.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import coverage"],
+                stderr=open(os.devnull, 'w'))
+        except:
+            print_red("Coverage is not installed in Python executable '%s' "
+                      "but 'COVERAGE_PROCESS_START' environment variable is set, "
+                      "exiting." % python_exec)
+            sys.exit(-1)
+
+    if pyspark_sql in modules_to_test:
+        # If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and
+        # explicitly prints out. See SPARK-23300.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import pyarrow"],
+                stderr=open(os.devnull, 'w'))
--- End diff --

Otherwise, it prints out the exception too, for example:

```
Will test the following Python modules: ['pyspark-sql']
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named foo
PyArrow is not installed in Python executable 'python2.7', skipping related tests in 'pyspark-sql'.
```
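A minimal sketch of the same dependency probe, assuming Python 3.5+ (for `subprocess.run` and `subprocess.DEVNULL`); the helper name `is_module_installed` is hypothetical and not part of `run-tests.py`. Discarding stderr keeps a failed import's traceback out of the runner's output, so only the one-line skip message appears, and `DEVNULL` also avoids leaving the handle from `open(os.devnull, 'w')` unclosed.

```python
import subprocess
import sys


def is_module_installed(python_exec: str, module: str) -> bool:
    """Return True if `module` can be imported by the interpreter `python_exec`.

    Both stdout and stderr are discarded, so a failing import does not print a
    traceback; only the child's exit status is inspected.
    """
    result = subprocess.run(
        [python_exec, "-c", "import %s" % module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for name, module in [("PyArrow", "pyarrow"), ("Pandas", "pandas")]:
        if not is_module_installed(sys.executable, module):
            print("%s is not installed in Python executable '%s', "
                  "skipping related tests in 'pyspark-sql'." % (name, sys.executable))
```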


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20473
  
Current Jenkins output was:

```

Running PySpark tests

Running PySpark tests. Output is in /home/jenkins/workspace/SparkPullRequestBuilder/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-sql', 'pyspark-streaming', 'pyspark-mllib', 'pyspark-ml']
PyArrow is not installed in Python executable 'python2.7', skipping related tests in 'pyspark-sql'.
PyArrow is not installed in Python executable 'pypy', skipping related tests in 'pyspark-sql'.
Pandas is not installed in Python executable 'pypy', skipping related tests in 'pyspark-sql'.
Starting test(pypy): pyspark.sql.tests
Starting test(python2.7): pyspark.mllib.tests
```


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20473
  
**[Test build #86930 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86930/testReport)**
 for PR 20473 at commit 
[`e7d752f`](https://github.com/apache/spark/commit/e7d752f22286e97f784f69744cf2d3aefbb6d28d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86930/
Test PASSed.


---




[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20475
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20476#discussion_r165375177
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
 ---
@@ -81,35 +81,34 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
 
 // TODO: add more push down rules.
 
-// TODO: nested fields pruning
-def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: Seq[Attribute]): Unit = {
-  plan match {
-case Project(projectList, child) =>
-  val required = projectList.filter(requiredByParent.contains).flatMap(_.references)
--- End diff --

This line is wrong, and I fixed it in https://github.com/apache/spark/pull/20476/files#diff-b7f3810e65a2bb1585de9609ea491469R93


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165382316
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -628,18 +635,24 @@ class ALSSuite
 }
 withClue("transform should fail when ids exceed integer range. ") {
   val model = als.fit(df)
-  assert(intercept[SparkException] {
-model.transform(df.select(df("user_big").as("user"), df("item"))).first
-  }.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-model.transform(df.select(df("user_small").as("user"), df("item"))).first
-  }.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-model.transform(df.select(df("item_big").as("item"), df("user"))).first
-  }.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-model.transform(df.select(df("item_small").as("item"), df("user"))).first
-  }.getMessage.contains(msg))
+  def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = {
+assert(intercept[SparkException] {
+  model.transform(dataFrame).first
+}.getMessage.contains(msg))
+assert(intercept[StreamingQueryException] {
+  testTransformer[A](dataFrame, model, "prediction") {
+case _ =>
--- End diff --

Partial function removed. This part of the code expects `StreamingQueryException`, which is quite close to this area. I'm not sure whether a comment would make it clearer.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165382596
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -653,6 +666,7 @@ class ALSSuite
   test("ALS cold start user/item prediction strategy") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

Fixed.


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20466
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20466
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/476/
Test PASSed.


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20466
  
retest this please


---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20472
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20454
  
Adding a default method to a Java interface is binary compatible. I'm merging this to master only, following @rxin's suggestion. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20468
  
Thanks! Merging to master/2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/477/
Test PASSed.


---




[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

2018-02-01 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/20430
  
Can we special-case CTAS here? For data-changing commands like INSERT, I think we should remove the stats if auto update is disabled, because the previous stats are inaccurate after the insertion.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20473
  
**[Test build #86929 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86929/testReport)**
 for PR 20473 at commit 
[`0261045`](https://github.com/apache/spark/commit/026104543bc2a9ea39e710f1df52e0c6ba15faab).


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20474: [SPARK-23235][Core] Add executor Threaddump to ap...

2018-02-01 Thread attilapiros
GitHub user attilapiros opened a pull request:

https://github.com/apache/spark/pull/20474

[SPARK-23235][Core] Add executor Threaddump to api

## What changes were proposed in this pull request?

Extends the API with executor thread dump data.

For this, a new REST URL is introduced:
- GET http://localhost:4040/api/v1/applications/{applicationId}/executors/{executorId}/threads
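A client for the proposed endpoint might look like the Python sketch below. It assumes the path shown above and the JSON shape of the example response (a list of objects with a `threadState` field); the helper names are hypothetical, the merged API may differ, and `app-1`/`1` in the usage note are placeholder ids.

```python
import json
import urllib.request
from collections import Counter
from typing import Dict


def thread_dump_url(base_url: str, app_id: str, executor_id: str) -> str:
    """Build the per-executor thread dump URL proposed in this PR."""
    return "%s/api/v1/applications/%s/executors/%s/threads" % (
        base_url.rstrip("/"), app_id, executor_id)


def count_thread_states(payload: str) -> Dict[str, int]:
    """Count threads per state (e.g. WAITING, TIMED_WAITING) in a JSON response."""
    return dict(Counter(thread["threadState"] for thread in json.loads(payload)))


def fetch_thread_states(base_url: str, app_id: str, executor_id: str) -> Dict[str, int]:
    """Fetch a live thread dump; requires a running Spark UI at base_url."""
    with urllib.request.urlopen(thread_dump_url(base_url, app_id, executor_id)) as resp:
        return count_thread_states(resp.read().decode("utf-8"))
```

For example, `thread_dump_url("http://localhost:4040", "app-1", "1")` yields `http://localhost:4040/api/v1/applications/app-1/executors/1/threads`.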


Example response:

``` javascript
[ {
  "threadId" : 52,
  "threadName" : "context-cleaner-periodic-gc",
  "threadState" : "TIMED_WAITING",
  "stackTrace" : "sun.misc.Unsafe.park(Native 
Method)\njava.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)\njava.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)\njava.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)\njava.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)\njava.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)",
  "blockedByLock" : 
"Lock(java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1385411893})",
  "holdingLocks" : [ ]
}, {
  "threadId" : 48,
  "threadName" : "dag-scheduler-event-loop",
  "threadState" : "WAITING",
  "stackTrace" : "sun.misc.Unsafe.park(Native 
Method)\njava.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\njava.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)\njava.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)\njava.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)\norg.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:46)",
  "blockedByLock" : 
"Lock(java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1138053349})",
  "holdingLocks" : [ ]
}, {
  "threadId" : 17,
  "threadName" : "dispatcher-event-loop-0",
  "threadState" : "WAITING",
  "stackTrace" : "sun.misc.Unsafe.park(Native 
Method)\njava.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\njava.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)\njava.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)\norg.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)",
  "blockedByLock" : 
"Lock(java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1764626380})",
  "holdingLocks" : [ 
"Lock(java.util.concurrent.ThreadPoolExecutor$Worker@832743930})" ]
}, {
  "threadId" : 18,
  "threadName" : "dispatcher-event-loop-1",
  "threadState" : "WAITING",
  "stackTrace" : "sun.misc.Unsafe.park(Native 
Method)\njava.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\njava.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)\njava.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)\norg.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)",
  "blockedByLock" : 
"Lock(java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1764626380})",
  "holdingLocks" : [ 
"Lock(java.util.concurrent.ThreadPoolExecutor$Worker@834153999})" ]
}, {
  "threadId" : 19,
  "threadName" : "dispatcher-event-loop-2",
  "threadState" : "WAITING",
  "stackTrace" : "sun.misc.Unsafe.park(Native 
Method)\njava.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\njava.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)\njava.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)\norg.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)",
  "blockedByLock" : 
"Lock(java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1764626380})",
  "holdingLocks" : [ 
"Lock(java.util.concurrent.ThreadPoolExecutor$Worker@664836465})" ]
}, {
  

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20466
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20466
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86927/
Test PASSed.


---




[GitHub] spark pull request #20422: [SPARK-23253][Core][Shuffle]Only write shuffle te...

2018-02-01 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20422#discussion_r165256701
  
--- Diff: 
core/src/test/scala/org/apache/spark/shuffle/sort/IndexShuffleBlockResolverSuite.scala
 ---
@@ -89,26 +96,39 @@ class IndexShuffleBlockResolverSuite extends SparkFunSuite with BeforeAndAfterEa
 } {
   out2.close()
 }
-resolver.writeIndexFileAndCommit(1, 2, lengths2, dataTmp2)
+resolver.writeIndexFileAndCommit(shuffleId, mapId, lengths2, dataTmp2)
+
+assert(indexFile.length() === (lengths.length + 1) * 8)
 assert(lengths2.toSeq === lengths.toSeq)
 assert(dataFile.exists())
 assert(dataFile.length() === 30)
 assert(!dataTmp2.exists())
 
 // The dataFile should be the previous one
 val firstByte = new Array[Byte](1)
-val in = new FileInputStream(dataFile)
+val dataIn = new FileInputStream(dataFile)
 Utils.tryWithSafeFinally {
-  in.read(firstByte)
+  dataIn.read(firstByte)
 } {
-  in.close()
+  dataIn.close()
 }
 assert(firstByte(0) === 0)
 
+// The index file should not change
+val secondValueOffset = new Array[Byte](8)
+val indexIn = new FileInputStream(indexFile)
+Utils.tryWithSafeFinally {
+  indexIn.read(secondValueOffset)
+  indexIn.read(secondValueOffset)
+} {
+  indexIn.close()
+}
+assert(secondValueOffset(7) === 10, "The index file should not change")
--- End diff --

minor: here and below, it would be clearer to use `DataInputStream.readLong()` (no magic offset 7, and it checks the rest of the bytes too):
```scala
import java.io.{DataInputStream, FileInputStream}

val indexIn = new DataInputStream(new FileInputStream(indexFile))
Utils.tryWithSafeFinally {
  indexIn.readLong()  // first offset is always 0
  assert(10 === indexIn.readLong(), "The index file should not change")
} {
  indexIn.close()
}
```
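To see why `readLong()` is more robust than checking byte 7, here is a small Python sketch of the same layout. It assumes the index file is just a sequence of 8-byte big-endian longs (the format `DataOutputStream.writeLong` produces, and consistent with the `(lengths.length + 1) * 8` size check in the test); `write_index` and `read_long` are illustrative helpers, not Spark APIs.

```python
import io
import struct
from typing import BinaryIO, List


def write_index(offsets: List[int]) -> bytes:
    """Serialize offsets as consecutive 8-byte big-endian signed longs."""
    return b"".join(struct.pack(">q", offset) for offset in offsets)


def read_long(stream: BinaryIO) -> int:
    """Read exactly 8 bytes as a big-endian signed long, like DataInputStream.readLong()."""
    raw = stream.read(8)
    if len(raw) != 8:
        raise EOFError("expected 8 bytes, got %d" % len(raw))
    return struct.unpack(">q", raw)[0]


index = write_index([0, 10, 30])
stream = io.BytesIO(index)
first = read_long(stream)   # always 0 for a shuffle index file
second = read_long(stream)  # cumulative length after the first partition
```

Checking only the last byte of the second long cannot tell 10 apart from 266 (`0x010A`), since both end in `0x0A`; reading the whole long compares the full value.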


---




[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

2018-02-01 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/20476

[SPARK-23301][SQL] data source column pruning should work for arbitrary expressions

## What changes were proposed in this pull request?

This PR fixes a mistake in the `PushDownOperatorsToDataSource` rule: the column pruning logic handles `Project` incorrectly.

## How was this patch tested?

A new test case for column pruning with arbitrary expressions, plus improvements to the existing tests to make sure `PushDownOperatorsToDataSource` really works.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark push-down

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20476


commit 353dd6bc60ce7123c392d7b51a496d45b1d7ab5c
Author: Wenchen Fan 
Date:   2018-02-01T12:02:23Z

data source column pruning should work for arbitrary expressions




---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165378573
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -566,6 +565,7 @@ class ALSSuite
   test("read/write") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

Fixed.


---




[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20468
  
LGTM


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Thank you for bearing with me @cloud-fan. I agree with it.

BTW, are you working on the logging thing? I was thinking the simplest way to check is to just print out once whether PyArrow / Pandas are installed or not.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86921/
Test PASSed.


---




[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20455
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20471
  
LGTM


---




[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread lucio-yz
GitHub user lucio-yz opened a pull request:

https://github.com/apache/spark/pull/20472

[SPARK-22751][ML]Improve ML RandomForest shuffle performance

## What changes were proposed in this pull request?

As I mentioned in [SPARK-22751](https://issues.apache.org/jira/browse/SPARK-22751?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20ML%20AND%20text%20~%20randomforest), there is a shuffle performance problem in ML RandomForest when training an RF on high-dimensional data.

The reason is that, in org.apache.spark.ml.tree.impl.RandomForest, the function findSplitsBySorting actually flatMaps a sparse vector into a dense vector, so the subsequent groupByKey produces a huge shuffle write size.

To avoid this, we can add a filter after the flatMap to filter out zero values. In findSplitsForContinuousFeature, we can then infer the number of zero values by passing it a parameter numInput, which is the number of samples.

In addition, if a feature contains only zero values, continuousSplits will not have a key for that feature id, so I added a check when using continuousSplits.
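The zero-filtering idea can be sketched in plain Python. This is a hypothetical illustration of the approach described above, not the code in this PR: `nonzero_pairs` stands in for the flatMap plus filter, and `value_counts_with_zeros` stands in for recovering the zero count from `numInput` in findSplitsForContinuousFeature. Sparse rows are modeled as dicts from feature index to value.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

SparseRow = Dict[int, float]  # feature index -> value; absent means 0.0


def nonzero_pairs(rows: Iterable[SparseRow]) -> List[Tuple[int, float]]:
    """Emit (featureIndex, value) only for non-zero entries, instead of
    densifying every row and shuffling all the zeros."""
    return [(j, v) for row in rows for j, v in row.items() if v != 0.0]


def value_counts_with_zeros(pairs: Iterable[Tuple[int, float]],
                            num_input: int) -> Dict[int, Counter]:
    """Group values per feature, then add back the implicit zeros:
    zeros = num_input - (number of non-zero samples for that feature)."""
    grouped: Dict[int, Counter] = {}
    for j, v in pairs:
        grouped.setdefault(j, Counter())[v] += 1
    for counts in grouped.values():
        num_zeros = num_input - sum(counts.values())
        if num_zeros > 0:
            counts[0.0] += num_zeros
    return grouped


rows = [{0: 1.0, 2: 3.0}, {0: 1.0}, {1: 0.0}, {2: 5.0}]
counts = value_counts_with_zeros(nonzero_pairs(rows), num_input=len(rows))
# Feature 1 only ever holds zeros, so it has no entry at all; callers must
# handle the missing key, as the PR does for continuousSplits.
feature1_counts = counts.get(1, Counter())
```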

## How was this patch tested?
Ran model locally using spark-submit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lucio-yz/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20472


commit 50cb173dd34dc353c243b97f2686a8c545a03909
Author: lucio <576632108@...>
Date:   2018-02-01T09:47:52Z

fix mllib randomforest shuffle issue




---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20461
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86917/
Test PASSed.


---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/20461
  
@cloud-fan thanks a lot for the ping. LGTM


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/478/
Test PASSed.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20473
  
**[Test build #86929 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86929/testReport)**
 for PR 20473 at commit 
[`0261045`](https://github.com/apache/spark/commit/026104543bc2a9ea39e710f1df52e0c6ba15faab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20473
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86929/
Test PASSed.


---




[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20474
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20474
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20475
  
**[Test build #86932 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86932/testReport)**
 for PR 20475 at commit 
[`e180ade`](https://github.com/apache/spark/commit/e180ade8a86c51f7eac1fd63e0febc09f9889f7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20475
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20475
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86932/
Test PASSed.


---




[GitHub] spark pull request #20474: [SPARK-23235][Core] Add executor Threaddump to ap...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20474#discussion_r165377643
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2168,7 +2168,17 @@ private[spark] object Utils extends Logging {
 // We need to filter out null values here because dumpAllThreads() may return null array
 // elements for threads that are dead / don't exist.
 val threadInfos = ManagementFactory.getThreadMXBean.dumpAllThreads(true, true).filter(_ != null)
-threadInfos.sortBy(_.getThreadId).map(threadInfoToThreadStackTrace)
--- End diff --

`sortBy(_.getThreadId)` just disappeared. Is it intentional?
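For illustration, a minimal Scala sketch of what the `sortBy(_.getThreadId)` call contributes, assuming a plain JVM outside Spark (the `ThreadDumpOrder` object and `sortedThreadInfos` helper are hypothetical, not Spark's `Utils`). The `dumpAllThreads` result is filtered for null entries and then ordered by thread ID, so repeated dumps list threads in a predictable order. The `ThreadMXBean` documentation does not appear to promise any particular ordering of the returned array, so dropping the sort may change the order in which threads are reported.

```scala
import java.lang.management.{ManagementFactory, ThreadInfo}

// Hypothetical helper object, for illustration only (not Spark's Utils).
object ThreadDumpOrder {
  // Dump all live threads and return them ordered by ascending thread ID.
  def sortedThreadInfos(): Seq[ThreadInfo] = {
    // dumpAllThreads(lockedMonitors, lockedSynchronizers) may contain null
    // entries for threads that died during the dump, so drop those first.
    val infos = ManagementFactory.getThreadMXBean
      .dumpAllThreads(true, true)
      .filter(_ != null)
    // Sorting by ID gives a deterministic order instead of whatever
    // order the JVM happened to return.
    infos.sortBy(_.getThreadId).toSeq
  }

  def main(args: Array[String]): Unit = {
    // One line per live thread: ID, name, state.
    sortedThreadInfos().foreach { info =>
      println(s"${info.getThreadId}\t${info.getThreadName}\t${info.getThreadState}")
    }
  }
}
```

Running `main` prints one tab-separated line per live thread in ascending ID order; which threads appear depends on the JVM and what it has started.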


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Yup, explicit logging sounds fine for now so that we can easily check.

> I do prefer to have these conditional skips removed because sometimes it is hard to tell if everything passed or was just skipped

To be clear, I think it's more because our own testing script doesn't show
the skipped-test output from unittest in the console.

Also, I think it's more because we couldn't make sure Pandas and Arrow were
installed properly in the Jenkins testing environment, not because we skip
tests related to extra dependencies when they are not installed. Making them
required dependencies is a big deal IMHO.

FYI, I tried to install PyArrow with PyPy last time and failed. I wonder
if we can easily install it.


---




[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20461
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20471
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86922/
Test PASSed.


---




[GitHub] spark issue #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColumnarRow....

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20471
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20471: [SPARK-23280][SQL][FOLLOWUP] Enable `MutableColum...

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20471


---




[GitHub] spark pull request #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.Download...

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20461


---




[GitHub] spark issue #20470: [SPARK-23296][YARN] Include stacktrace in YARN-app diagn...

2018-02-01 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20470
  
That seems reasonably harmless; CC @vanzin


---




[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20475
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/479/
Test PASSed.


---




[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20475
  
**[Test build #86932 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86932/testReport)**
 for PR 20475 at commit 
[`e180ade`](https://github.com/apache/spark/commit/e180ade8a86c51f7eac1fd63e0febc09f9889f7d).


---




[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/480/
Test PASSed.


---




[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Merged build finished. Test PASSed.


---



