[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21513#discussion_r194214431
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReadable,
+                               JavaMLWritable):
+    """
+    .. note:: Experimental
+
+    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
+    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
+    PIC finds a very low-dimensional embedding of a dataset using truncated power
+    iteration on a normalized pair-wise similarity matrix of the data.
+
+    This class is not yet an Estimator/Transformer, use `assignClusters` method to run the
+    PowerIterationClustering algorithm.
+
+    .. seealso:: `Wikipedia on Spectral clustering \
+    `_
+
+    >>> from pyspark.sql.types import DoubleType, LongType, StructField, StructType
+    >>> import math
+    >>> def genCircle(r, n):
+    ...     points = []
+    ...     for i in range(0, n):
+    ...         theta = 2.0 * math.pi * i / n
+    ...         points.append((r * math.cos(theta), r * math.sin(theta)))
+    ...     return points
+    >>> def sim(x, y):
+    ...     dist = (x[0] - y[0]) * (x[0] - y[0]) + (x[1] - y[1]) * (x[1] - y[1])
+    ...     return math.exp(-dist / 2.0)
+    >>> r1 = 1.0
+    >>> n1 = 10
+    >>> r2 = 4.0
+    >>> n2 = 40
+    >>> n = n1 + n2
+    >>> points = genCircle(r1, n1) + genCircle(r2, n2)
+    >>> data = [(i, j, sim(points[i], points[j])) for i in range(1, n) for j in range(0, i)]
+    >>> rdd = sc.parallelize(data, 2)
+    >>> schema = StructType([StructField("src", LongType(), False), \
+                 StructField("dst", LongType(),  True), \
+                 StructField("weight", DoubleType(), True)])
+    >>> df = spark.createDataFrame(rdd, schema)
+    >>> pic = PowerIterationClustering()
+    >>> assignments = pic.setK(2).setMaxIter(40).setWeightCol("weight").assignClusters(df)
+    >>> result = sorted(assignments.collect(), key=lambda x: x.id)
+    >>> result[0].cluster == result[1].cluster == result[2].cluster == result[3].cluster
+    True
+    >>> result[4].cluster == result[5].cluster == result[6].cluster == result[7].cluster
+    True
+    >>> pic_path = temp_path + "/pic"
+    >>> pic.save(pic_path)
+    >>> pic2 = PowerIterationClustering.load(pic_path)
+    >>> pic2.getK()
+    2
+    >>> pic2.getMaxIter()
+    40
+    >>> assignments2 = pic2.assignClusters(df)
+    >>> result2 = sorted(assignments2.collect(), key=lambda x: x.id)
+    >>> result2[0].cluster == result2[1].cluster == result2[2].cluster == result2[3].cluster
+    True
+    >>> result2[4].cluster == result2[5].cluster == result2[6].cluster == result2[7].cluster
+    True
+    >>> pic3 = PowerIterationClustering(k=4, initMode="degree", srcCol="source", dstCol="dest")
+    >>> pic3.getSrcCol()
+    'source'
+    >>> pic3.getDstCol()
+    'dest'
+    >>> pic3.getK()
+    4
+    >>> pic3.getMaxIter()
+    20
+    >>> pic3.getInitMode()
+    'degree'
+
+    .. versionadded:: 2.4.0
+    """
+
+    k = Param(Params._dummy(), "k",
+              "The number of clusters to create. Must be > 1.",
+              typeConverter=TypeConverters.toInt)
+    initMode = Param(Params._dummy(), "initMode",
+                     "The initialization algorithm. This can be either " +
+                     "'random' to use a random vector as vertex properties, or 'degree' to use " +
+                     "a normalized sum of similarities with other vertices.  Supported options: " +
+                     "'random' and 'degree'.",
+                     typeConverter=TypeConverters.toString)
+    srcCol = Param(Params._dummy(), "srcCol",
+                   "Name of the input column for source vertex IDs.",
+                   typeConverter=TypeConverters.toString)
+    dstCol = Param(Params._dummy(), "dstCol",
+                   "Name of the input column for destination vertex IDs.",
+                   typeConverter=TypeConverters.toString)
+
+    @keyword_only
+    def __init__(self, k=2, maxIter=20, initMode="random", srcCol="src", dstCol="dst",
+                 weightCol=None):
+        """
+        __init__(self, k=2, maxIter=20, initMode="random", srcCol="src", dstCol="dst",\
+                 weightCol=None)
+        """
+ 
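
The docstring in the diff above describes PIC as truncated power iteration on a normalized pairwise similarity matrix. The pure-Python sketch below illustrates that idea only; it is not the Spark implementation. The function names `power_iteration_embedding` and `two_means_1d`, the sample points, and the fixed iteration count are assumptions made for illustration, and the start vector follows the 'degree' description of `initMode` in the diff.

```python
import math


def power_iteration_embedding(affinity, max_iter=10):
    """Run a fixed number of power-iteration steps on W = D^-1 A."""
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    total = sum(degrees)
    # 'degree' style start vector: normalized sum of similarities per vertex.
    v = [d / total for d in degrees]
    for _ in range(max_iter):
        # One multiplication by the row-normalized affinity matrix.
        nxt = [sum(affinity[i][j] * v[j] for j in range(n)) / degrees[i]
               for i in range(n)]
        # Rescale by the L1 norm so the vector stays bounded.
        norm = sum(abs(x) for x in nxt)
        v = [x / norm for x in nxt]
    return v


def two_means_1d(values, rounds=20):
    """Tiny 1-D k-means with k=2, seeded at the smallest and largest value."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(rounds):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        left = [x for x, c in zip(values, labels) if c == 0]
        right = [x for x, c in zip(values, labels) if c == 1]
        if not left or not right:
            break  # degenerate case: every value fell into one cluster
        lo = sum(left) / len(left)
        hi = sum(right) / len(right)
    return labels


# Hypothetical data: two well-separated groups of points on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 5.3]
# Gaussian similarity with a zero diagonal, in the spirit of sim() in the doctest.
affinity = [[0.0 if i == j else math.exp(-((p - q) ** 2) / 2.0)
             for j, q in enumerate(points)]
            for i, p in enumerate(points)]

embedding = power_iteration_embedding(affinity)
labels = two_means_1d(embedding)
print(labels)
```

Stopping after a few iterations is the point of the technique: entries of the vector settle quickly within a cluster but only slowly across clusters, so the one-dimensional embedding can be split with a simple clustering step.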

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21513#discussion_r194214516
  

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21513#discussion_r194214535
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReadable,
+                               JavaMLWritable):
+    """
+    .. note:: Experimental
+
+    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
+    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
+    PIC finds a very low-dimensional embedding of a dataset using truncated power
+    iteration on a normalized pair-wise similarity matrix of the data.
+
+    This class is not yet an Estimator/Transformer, use `assignClusters` method to run the
--- End diff --

```
... use :py:func:`assignClusters` method ...
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21513#discussion_r194214831
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReadable,
+                               JavaMLWritable):
+    """
+    .. note:: Experimental
+
+    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
+    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
+    PIC finds a very low-dimensional embedding of a dataset using truncated power
+    iteration on a normalized pair-wise similarity matrix of the data.
+
+    This class is not yet an Estimator/Transformer, use `assignClusters` method to run the
+    PowerIterationClustering algorithm.
+
+    .. seealso:: `Wikipedia on Spectral clustering \
+    `_
+
+    >>> from pyspark.sql.types import DoubleType, LongType, StructField, StructType
+    >>> import math
+    >>> def genCircle(r, n):
+    ...     points = []
+    ...     for i in range(0, n):
+    ...         theta = 2.0 * math.pi * i / n
+    ...         points.append((r * math.cos(theta), r * math.sin(theta)))
+    ...     return points
+    >>> def sim(x, y):
+    ...     dist = (x[0] - y[0]) * (x[0] - y[0]) + (x[1] - y[1]) * (x[1] - y[1])
+    ...     return math.exp(-dist / 2.0)
+    >>> r1 = 1.0
+    >>> n1 = 10
+    >>> r2 = 4.0
+    >>> n2 = 40
+    >>> n = n1 + n2
+    >>> points = genCircle(r1, n1) + genCircle(r2, n2)
+    >>> data = [(i, j, sim(points[i], points[j])) for i in range(1, n) for j in range(0, i)]
+    >>> rdd = sc.parallelize(data, 2)
+    >>> schema = StructType([StructField("src", LongType(), False), \
+                 StructField("dst", LongType(),  True), \
+                 StructField("weight", DoubleType(), True)])
+    >>> df = spark.createDataFrame(rdd, schema)
--- End diff --

The test code here is too complex for a doctest. Could you change it to 
code like:
```
df = sc.parallelize(...).toDF(...)
```
and generate a small, hardcoded dataset?
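
One possible shape for such a small, hardcoded dataset is sketched below. The edge list `EDGES`, its weights, and the helper `check_edges` are hypothetical and are not taken from the pull request; the Spark call appears only as a comment because it needs a live `sc`.

```python
# Hypothetical hard-coded similarity graph as (src, dst, weight) triples.
EDGES = [
    (1, 0, 0.9), (2, 0, 0.9), (2, 1, 0.9),  # group A: vertices 0, 1, 2
    (4, 3, 0.9),                            # group B: vertices 3, 4
    (3, 2, 0.1),                            # weak link between the groups
]


def check_edges(edges):
    """Sanity-check the edge list and return the sorted vertex ids."""
    seen = set()
    for src, dst, weight in edges:
        assert src != dst, "self-loops are not expected in this sketch"
        assert weight >= 0.0, "similarities should be non-negative"
        key = (min(src, dst), max(src, dst))
        assert key not in seen, "each undirected pair should appear once"
        seen.add(key)
    return sorted({v for pair in seen for v in pair})


vertices = check_edges(EDGES)
print(vertices)

# With a live SparkContext `sc` (not created here), the list could be loaded
# along the lines suggested above:
#   df = sc.parallelize(EDGES).toDF(["src", "dst", "weight"])
```

A literal list like this keeps the doctest short and makes the expected clusters obvious to a reader without running any geometry code.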


---




[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21513#discussion_r194215008
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReadable,
+                               JavaMLWritable):
+    """
+    .. note:: Experimental
+
+    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
+    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
+    PIC finds a very low-dimensional embedding of a dataset using truncated power
+    iteration on a normalized pair-wise similarity matrix of the data.
+
+    This class is not yet an Estimator/Transformer, use `assignClusters` method to run the
+    PowerIterationClustering algorithm.
+
+    .. seealso:: `Wikipedia on Spectral clustering \
+    `_
+
+    >>> from pyspark.sql.types import DoubleType, LongType, StructField, StructType
+    >>> import math
+    >>> def genCircle(r, n):
+    ...     points = []
+    ...     for i in range(0, n):
+    ...         theta = 2.0 * math.pi * i / n
+    ...         points.append((r * math.cos(theta), r * math.sin(theta)))
+    ...     return points
+    >>> def sim(x, y):
+    ...     dist = (x[0] - y[0]) * (x[0] - y[0]) + (x[1] - y[1]) * (x[1] - y[1])
+    ...     return math.exp(-dist / 2.0)
+    >>> r1 = 1.0
+    >>> n1 = 10
+    >>> r2 = 4.0
+    >>> n2 = 40
+    >>> n = n1 + n2
+    >>> points = genCircle(r1, n1) + genCircle(r2, n2)
+    >>> data = [(i, j, sim(points[i], points[j])) for i in range(1, n) for j in range(0, i)]
+    >>> rdd = sc.parallelize(data, 2)
+    >>> schema = StructType([StructField("src", LongType(), False), \
+                 StructField("dst", LongType(),  True), \
+                 StructField("weight", DoubleType(), True)])
+    >>> df = spark.createDataFrame(rdd, schema)
+    >>> pic = PowerIterationClustering()
+    >>> assignments = pic.setK(2).setMaxIter(40).setWeightCol("weight").assignClusters(df)
+    >>> result = sorted(assignments.collect(), key=lambda x: x.id)
+    >>> result[0].cluster == result[1].cluster == result[2].cluster == result[3].cluster
+    True
+    >>> result[4].cluster == result[5].cluster == result[6].cluster == result[7].cluster
+    True
+    >>> pic_path = temp_path + "/pic"
+    >>> pic.save(pic_path)
+    >>> pic2 = PowerIterationClustering.load(pic_path)
+    >>> pic2.getK()
+    2
+    >>> pic2.getMaxIter()
+    40
+    >>> assignments2 = pic2.assignClusters(df)
+    >>> result2 = sorted(assignments2.collect(), key=lambda x: x.id)
+    >>> result2[0].cluster == result2[1].cluster == result2[2].cluster == result2[3].cluster
+    True
+    >>> result2[4].cluster == result2[5].cluster == result2[6].cluster == result2[7].cluster
+    True
--- End diff --

Let's use a simpler way to check the result, like:
```
>>> assignments.sort(assignments.id).show(truncate=False)
...
```


---




[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19364
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19364
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91592/
Test PASSed.


---




[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19364
  
**[Test build #91592 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91592/testReport)**
 for PR 19364 at commit 
[`f0de287`](https://github.com/apache/spark/commit/f0de287a503ed89f27678a25001e7f9514ef3888).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17179: [SPARK-19067][SS] Processing-time-based timeout in MapGr...

2018-06-08 Thread huoyongpeng
Github user huoyongpeng commented on the issue:

https://github.com/apache/spark/pull/17179
  
This patch has not been merged to master yet, right?


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20503
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91594/
Test PASSed.


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20503
  
**[Test build #91594 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91594/testReport)**
 for PR 20503 at commit 
[`890aa65`](https://github.com/apache/spark/commit/890aa6514196b3c672c4581120506632dd49b4a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20503
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21518: [SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91591/
Test PASSed.


---




[GitHub] spark issue #21518: [SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21518: [SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21518
  
**[Test build #91591 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91591/testReport)**
 for PR 21518 at commit 
[`3f59ca2`](https://github.com/apache/spark/commit/3f59ca2ebafae592be354fa75f28dfea63d2e568).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20503
  
**[Test build #91594 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91594/testReport)**
 for PR 20503 at commit 
[`890aa65`](https://github.com/apache/spark/commit/890aa6514196b3c672c4581120506632dd49b4a6).


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-06-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20503
  
ok to test


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-06-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20503
  
We still need to fix this, right?



---




[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21402
  
**[Test build #91593 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91593/testReport)**
 for PR 21402 at commit 
[`d862a2d`](https://github.com/apache/spark/commit/d862a2dec30f1fefe03d26e0b8a07d10eac7dbf5).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class ChunkFetchRequestHandler extends 
SimpleChannelInboundHandler `
  * `public class TransportChannelHandler extends 
SimpleChannelInboundHandler `


---




[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21402
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91593/
Test FAILed.


---




[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21402
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21402
  
**[Test build #91593 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91593/testReport)**
 for PR 21402 at commit 
[`d862a2d`](https://github.com/apache/spark/commit/d862a2dec30f1fefe03d26e0b8a07d10eac7dbf5).


---




[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...

2018-06-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21402
  
ok to test


---




[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-06-08 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/20611
  
As @kevinyu98 mentioned, the use case where '?' is used in the load 
command will fail: when we create a Path instance from a URI, the characters 
following the '?' are removed while the URI is resolved. 
We can address this by creating the Path instance directly and calling 
makeQualified() with defaultURI and workingDir. This approach would also 
eliminate the code we added for creating the URI.
Please let me know whether we can handle this use case in another PR, as the 
change would again require a good amount of testing effort. Currently this PR 
addresses all the basic issues related to wildcards for both local and HDFS 
file/folder paths.
Let me know if you have any suggestions.
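
The effect described above can be reproduced with any generic URI parser. The sketch below uses Python's `urllib.parse` purely as a stand-in for the JVM URI handling discussed in this PR; the host name and file name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical load path whose file name contains the single-character wildcard '?'.
raw = "hdfs://namenode:8020/data/part-?.csv"

parts = urlsplit(raw)

# A URI parser treats '?' as the start of the query string, so the path
# component stops just before the wildcard.
print(parts.path)   # /data/part-
print(parts.query)  # .csv

# Anything that later reads only the path component has lost the wildcard
# and everything after it.
wildcard_survives = "?" in parts.path
print(wildcard_survives)
```

This is why building the path without a round trip through URI parsing, as proposed above, keeps the '?' wildcard intact.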


---




[GitHub] spark issue #21386: [SPARK-23928][SQL][WIP] Add shuffle collection function.

2018-06-08 Thread pkuwm
Github user pkuwm commented on the issue:

https://github.com/apache/spark/pull/21386
  
I learned more of the code and am polishing my second commit. I had 
something else to do and also attended the spark summit this week.  Sorry for 
being late. Will submit a new commit over the weekend. 


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91588/
Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91588 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91588/testReport)**
 for PR 21366 at commit 
[`c1b8431`](https://github.com/apache/spark/commit/c1b8431524474ae710e2733836d86e1ee94df02f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91583/
Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91583 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91583/testReport)**
 for PR 21366 at commit 
[`03b1064`](https://github.com/apache/spark/commit/03b1064b7a2bb133a02e4616b9ed009ab36c803e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91582/
Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91582 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91582/testReport)**
 for PR 21366 at commit 
[`e42dd4f`](https://github.com/apache/spark/commit/e42dd4f77115de6bc25d00aaa4bae239a098802c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaPowerIterationClusteringExample `


---




[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21504
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91585/
Test FAILed.


---




[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21504
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21504
  
**[Test build #91585 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91585/testReport)**
 for PR 21504 at commit 
[`421e16b`](https://github.com/apache/spark/commit/421e16b20f63f8df7f279bf2dcea76a060a85ad3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21405
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3874/
Test PASSed.


---




[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21405
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21499: [SPARK-24468][SQL] Handle negative scale when adj...

2018-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21499


---




[GitHub] spark issue #21499: [SPARK-24468][SQL] Handle negative scale when adjusting ...

2018-06-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21499
  
thanks, merging to master/2.3!


---




[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21045
  
Some function doctests, such as the one for `corr`, use Python's built-in `zip`, 
so this function shadows it and causes the test failure.
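
A small sketch of the shadowing, in plain Python rather than the actual `functions.py`; the inner `zip` and the name `doctest_like_usage` are hypothetical stand-ins, not Spark code:

```python
import builtins


def doctest_like_usage():
    # Hypothetical stand-in for a column function that reuses the built-in's name.
    def zip(*cols):
        return "Column<zip(%s)>" % ", ".join(cols)

    # Within this scope the bare name now resolves to the function above,
    # so code that expects the built-in's tuples gets something else.
    shadowed = zip("a", "b")

    # The built-in stays reachable explicitly through the builtins module.
    pairs = list(builtins.zip([1, 2], [3, 4]))
    return shadowed, pairs


shadowed, pairs = doctest_like_usage()
```

The same thing would presumably happen at module level: any doctest in that module that calls a bare `zip` picks up the new definition.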


---




[GitHub] spark issue #21514: [SPARK-22860] [Core] - hide key password from linux ps l...

2018-06-08 Thread tooptoop4
Github user tooptoop4 commented on the issue:

https://github.com/apache/spark/pull/21514
  
@dbtsai 


---




[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

2018-06-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21398
  
I don't think working around a bug in Sentry is a good reason to add a hack 
in Spark. `HiveExternalCatalog` is already full of hacks...

If you have a proposal to clean that up and can also work around the Sentry 
bug, that will be great.


---




[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21045#discussion_r194210104
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -128,6 +128,175 @@ case class MapKeys(child: Expression)
   override def prettyName: String = "map_keys"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the
+    N-th value of each array given.
--- End diff --

This description looks a bit confusing to me. How about `Returns a merged 
array of structs in which the N-th struct contains all N-th values of input 
arrays.`?
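
To make the suggested wording concrete, here is a rough plain-Python model of it, not the Catalyst implementation. `merge_arrays_sketch` is a hypothetical name, tuples stand in for structs, and both the null handling and the padding of shorter arrays with `None` are assumptions:

```python
from itertools import zip_longest


def merge_arrays_sketch(*arrays):
    # Assumption: a null input array makes the whole result null.
    if any(arr is None for arr in arrays):
        return None
    # Assumption: shorter arrays are padded with None up to the longest input.
    # The N-th tuple (standing in for a struct) holds the N-th value of each array.
    return [tuple(values) for values in zip_longest(*arrays)]


two = merge_arrays_sketch([1, 2, 3], [2, 3, 4])
three = merge_arrays_sketch([1, 2], [2, 3], [3, 4])
ragged = merge_arrays_sketch([1, 2, 3], ["a"])
```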


---




[GitHub] spark issue #20844: [SPARK-23707][SQL] Don't need shuffle exchange with sing...

2018-06-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20844
  
I think it has been fixed?


---




[GitHub] spark issue #20100: [SPARK-22913][SQL] Improved Hive Partition Pruning

2018-06-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20100
  
Spark doesn't officially support Glue. I think Glue is plugged into Spark 
by presenting itself as a certain Hive version, and that Hive version should 
support timestamp and fraction.


---




[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19364
  
**[Test build #91592 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91592/testReport)**
 for PR 19364 at commit 
[`f0de287`](https://github.com/apache/spark/commit/f0de287a503ed89f27678a25001e7f9514ef3888).


---




[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21045#discussion_r194209974
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2394,6 +2394,23 @@ def array_repeat(col, count):
     return Column(sc._jvm.functions.array_repeat(_to_java_column(col), count))
 
 
+@since(2.4)
+def zip(*cols):
+    """
+    Collection function: Returns a merged array containing in the N-th position the
+    N-th value of each array given.
+
+    :param cols: columns in input
--- End diff --

nit: columns of arrays to be merged.


---




[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2018-06-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19364
  
ok to test


---




[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21045#discussion_r194209926
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -128,6 +128,175 @@ case class MapKeys(child: Expression)
   override def prettyName: String = "map_keys"
 }
 
+@ExpressionDescription(
+  usage = """
+    _FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the
+    N-th value of each array given.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4));
+        [[1, 2], [2, 3], [3, 4]]
+      > SELECT _FUNC_(array(1, 2), array(2, 3), array(3, 4));
+        [[1, 2, 3], [2, 3, 4]]
+  """,
+  since = "2.4.0")
+case class Zip(children: Seq[Expression]) extends Expression with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)(ArrayType)
+
+  override def dataType: DataType = ArrayType(mountSchema)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  private lazy val arrayTypes = children.map(_.dataType.asInstanceOf[ArrayType])
+
+  private lazy val arrayElementTypes = arrayTypes.map(_.elementType)
+
+  @transient private lazy val mountSchema: StructType = {
+    val fields = children.zip(arrayElementTypes).zipWithIndex.map {
+      case ((expr: NamedExpression, elementType), _) =>
+        StructField(expr.name, elementType, nullable = true)
+      case ((_, elementType), idx) =>
+        StructField(idx.toString, elementType, nullable = true)
+    }
+    StructType(fields)
+  }
+
+  @transient lazy val numberOfArrays: Int = children.length
+
+  @transient lazy val genericArrayData = classOf[GenericArrayData].getName
+
+  def emptyInputGenCode(ev: ExprCode): ExprCode = {
+    ev.copy(code"""
+      |${CodeGenerator.javaType(dataType)} ${ev.value} = new $genericArrayData(new Object[0]);
+      |boolean ${ev.isNull} = false;
+    """.stripMargin)
+  }
+
+  def nonEmptyInputGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val genericInternalRow = classOf[GenericInternalRow].getName
+    val arrVals = ctx.freshName("arrVals")
+    val arrCardinality = ctx.freshName("arrCardinality")
+    val biggestCardinality = ctx.freshName("biggestCardinality")
+
+    val currentRow = ctx.freshName("currentRow")
+    val j = ctx.freshName("j")
+    val i = ctx.freshName("i")
+    val args = ctx.freshName("args")
+
+    val evals = children.map(_.genCode(ctx))
+    val getValuesAndCardinalities = evals.zipWithIndex.map { case (eval, index) =>
+      s"""
+        |if ($biggestCardinality != -1) {
+        |  ${eval.code}
+        |  if (!${eval.isNull}) {
+        |    $arrVals[$index] = ${eval.value};
+        |    $arrCardinality[$index] = ${eval.value}.numElements();
+        |    $biggestCardinality = Math.max($biggestCardinality, $arrCardinality[$index]);
+        |  } else {
+        |    $biggestCardinality = -1;
+        |  }
+        |}
+      """.stripMargin
+    }
+
+    val splittedGetValuesAndCardinalities = ctx.splitExpressions(
+      expressions = getValuesAndCardinalities,
+      funcName = "getValuesAndCardinalities",
+      returnType = "int",
+      makeSplitFunction = body =>
+        s"""
+          |$body
+          |return $biggestCardinality;
+        """.stripMargin,
+      foldFunctions = _.map(funcCall => s"$biggestCardinality = $funcCall;").mkString("\n"),
+      arguments =
+        ("ArrayData[]", arrVals) ::
+        ("int[]", arrCardinality) ::
+        ("int", biggestCardinality) :: Nil)
+
+    val getValueForType = arrayElementTypes.zipWithIndex.map { case (eleType, idx) =>
+      val g = CodeGenerator.getValue(s"$arrVals[$idx]", eleType, i)
+      s"""
+        |if ($i < $arrCardinality[$idx] && !$arrVals[$idx].isNullAt($i)) {
--- End diff --

Looks like `arrCardinality` is only used here. We can write 
`$arrVals[$idx].numElements()` instead and remove `arrCardinality`. Then we don't 
need to pass `arrCardinality` to the two `splitExpressions` calls here.
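
The idea, sketched in plain Python rather than in the generated Java: `nth_struct` is a hypothetical helper, `len(arr)` stands in for `numElements()`, and a tuple stands in for the output struct.

```python
def nth_struct(arr_vals, i):
    # The length is read from each array on demand, so no parallel
    # list of cardinalities has to be built or passed around.
    return tuple(
        arr[i] if i < len(arr) and arr[i] is not None else None
        for arr in arr_vals
    )


row = nth_struct([[1, 2, 3], ["a"]], 1)
```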


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21427
  
If adding the config is trivial, let's add it. We can pick the new behavior 
by default.


---




[GitHub] spark issue #21517: Testing k8s change - please ignore (13)

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21517
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3728/



---




[GitHub] spark issue #21517: Testing k8s change - please ignore (13)

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21517
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3728/



---




[GitHub] spark issue #21517: Testing k8s change - please ignore (13)

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21517
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21517: Testing k8s change - please ignore (13)

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21517
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3873/
Test PASSed.


---




[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql

2018-06-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21045#discussion_r194208465
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2394,6 +2394,23 @@ def array_repeat(col, count):
     return Column(sc._jvm.functions.array_repeat(_to_java_column(col), count))
 
 
+@since(2.4)
+def zip(*cols):
+    """
+    Collection function: Merge two columns into one, such that the M-th element of the N-th
+    argument will be the N-th field of the M-th output element.
+
+    :param cols: columns in input
+
+    >>> from pyspark.sql.functions import zip as spark_zip
--- End diff --

Let's use `arrays_zip`.
As with `min` and `max`, where we use `array_min` and `array_max` for the 
array-type functions to set them apart from the aggregate functions, I think 
`arrays_zip` is consistent with the existing names.

@DylanGuedes Would you mind if I ask to change to `arrays_zip`? Thanks.


---




[GitHub] spark issue #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and 2.12.6

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21495
  
**[Test build #91580 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91580/testReport)**
 for PR 21495 at commit 
[`4c852fa`](https://github.com/apache/spark/commit/4c852fa6086bc17871e3fa742f77620e51f809f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-08 Thread aalobaidi
Github user aalobaidi commented on the issue:

https://github.com/apache/spark/pull/21500
  
I can confirm that snapshots are still being built normally with no issue. 

@HeartSaVioR I'm not sure why the executor must load at least 1 version of 
the state in memory. Could you elaborate? 


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-08 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21427
  
I would prefer if we could do this without a config because while the 
current behavior can work if the user knows what they are doing, it can also 
fail very easily and not obviously.  So to me that seems like a bug and we 
should just fix it so the feature can not be used in a potentially dangerous 
way.

If we need to make a config though, can it be such that it falls back to 
the current behavior (to use position) only if there is a `KeyError` and the 
switch is set to be backwards compatible?  Otherwise it would raise the 
`KeyError`.  If we did this, then (1) and (2) from 
https://github.com/apache/spark/pull/21427#issuecomment-392070950 could 
continue to work but the following would no longer work (this seems pretty 
silly though):
```
@pandas_udf("a string, b float", GROUPED_MAP)
def foo(pdf):
    return pd.DataFrame({'b': ['hi'], 'a': [1.0]})
```
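
A rough sketch of the fallback I have in mind, in plain Python with a dict standing in for the returned pandas DataFrame. `select_columns` and `backwards_compatible` are hypothetical names for illustration, not an existing Spark API or config:

```python
def select_columns(result, schema_names, backwards_compatible=False):
    """Pick result columns for the declared schema, preferring names.

    `result` maps column name -> values; insertion order plays the role
    of column position.
    """
    try:
        # Preferred path: match the returned columns to the schema by name.
        return [result[name] for name in schema_names]
    except KeyError:
        if not backwards_compatible:
            raise
        # Fallback: keep the current behavior and assign by position.
        return list(result.values())


by_name = select_columns({'b': ['hi'], 'a': [1.0]}, ['a', 'b'])
by_position = select_columns({'x': ['hi'], 'y': [1.0]}, ['a', 'b'],
                             backwards_compatible=True)
```

With the `foo` example above, the names match, so `a` would receive `[1.0]` and `b` would receive `['hi']`, which no longer lines up with the declared `a string, b float`.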


---




[GitHub] spark issue #21518: [SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite

2018-06-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21518
  
**[Test build #91591 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91591/testReport)**
 for PR 21518 at commit 
[`3f59ca2`](https://github.com/apache/spark/commit/3f59ca2ebafae592be354fa75f28dfea63d2e568).


---




[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14653
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #14129: [SPARK-16280][SQL] Implement histogram_numeric SQL funct...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14129
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #14291: [SPARK-16658][GRAPHX] Add EdgePartition.withVertexAttrib...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14291
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Support for deploying Anaconda an...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14180
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15899
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #14325: [SPARK-16692] [ML] Add multi label classification evalua...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14325
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #14940: [SPARK-17383][GRAPHX] Improvement LabelPropagaton, and r...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14940
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #15622: [SPARK-18092][ML] Fix column prediction type error

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15622
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #15496: [SPARK-17950] [Python] Match SparseVector behavior with ...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15496
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #13627: [SPARK-15906][MLlib] Add complementary naive bayes algor...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13627
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16556: [SPARK-19184][MLlib] Improve numerical stability for met...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16556
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16006
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16374: [SPARK-18925][STREAMING] Reduce memory usage of mapWithS...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16374
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #15591: [SPARK-17922] [SQL] ClassCastException ..GeneratedClass$...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15591
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #15670: [SPARK-18161] [Python] Allow pickle to serialize >4 GB o...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15670
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16415: [SPARK-19063][ML]Speedup and optimize the GradientBooste...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16415
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16411: [SPARK-17984][YARN][Mesos][Deploy][WIP] add executor lau...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16411
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16476
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16732: [SPARK-19368][MLlib] BlockMatrix.toIndexedRowMatrix() op...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16732
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17000
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1/
Test PASSed.


---




[GitHub] spark issue #17169: [SPARK-19714][ML] Bucketizer.handleInvalid docs improved

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17169
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Build finished. Test PASSed.


---




[GitHub] spark issue #17234: [SPARK-19892][MLlib] Implement findAnalogies method for ...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17234
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17035: [SPARK-19705][SQL] Preferred location supporting HDFS ca...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17035
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16812
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17185
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17332: [SPARK-10764][ML] Add optional caching to Pipelines

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17332
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17422: [SPARK-20087] Attach accumulators / metrics to 'TaskKill...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17422
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17365: [SPARK-19962] [MLlib] add DictVectorizer to ml.feature

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17365
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17190: [SPARK-19478][SS] JDBC Sink

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17190
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17648
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17619: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17619
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17174: [SPARK-19145][SQL] Timestamp to String casting is slowin...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17174
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17462: [SPARK-20050][DStream] Kafka 0.10 DirectStream doesn't c...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17462
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17123
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17631: [SPARK-20319][SQL] Already quoted identifiers are gettin...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17631
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17461: [SPARK-20082][ml] LDA incremental model learning

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17461
  
Can one of the admins verify this patch?


---



