[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89949/
Test PASSed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21028
  
**[Test build #89949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89949/testReport)** for PR 21028 at commit [`5925104`](https://github.com/apache/spark/commit/592510461622cd8eccd6f93af2e1fdbc0521fb98).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-04-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21165#discussion_r184841269
  
--- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
@@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason {
  * Task was killed intentionally and needs to be rescheduled.
  */
 @DeveloperApi
-case class TaskKilled(reason: String) extends TaskFailedReason {
+case class TaskKilled(
+reason: String,
+accumUpdates: Seq[AccumulableInfo] = Seq.empty,
+private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil)
--- End diff --

let's clean up ExceptionFailure at the same time, and use only 
`AccumulatorV2` in this PR.


---




[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...

2018-04-27 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/21178#discussion_r184841250
  
--- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java ---
@@ -362,4 +371,34 @@ public static void verifyProxyAccess(String realUser, 
String proxyUser, String i
 }
   }
 
+  public static boolean needUgiLogin(UserGroupInformation ugi, String principal, String keytab) {
+    return null == ugi || !ugi.hasKerberosCredentials() || !ugi.getUserName().equals(principal) ||
+      !keytab.equals(getKeytabFromUgi());
+  }
+
+  private static String getKeytabFromUgi() {
+    Class clz = UserGroupInformation.class;
+    try {
+      synchronized (clz) {
+        Field field = clz.getDeclaredField("keytabFile");
+        field.setAccessible(true);
+        return (String) field.get(null);
+      }
+    } catch (NoSuchFieldException e) {
+      try {
+        synchronized (clz) {
+          // In Hadoop 3 we don't have "keytabFile" field, instead we should use private method
+          // getKeytab().
+          Method method = clz.getDeclaredMethod("getKeytab");
+          method.setAccessible(true);
+          return (String) method.invoke(UserGroupInformation.getCurrentUser());
--- End diff --

It is called twice right now; but as a util method which can be used in other places, let us not intentionally introduce known inefficiencies.
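
The concern above — repeating the reflective lookup on every call — can be avoided by resolving the handle once and caching it. A minimal sketch of that idea (hypothetical `FakeUgi`/`CachedKeytabLookup` names standing in for the real Hadoop classes, not the actual patch):

```java
import java.lang.reflect.Field;

// Stand-in for UserGroupInformation; "keytabFile" mimics the private
// static field that the patch reads reflectively.
class FakeUgi {
    private static String keytabFile = "/etc/security/spark.keytab";
}

class CachedKeytabLookup {
    // Resolve the Field once and reuse it, instead of calling
    // getDeclaredField/setAccessible on every invocation.
    private static volatile Field keytabField;

    static String getKeytab() {
        try {
            Field f = keytabField;
            if (f == null) {
                f = FakeUgi.class.getDeclaredField("keytabFile");
                f.setAccessible(true);
                keytabField = f;  // benign race: both threads compute the same value
            }
            return (String) f.get(null);  // static field, so no instance needed
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getKeytab());  // prints /etc/security/spark.keytab
    }
}
```

Repeated calls then pay only the volatile read, not the field lookup and accessibility check.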


---




[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-04-27 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21165#discussion_r184840246
  
--- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
@@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason {
  * Task was killed intentionally and needs to be rescheduled.
  */
 @DeveloperApi
-case class TaskKilled(reason: String) extends TaskFailedReason {
+case class TaskKilled(
+reason: String,
+accumUpdates: Seq[AccumulableInfo] = Seq.empty,
+private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil)
--- End diff --

Yeah, I noticed `accumUpdates: Seq[AccumulableInfo]` is only used in JsonProtocol. Is there a reason for that?

The current implementation is constructed to stay in sync with existing `TaskEndReason`s such as `ExceptionFailure`:
```
@DeveloperApi
case class ExceptionFailure(
className: String,
description: String,
stackTrace: Array[StackTraceElement],
fullStackTrace: String,
private val exceptionWrapper: Option[ThrowableSerializationWrapper],
accumUpdates: Seq[AccumulableInfo] = Seq.empty,
private[spark] var accums: Seq[AccumulatorV2[_, _]] = Nil)
```

I'd prefer to keep them in sync, which leaves two options for cleanup:
1. Leave it as it is, then clean it up together with `ExceptionFailure`.
2. Clean up `ExceptionFailure` first.

@cloud-fan what do you think?


---




[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89948/
Test FAILed.


---




[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21021
  
**[Test build #89948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89948/testReport)** for PR 21021 at commit [`175d981`](https://github.com/apache/spark/commit/175d98195fc172655584b0dcf4087014e1377d12).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait ArraySortUtil extends ExpectsInputTypes `
  * `case class ArraySort(child: Expression) extends UnaryExpression with ArraySortUtil `


---




[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21119#discussion_r184839152
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasPredictionCol, 
JavaTransformer, JavaParams,
+   JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+Power Iteration Clustering (PIC), a scalable graph clustering 
algorithm developed by
+http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From 
the abstract:
+PIC finds a very low-dimensional embedding of a dataset using 
truncated power
+iteration on a normalized pair-wise similarity matrix of the data.
+
+PIC takes an affinity matrix between items (or vertices) as input.  An 
affinity matrix
+is a symmetric matrix whose entries are non-negative similarities 
between items.
+PIC takes this matrix (or graph) as an adjacency matrix.  
Specifically, each input row
+includes:
+
+ - :py:class:`idCol`: vertex ID
+ - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol`
+ - :py:class:`similaritiesCol`: non-negative weights (similarities) of 
edges between the
+vertex in :py:class:`idCol` and each neighbor in 
:py:class:`neighborsCol`
+
+PIC returns a cluster assignment for each input vertex.  It appends a 
new column
+:py:class:`predictionCol` containing the cluster assignment in 
:py:class:`[0,k)` for
+each row (vertex).
+
+Notes:
+
+ - [[PowerIterationClustering]] is a transformer with an expensive 
[[transform]] operation.
+Transform runs the iterative PIC algorithm to cluster the whole 
input dataset.
+ - Input validation: This validates that similarities are non-negative 
but does NOT validate
+that the input matrix is symmetric.
+
+@see http://en.wikipedia.org/wiki/Spectral_clustering>
--- End diff --

Use `.. seealso::`?


---




[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21119#discussion_r184839158
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasPredictionCol, 
JavaTransformer, JavaParams,
+   JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+Power Iteration Clustering (PIC), a scalable graph clustering 
algorithm developed by
+http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From 
the abstract:
+PIC finds a very low-dimensional embedding of a dataset using 
truncated power
+iteration on a normalized pair-wise similarity matrix of the data.
+
+PIC takes an affinity matrix between items (or vertices) as input.  An 
affinity matrix
+is a symmetric matrix whose entries are non-negative similarities 
between items.
+PIC takes this matrix (or graph) as an adjacency matrix.  
Specifically, each input row
+includes:
+
+ - :py:class:`idCol`: vertex ID
+ - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol`
+ - :py:class:`similaritiesCol`: non-negative weights (similarities) of 
edges between the
+vertex in :py:class:`idCol` and each neighbor in 
:py:class:`neighborsCol`
+
+PIC returns a cluster assignment for each input vertex.  It appends a 
new column
+:py:class:`predictionCol` containing the cluster assignment in 
:py:class:`[0,k)` for
+each row (vertex).
+
+Notes:
--- End diff --

Use `.. note::`?


---




[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21119#discussion_r184839128
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasPredictionCol, 
JavaTransformer, JavaParams,
+   JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+Power Iteration Clustering (PIC), a scalable graph clustering 
algorithm developed by
+http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From 
the abstract:
+PIC finds a very low-dimensional embedding of a dataset using 
truncated power
+iteration on a normalized pair-wise similarity matrix of the data.
+
+PIC takes an affinity matrix between items (or vertices) as input.  An 
affinity matrix
+is a symmetric matrix whose entries are non-negative similarities 
between items.
+PIC takes this matrix (or graph) as an adjacency matrix.  
Specifically, each input row
+includes:
+
+ - :py:class:`idCol`: vertex ID
--- End diff --

```:py:attr:`idCol` ```? And also the below ```:py:class:`neighborsCol` 
```, etc...


---




[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21119#discussion_r184838981
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasPredictionCol, 
JavaTransformer, JavaParams,
+   JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+Power Iteration Clustering (PIC), a scalable graph clustering 
algorithm developed by
+http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From 
the abstract:
+PIC finds a very low-dimensional embedding of a dataset using 
truncated power
+iteration on a normalized pair-wise similarity matrix of the data.
+
+PIC takes an affinity matrix between items (or vertices) as input.  An 
affinity matrix
+is a symmetric matrix whose entries are non-negative similarities 
between items.
+PIC takes this matrix (or graph) as an adjacency matrix.  
Specifically, each input row
+includes:
+
+ - :py:class:`idCol`: vertex ID
+ - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol`
+ - :py:class:`similaritiesCol`: non-negative weights (similarities) of 
edges between the
+vertex in :py:class:`idCol` and each neighbor in 
:py:class:`neighborsCol`
+
+PIC returns a cluster assignment for each input vertex.  It appends a 
new column
+:py:class:`predictionCol` containing the cluster assignment in 
:py:class:`[0,k)` for
+each row (vertex).
+
+Notes:
+
+ - [[PowerIterationClustering]] is a transformer with an expensive 
[[transform]] operation.
+Transform runs the iterative PIC algorithm to cluster the whole 
input dataset.
+ - Input validation: This validates that similarities are non-negative 
but does NOT validate
+that the input matrix is symmetric.
+
+@see http://en.wikipedia.org/wiki/Spectral_clustering>
+Spectral clustering (Wikipedia)
+
+>>> from pyspark.sql.types import ArrayType, DoubleType, LongType, 
StructField, StructType
+>>> similarities = [((long)(1), [0], [0.5]), ((long)(2), [0, 1], 
[0.7,0.5]), \
+((long)(3), [0, 1, 2], [0.9, 0.7, 0.5]), \
+((long)(4), [0, 1, 2, 3], [1.1, 0.9, 0.7,0.5]), \
+((long)(5), [0, 1, 2, 3, 4], [1.3, 1.1, 0.9, 
0.7,0.5])]
+>>> rdd = sc.parallelize(similarities, 2)
+>>> schema = StructType([StructField("id", LongType(), False), \
+ StructField("neighbors", ArrayType(LongType(), False), 
True), \
+ StructField("similarities", ArrayType(DoubleType(), 
False), True)])
+>>> df = spark.createDataFrame(rdd, schema)
+>>> pic = PowerIterationClustering()
+>>> result = pic.setK(2).setMaxIter(10).transform(df)
+>>> predictions = sorted(set([(i[0], i[1]) for i in 
result.select(result.id, result.prediction)
+... .collect()]), key=lambda x: x[0])
+>>> predictions[0]
+(1, 1)
+>>> predictions[1]
+(2, 1)
+>>> predictions[2]
+(3, 0)
+>>> predictions[3]
+(4, 0)
+>>> predictions[4]
+(5, 0)
+>>> pic_path = temp_path + "/pic"
+>>> pic.save(pic_path)
+>>> pic2 = PowerIterationClustering.load(pic_path)
+>>> pic2.getK()
+2
+>>> pic2.getMaxIter()
+10
+>>> pic3 = PowerIterationClustering(k=4, initMode="degree")
+>>> pic3.getIdCol()
+'id'
+>>> pic3.getK()
+4
+>>> pic3.getMaxIter()
+20
+>>> pic3.getInitMode()
+'degree'
+
+.. versionadded:: 2.4.0
+"""
+
+k = Param(Params._dummy(), "k",
+  "The number of clusters to create. Must be > 1.",
+  typeConverter=TypeConverters.toInt)
+initMode = Param(Params._dummy(), "initMode",
+ "The initialization algorithm. This can be either " +
+ "'random' to use a random vector as vertex 
properties, or 'degree' to use " +
+ "a normalized sum of similarities with other 
vertices.  Supported options: " +
+ "'random' and 'degree'.",
+ typeConverter=TypeConverters.toString)
+idCol = Param(Params._dummy(), "idCol",
+  "Name of the input column for vertex IDs.",
+  typeConverter=TypeConverters.toString)
+neighborsCol = Param(Params._dummy(), "neighborsCol",
+ "Name of the input column for neighbors in the 
adjacency list " +
+  

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21119#discussion_r184838934
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self):
 return self.getOrDefault(self.keepLastCheckpoint)
 
 
+@inherit_doc
+class PowerIterationClustering(HasMaxIter, HasPredictionCol, 
JavaTransformer, JavaParams,
+   JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+Power Iteration Clustering (PIC), a scalable graph clustering 
algorithm developed by
+http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From 
the abstract:
+PIC finds a very low-dimensional embedding of a dataset using 
truncated power
+iteration on a normalized pair-wise similarity matrix of the data.
+
+PIC takes an affinity matrix between items (or vertices) as input.  An 
affinity matrix
+is a symmetric matrix whose entries are non-negative similarities 
between items.
+PIC takes this matrix (or graph) as an adjacency matrix.  
Specifically, each input row
+includes:
+
+ - :py:class:`idCol`: vertex ID
+ - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol`
+ - :py:class:`similaritiesCol`: non-negative weights (similarities) of 
edges between the
+vertex in :py:class:`idCol` and each neighbor in 
:py:class:`neighborsCol`
+
+PIC returns a cluster assignment for each input vertex.  It appends a 
new column
+:py:class:`predictionCol` containing the cluster assignment in 
:py:class:`[0,k)` for
+each row (vertex).
+
+Notes:
+
+ - [[PowerIterationClustering]] is a transformer with an expensive 
[[transform]] operation.
+Transform runs the iterative PIC algorithm to cluster the whole 
input dataset.
+ - Input validation: This validates that similarities are non-negative 
but does NOT validate
+that the input matrix is symmetric.
+
+@see http://en.wikipedia.org/wiki/Spectral_clustering>
+Spectral clustering (Wikipedia)
+
+>>> from pyspark.sql.types import ArrayType, DoubleType, LongType, 
StructField, StructType
+>>> similarities = [((long)(1), [0], [0.5]), ((long)(2), [0, 1], 
[0.7,0.5]), \
+((long)(3), [0, 1, 2], [0.9, 0.7, 0.5]), \
+((long)(4), [0, 1, 2, 3], [1.1, 0.9, 0.7,0.5]), \
+((long)(5), [0, 1, 2, 3, 4], [1.3, 1.1, 0.9, 
0.7,0.5])]
+>>> rdd = sc.parallelize(similarities, 2)
+>>> schema = StructType([StructField("id", LongType(), False), \
+ StructField("neighbors", ArrayType(LongType(), False), 
True), \
+ StructField("similarities", ArrayType(DoubleType(), 
False), True)])
+>>> df = spark.createDataFrame(rdd, schema)
+>>> pic = PowerIterationClustering()
+>>> result = pic.setK(2).setMaxIter(10).transform(df)
+>>> predictions = sorted(set([(i[0], i[1]) for i in 
result.select(result.id, result.prediction)
+... .collect()]), key=lambda x: x[0])
+>>> predictions[0]
+(1, 1)
+>>> predictions[1]
+(2, 1)
+>>> predictions[2]
+(3, 0)
+>>> predictions[3]
+(4, 0)
+>>> predictions[4]
+(5, 0)
+>>> pic_path = temp_path + "/pic"
+>>> pic.save(pic_path)
+>>> pic2 = PowerIterationClustering.load(pic_path)
+>>> pic2.getK()
+2
+>>> pic2.getMaxIter()
+10
+>>> pic3 = PowerIterationClustering(k=4, initMode="degree")
+>>> pic3.getIdCol()
+'id'
+>>> pic3.getK()
+4
+>>> pic3.getMaxIter()
+20
+>>> pic3.getInitMode()
+'degree'
+
+.. versionadded:: 2.4.0
+"""
+
+k = Param(Params._dummy(), "k",
+  "The number of clusters to create. Must be > 1.",
+  typeConverter=TypeConverters.toInt)
+initMode = Param(Params._dummy(), "initMode",
+ "The initialization algorithm. This can be either " +
+ "'random' to use a random vector as vertex 
properties, or 'degree' to use " +
+ "a normalized sum of similarities with other 
vertices.  Supported options: " +
+ "'random' and 'degree'.",
+ typeConverter=TypeConverters.toString)
+idCol = Param(Params._dummy(), "idCol",
+  "Name of the input column for vertex IDs.",
+  typeConverter=TypeConverters.toString)
+neighborsCol = Param(Params._dummy(), "neighborsCol",
+ "Name of the input column for neighbors in the 
adjacency list " +
+  

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21119#discussion_r184838848
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala ---
@@ -97,13 +97,15 @@ private[clustering] trait 
PowerIterationClusteringParams extends Params with Has
   def getNeighborsCol: String = $(neighborsCol)
 
   /**
-   * Param for the name of the input column for neighbors in the adjacency 
list representation.
+   * Param for the name of the input column for non-negative weights 
(similarities) of edges
+   * between the vertex in `idCol` and each neighbor in `neighborsCol`.
--- End diff --

Good catch!


---




[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...

2018-04-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21185
  
I think `SparkSession` is driver-only; how do we access it in an executor?


---




[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-04-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21165#discussion_r184838512
  
--- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
@@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason {
  * Task was killed intentionally and needs to be rescheduled.
  */
 @DeveloperApi
-case class TaskKilled(reason: String) extends TaskFailedReason {
+case class TaskKilled(
+reason: String,
+accumUpdates: Seq[AccumulableInfo] = Seq.empty,
+private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil)
--- End diff --

Previously we used `AccumulableInfo` to expose accumulator information to end users. Now that `AccumulatorV2` is already a public class, we don't need to do that anymore. I think we can just do
```
case class TaskKilled(reason: String, accums: Seq[AccumulatorV2[_, _]])
```


---




[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184838245
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala ---
@@ -279,4 +282,11 @@ class ApproximatePercentileQuerySuite extends 
QueryTest with SharedSQLContext {
   checkAnswer(query, expected)
 }
   }
+
+  test("SPARK-24013: unneeded compress can cause performance issues with 
sorted input") {
+failAfter(30 seconds) {
--- End diff --

This test looks pretty weird. Can we add some kind of unit test, move this test to the PR description, and note that the perf improved a lot after this patch?


---




[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...

2018-04-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21122
  
To add some more color: the current `ExternalCatalog` is an abstract class and can already be implemented outside of Spark. However, the problem is the listeners. Every time we want to listen to one more event, we need to break the API (`createDatabase` and `doCreateDatabase`). This is very bad for a stable interface.

The main goal of this PR is to pull the listener logic out of `ExternalCatalog` and make `ExternalCatalog` a pure interface.
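
The shape of that refactoring — a pure interface plus a separate event-emitting wrapper, so adding event types never breaks third-party implementations — can be sketched roughly as follows (hypothetical `Catalog`/`CatalogWithListeners` names, not Spark's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Pure interface: external implementations only see this, so adding
// new event types never forces them to change.
interface Catalog {
    void createDatabase(String name);
}

// All listener plumbing lives in a wrapper around any Catalog, instead
// of in the interface itself (no createDatabase/doCreateDatabase split).
class CatalogWithListeners implements Catalog {
    private final Catalog delegate;
    private final List<Consumer<String>> listeners = new ArrayList<>();

    CatalogWithListeners(Catalog delegate) { this.delegate = delegate; }

    void addListener(Consumer<String> listener) { listeners.add(listener); }

    @Override
    public void createDatabase(String name) {
        delegate.createDatabase(name);  // plain interface call
        for (Consumer<String> l : listeners) {
            l.accept("databaseCreated:" + name);  // event fired by the wrapper
        }
    }
}

class ListenerDemo {
    public static void main(String[] args) {
        List<String> dbs = new ArrayList<>();
        List<String> events = new ArrayList<>();
        CatalogWithListeners catalog = new CatalogWithListeners(dbs::add);
        catalog.addListener(events::add);
        catalog.createDatabase("sales");
        System.out.println(dbs + " " + events);
    }
}
```

New events only touch the wrapper; every `Catalog` implementation stays source- and binary-compatible.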


---




[GitHub] spark pull request #21109: [SPARK-24020][SQL] Sort-merge join inner range op...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21109#discussion_r184837526
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -222,6 +222,61 @@ class JoinBenchmark extends BenchmarkBase {
  */
   }
 
+  val expensiveFunc = (first: Int, second: Int) => {
+for (i <- 1 to 2000) {
+  Math.sqrt(i * i * i)
+}
+Math.abs(first - second)
+  }
+
+  def innerRangeTest(N: Int, M: Int): Unit = {
+import sparkSession.implicits._
+val expUdf = sparkSession.udf.register("expensiveFunc", expensiveFunc)
+val df1 = sparkSession.sparkContext.parallelize(1 to M).
+  cartesian(sparkSession.sparkContext.parallelize(1 to N)).
+  toDF("col1a", "col1b")
+val df2 = sparkSession.sparkContext.parallelize(1 to M).
+  cartesian(sparkSession.sparkContext.parallelize(1 to N)).
+  toDF("col2a", "col2b")
+val df = df1.join(df2, 'col1a === 'col2a and ('col1b < 'col2b + 3) and 
('col1b > 'col2b - 3))
+
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
+df.where(expUdf('col1b, 'col2b) < 3).count()
+  }
+
+  ignore("sort merge inner range join") {
+sparkSession.conf.set("spark.sql.join.smj.useInnerRangeOptimization", 
"false")
+val N = 2 << 5
+val M = 100
+runBenchmark("sort merge inner range join", N * M) {
+  innerRangeTest(N, M)
+}
+
+/*
+ *AMD EPYC 7401 24-Core Processor
+ *sort merge join:  Best/Avg Time(ms)Rate(M/s) 
  Per Row(ns)   Relative
+ 
*-
+ *sort merge join wholestage off13822 / 14068  0.0 
2159662.3   1.0X
+ *sort merge join wholestage on   3863 / 4226  0.0 
 603547.0   3.6X
+ */
+  }
+
+  ignore("sort merge inner range join optimized") {
+sparkSession.conf.set("spark.sql.join.smj.useInnerRangeOptimization", 
"true")
+val N = 2 << 5
+val M = 100
+runBenchmark("sort merge inner range join optimized", N * M) {
+  innerRangeTest(N, M)
+}
+
+/*
+ *AMD EPYC 7401 24-Core Processor
+ *sort merge join:  Best/Avg Time(ms)Rate(M/s) 
  Per Row(ns)   Relative
+ 
*-
+ *sort merge join wholestage off12723 / 12800  0.0 
1988008.4   1.0X
--- End diff --

Why doesn't the wholestage-off case get as much improvement as the wholestage-on case?


---




[GitHub] spark issue #21109: [SPARK-24020][SQL] Sort-merge join inner range optimizat...

2018-04-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21109
  
Thanks for working on this.

Based on the description in the JIRA, I think the main cause of the bad performance is re-calculating an expensive function on matched rows. In the added benchmark, I adjusted the order of the conditions so the expensive UDF is put at the end of the predicate. Below are the results: the first is the original benchmark; the second is the one with the UDF at the end of the predicate.


```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Linux 4.9.87-linuxkit-aufs
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
sort merge inner range join:                 Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
------------------------------------------------------------------------------------------------------
sort merge inner range join wholestage off         6913 / 6964          0.0      1080112.4        1.0X
sort merge inner range join wholestage on          2094 / 2224          0.0       327217.4        3.3X
```

```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Linux 4.9.87-linuxkit-aufs
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
sort merge inner range join:                 Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
------------------------------------------------------------------------------------------------------
sort merge inner range join wholestage off           675 /  704          0.0       105493.9        1.0X
sort merge inner range join wholestage on            374 /  398          0.0        58359.6        1.8X
```

It can easily be improved thanks to short-circuit evaluation of the predicate. This can also be applied to conditions other than range comparisons, so I'm wondering whether we need a way to hint Spark to reorder expressions when one of them, like a UDF, is expensive.
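
The effect being measured here is plain short-circuit evaluation: with the cheap range check first, the expensive function only runs on rows that survive the range filter. A minimal standalone sketch of the idea (not Spark code; `expensive` mimics the benchmark's costly function over a 20×20 cartesian input):

```java
class ShortCircuitDemo {
    static int expensiveCalls = 0;

    // Stand-in for the costly UDF in the benchmark.
    static int expensive(int x) {
        expensiveCalls++;
        double s = 0.0;
        for (int i = 1; i <= 2000; i++) s += Math.sqrt((double) i * i * i);
        return x;
    }

    static int countMatches(boolean cheapFirst) {
        expensiveCalls = 0;
        int matches = 0;
        for (int a = 1; a <= 20; a++) {
            for (int b = 1; b <= 20; b++) {
                int diff = Math.abs(a - b);
                // `&&` short-circuits: with the cheap check first, the
                // expensive call only runs on rows inside the range.
                boolean keep = cheapFirst
                    ? diff < 3 && expensive(diff) < 3
                    : expensive(diff) < 3 && diff < 3;
                if (keep) matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int m1 = countMatches(false);
        int fullCalls = expensiveCalls;   // 400: one expensive call per row
        int m2 = countMatches(true);
        int fewCalls = expensiveCalls;    // 94: only rows passing the range check
        System.out.println((m1 == m2) + " " + fullCalls + " vs " + fewCalls);
    }
}
```

Both orderings return the same matches, but the cheap-first predicate evaluates the expensive function on 94 of 400 rows instead of all of them — the same win the reordered benchmark shows.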




---




[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-27 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/21165
  
@jiangxb1987 @cloud-fan I think it's ready for review. 


---




[GitHub] spark pull request #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should...

2018-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21181


---




[GitHub] spark issue #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should contai...

2018-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21181
  
Merged to master.


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r184837304
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -19,14 +19,41 @@ package org.apache.spark.sql.catalyst.expressions
 import java.util.Comparator
 
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, 
TypeCoercion}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, 
MapData, TypeUtils}
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.Platform
 import org.apache.spark.unsafe.array.ByteArrayMethods
 import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
 
+/**
+ * Base trait for [[BinaryExpression]]s with two arrays of the same 
element type and implicit
+ * casting.
+ */
+trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
--- End diff --

As @ueshin pointed out [here](https://github.com/apache/spark/pull/21028#discussion_r184266872), `concat` is also a use case that has a different number of children. Am I wrong?


---




[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21187
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21187
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2735/
Test PASSed.


---




[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21187
  
**[Test build #89950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89950/testReport)**
 for PR 21187 at commit 
[`c486c6b`](https://github.com/apache/spark/commit/c486c6b15de49a519c728d037a8979791ea37e74).


---




[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

2018-04-27 Thread maryannxue
GitHub user maryannxue opened a pull request:

https://github.com/apache/spark/pull/21187

[SPARK-24035][SQL] SQL syntax for Pivot

## What changes were proposed in this pull request?

Add SQL support for Pivot according to Pivot grammar defined by Oracle 
(https://docs.oracle.com/database/121/SQLRF/img_text/pivot_clause.htm) with 
some simplifications, based on our existing functionality and limitations for 
Pivot at the backend:
1. For pivot_for_clause 
(https://docs.oracle.com/database/121/SQLRF/img_text/pivot_for_clause.htm), the 
column list form is not supported, which means the pivot column can only be a 
single column.
2. For pivot_in_clause 
(https://docs.oracle.com/database/121/SQLRF/img_text/pivot_in_clause.htm), the 
sub-query form and "ANY" are not supported (Oracle supports these only for 
XML anyway).
3. For pivot_in_clause, aliases for the constant values are not supported.

The code changes are:
1. Add parser support for Pivot. Note that according to 
https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#i2076542, Pivot 
cannot be used together with lateral views in the from clause. This restriction 
has been implemented in the Parser rule.
2. Infer group-by expressions: group-by expressions are not explicitly 
specified in the SQL Pivot clause and need to be deduced based on this rule: 
https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#CHDFAFIE, so we 
have to fix them up at the query analysis stage.
3. Override Pivot.resolved as "false": for the reason mentioned in [2], and 
because the output attributes change after Pivot is replaced by Project or 
Aggregate, we avoid resolving references until after Pivot has been resolved 
and replaced.
4. Verify aggregate expressions: only aggregate expressions, with or without 
aliases, can appear in the first part of the Pivot clause, and this check is 
performed at the analysis stage.

## How was this patch tested?

A new test suite PivotSuite is added.
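As a rough illustration of the group-by inference rule in [2], here is a plain-Scala sketch over tuples with made-up data (this is not the parser or analyzer change itself):

```scala
// Rows of (year, course, earnings). Pivoting on course with sum(earnings):
// the group-by column (year) is never written explicitly, it is inferred as
// the remaining non-pivoted column, matching the Oracle rule cited above.
val rows = Seq(
  (2012, "dotNET", 10000),
  (2012, "Java",   20000),
  (2013, "dotNET",  5000),
  (2013, "Java",   30000))

def pivotSum(data: Seq[(Int, String, Int)],
             pivotValues: Seq[String]): Map[Int, Seq[Int]] =
  data.groupBy(_._1).map { case (year, rs) =>
    // One output column per constant listed in the pivot_in_clause.
    year -> pivotValues.map(v => rs.filter(_._2 == v).map(_._3).sum)
  }
```

With the rows above, `pivotSum(rows, Seq("dotNET", "Java"))` produces one entry per inferred group-by value (year), each carrying a summed column per pivoted course.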


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maryannxue/spark spark-24035

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21187


commit c486c6b15de49a519c728d037a8979791ea37e74
Author: maryannxue 
Date:   2018-04-28T01:17:52Z

[SPARK-24035] SQL syntax for Pivot




---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89945/
Test PASSed.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21088
  
**[Test build #89945 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89945/testReport)**
 for PR 21088 at commit 
[`a7f35f4`](https://github.com/apache/spark/commit/a7f35f4c782d76b78d26688ec9a593d2bbbf3c39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2734/
Test PASSed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21028
  
**[Test build #89949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89949/testReport)**
 for PR 21028 at commit 
[`5925104`](https://github.com/apache/spark/commit/592510461622cd8eccd6f93af2e1fdbc0521fb98).


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21028
  
retest this please


---




[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source

2018-04-27 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21152
  
@jerryshao Thanks for merging! My Apache JIRA ID is “kabhwan”


---




[GitHub] spark pull request #21185: [SPARK-23894][CORE][SQL] Defensively clear Active...

2018-04-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21185#discussion_r184813348
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -299,6 +316,9 @@ private[spark] class Executor(
   Thread.currentThread.setContextClassLoader(replClassLoader)
   val ser = env.closureSerializer.newInstance()
   logInfo(s"Running $taskName (TID $taskId)")
+  // When running in local mode, we might end up with the active 
session from the driver set on
+  // this thread, though we never should, so we defensively clear it.  
See SPARK-23894.
+  clearActiveSparkSessionMethod.foreach(_.invoke(null))
--- End diff --

Can this be done in the thread pool's thread factory instead?
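A minimal sketch of that suggestion, using a plain `InheritableThreadLocal` as a stand-in for the active SparkSession (illustrative names only, not the Executor code): the cleanup is installed once in the pool's `ThreadFactory`, so every worker thread starts with the inherited value cleared instead of clearing at the start of every task.

```scala
import java.util.concurrent.{Callable, Executors, ThreadFactory}

// Stand-in for the active-session thread-local that local mode can leak
// from the driver thread into executor threads.
val activeSession = new InheritableThreadLocal[String]()

// Clear the inherited value once, when the pool creates the worker thread,
// before the worker loop starts running tasks.
val clearingFactory = new ThreadFactory {
  def newThread(r: Runnable): Thread = new Thread(new Runnable {
    def run(): Unit = { activeSession.remove(); r.run() }
  })
}
```

A task submitted after the "driver" thread has set the value then observes it cleared, which is the defensive behavior the diff implements per-task.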


---




[GitHub] spark pull request #21185: [SPARK-23894][CORE][SQL] Defensively clear Active...

2018-04-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21185#discussion_r184813243
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -229,6 +229,23 @@ private[spark] class Executor(
 
ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
   }
 
+  /**
+   * Only in local mode, we have to prevent the driver from setting the 
active SparkSession
+   * in the executor threads.  See SPARK-23894.
+   */
+  lazy val clearActiveSparkSessionMethod = if (Utils.isLocalMaster(conf)) {
--- End diff --

private?


---




[GitHub] spark pull request #21152: [SPARK-23688][SS] Refactor tests away from rate s...

2018-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21152


---




[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source

2018-04-27 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21152
  
@HeartSaVioR what is your JIRA id?


---




[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source

2018-04-27 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21152
  
LGTM. Merging to master.


---




[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21073#discussion_r184835757
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -16,12 +16,14 @@
  */
 package org.apache.spark.sql.catalyst.expressions
 
+import java.util
 import java.util.Comparator
 
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, 
MapData, TypeUtils}
+import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
--- End diff --

How about merging these two lines into one line 
`org.apache.spark.sql.catalyst.util._`?


---




[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21178
  
**[Test build #89947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89947/testReport)**
 for PR 21178 at commit 
[`77142c6`](https://github.com/apache/spark/commit/77142c6caf2bcc46defc19994613af76d872673b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89947/
Test PASSed.


---




[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2733/
Test PASSed.


---




[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21166: [SPARK-11334][CORE] clear idle executors in executorIdTo...

2018-04-27 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21166
  
1. We improved the DAGScheduler to always send the TaskEnd message, so the issue 
I found before may no longer be valid.
2. We refactored the LiveListenerQueue to make it more robust for internal 
listeners. We cannot guarantee that events will never be lost, but the chance is 
quite small (SPARK-18838).

IMHO you (as the PR submitter) should validate this issue against the latest 
code and make sure it is still reproducible.


---




[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21021
  
**[Test build #89948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89948/testReport)**
 for PR 21021 at commit 
[`175d981`](https://github.com/apache/spark/commit/175d98195fc172655584b0dcf4087014e1377d12).


---




[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21021
  
retest this please


---




[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2732/
Test PASSed.


---




[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21178
  
**[Test build #89947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89947/testReport)**
 for PR 21178 at commit 
[`77142c6`](https://github.com/apache/spark/commit/77142c6caf2bcc46defc19994613af76d872673b).


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2731/
Test PASSed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89946/
Test FAILed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21119
  
**[Test build #89946 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89946/testReport)**
 for PR 21119 at commit 
[`a6b1822`](https://github.com/apache/spark/commit/a6b18222b65e878e22ddf8f2d340aa3127c99e0c).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21119
  
**[Test build #89946 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89946/testReport)**
 for PR 21119 at commit 
[`a6b1822`](https://github.com/apache/spark/commit/a6b18222b65e878e22ddf8f2d340aa3127c99e0c).


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21073
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89944/
Test PASSed.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21073
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21073
  
**[Test build #89944 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89944/testReport)**
 for PR 21073 at commit 
[`2e49b1e`](https://github.com/apache/spark/commit/2e49b1e01ba10d7baba9196d64af8db1cd7b2dd1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...

2018-04-27 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21178#discussion_r184833705
  
--- Diff: 
sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
 ---
@@ -362,4 +371,34 @@ public static void verifyProxyAccess(String realUser, 
String proxyUser, String i
 }
   }
 
+  public static boolean needUgiLogin(UserGroupInformation ugi, String 
principal, String keytab) {
+return null == ugi || !ugi.hasKerberosCredentials() || 
!ugi.getUserName().equals(principal) ||
+  !keytab.equals(getKeytabFromUgi());
+  }
+
+  private static String getKeytabFromUgi() {
+Class clz = UserGroupInformation.class;
+try {
+  synchronized (clz) {
+Field field = clz.getDeclaredField("keytabFile");
+field.setAccessible(true);
+return (String) field.get(null);
+  }
+} catch (NoSuchFieldException e) {
+  try {
+synchronized (clz) {
+  // In Hadoop 3 we don't have "keytabFile" field, instead we 
should use private method
+  // getKeytab().
+  Method method = clz.getDeclaredMethod("getKeytab");
+  method.setAccessible(true);
+  return (String) 
method.invoke(UserGroupInformation.getCurrentUser());
--- End diff --

This will only be called twice in the initialization stage, so there should 
not be large overhead.
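The field-then-method fallback in the quoted diff can be sketched in isolation like this, with a toy class standing in for Hadoop's `UserGroupInformation` (illustrative only): the field lookup fails on a class that, like Hadoop 3, no longer has the field, so the code falls back to the private accessor.

```scala
import java.lang.reflect.{Field, Method}

// Stand-in for Hadoop 3's UserGroupInformation: no "keytabFile" field,
// only a private accessor, so the field lookup must fall back to the method.
class FakeUgi { private def getKeytab: String = "/etc/security/hive.keytab" }

def keytabOf(ugi: FakeUgi): String = {
  val clz = classOf[FakeUgi]
  try {
    // Hadoop 2 shape: a private field holding the keytab path.
    val field: Field = clz.getDeclaredField("keytabFile")
    field.setAccessible(true)
    field.get(ugi).asInstanceOf[String]
  } catch {
    case _: NoSuchFieldException =>
      // Hadoop 3 shape: only a private getKeytab() method exists.
      val method: Method = clz.getDeclaredMethod("getKeytab")
      method.setAccessible(true)
      method.invoke(ugi).asInstanceOf[String]
  }
}
```

Since this runs only during initialization, the reflection cost is negligible, as noted above.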


---




[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...

2018-04-27 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21178#discussion_r184833443
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala
 ---
@@ -52,8 +52,22 @@ private[hive] class SparkSQLCLIService(hiveServer: 
HiveServer2, sqlContext: SQLC
 
 if (UserGroupInformation.isSecurityEnabled) {
   try {
-HiveAuthFactory.loginFromKeytab(hiveConf)
-sparkServiceUGI = Utils.getUGI()
+val principal = 
hiveConf.getVar(ConfVars.HIVE_SERVER2_KERBEROS_PRINCIPAL)
+val keyTabFile = 
hiveConf.getVar(ConfVars.HIVE_SERVER2_KERBEROS_KEYTAB)
+if (principal.isEmpty || keyTabFile.isEmpty) {
+  throw new IOException(
+"HiveServer2 Kerberos principal or keytab is not correctly 
configured")
+}
+
+val originalUgi = UserGroupInformation.getCurrentUser
--- End diff --

I don't think there's any particular reason; we just copied what HS2 did 
before.


---




[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...

2018-04-27 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21178#discussion_r184833381
  
--- Diff: 
sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
 ---
@@ -18,14 +18,11 @@
 package org.apache.hive.service.auth;
 
 import java.io.IOException;
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
 import java.net.InetSocketAddress;
 import java.net.UnknownHostException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Locale;
-import java.util.Map;
+import java.util.*;
--- End diff --

This was done automatically by IntelliJ IDEA; I will revert it.


---




[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to ReadTask...

2018-04-27 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/21145
  
I think `ReadTask` is fine. That name does not imply that you can use the 
object itself to read, but it does correctly show that it is one task in a 
larger operation. I think the name implies that it represents something to be 
read, which is correct, and it is reasonable to look at the API for that object 
to see how to read it. That can be clearly accomplished, so I don't think we 
need a different name.


---




[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21185
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21185
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89939/
Test PASSed.


---




[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21185
  
**[Test build #89939 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89939/testReport)**
 for PR 21185 at commit 
[`2a4944f`](https://github.com/apache/spark/commit/2a4944ffe5836408b80f9aa06e9b28e57aa16649).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21182
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89942/
Test FAILed.


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21182
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21182
  
**[Test build #89942 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89942/testReport)**
 for PR 21182 at commit 
[`8a8ff3f`](https://github.com/apache/spark/commit/8a8ff3f5bfdfaee7ec73e362cfa34261d199f407).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89940/
Test PASSed.


---




[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-04-27 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21173
  
oh, I'll update. Thanks!


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #89940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89940/testReport)** for PR 20894 at commit [`1fffc16`](https://github.com/apache/spark/commit/1fffc1614c5028fcbaf88bb07b9e75d56646aec1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
  * `case class ArrayJoin(`
  * `case class ArrayPosition(left: Expression, right: Expression)`
  * `case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil `
  * `case class Concat(children: Seq[Expression]) extends Expression `
  * `case class Flatten(child: Expression) extends UnaryExpression `
  * `abstract class GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes `
  * `case class GetMapValue(child: Expression, key: Expression)`
  * `case class MonthsBetween(`
  * `trait QueryPlanConstraints extends ConstraintHelper `
  * `trait ConstraintHelper `
  * `case class CachedRDDBuilder(`
  * `case class InMemoryRelation(`
  * `case class WriteToContinuousDataSource(`
  * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`


---




[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89941/
Test PASSed.


---




[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21186
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21186
  
**[Test build #89941 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89941/testReport)** for PR 21186 at commit [`5383299`](https://github.com/apache/spark/commit/5383299738877b76c46d603635520e77dad52fd9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21174: [SPARK-24085][SQL] Query returns UnsupportedOperationExc...

2018-04-27 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21174
  
@gatorsmile @maropu Thank you very much !!


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89938/
Test PASSed.


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20937
  
**[Test build #89938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89938/testReport)** for PR 20937 at commit [`e0cebf4`](https://github.com/apache/spark/commit/e0cebf4aa8bdec4d27ad9cd8d4296ebbb8ed9269).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HasCollectSubModels(Params):`
  * `class Summarizer(object):`
  * `class SummaryBuilder(JavaWrapper):`
  * `class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,`
  * `class TrainValidationSplit(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,`
  * `case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
  * `case class ArrayJoin(`
  * `case class ArrayMin(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
  * `case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
  * `case class ArrayPosition(left: Expression, right: Expression)`
  * `case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil `
  * `case class Concat(children: Seq[Expression]) extends Expression `
  * `case class Flatten(child: Expression) extends UnaryExpression `
  * `abstract class GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes `
  * `case class GetMapValue(child: Expression, key: Expression)`
  * `case class MonthsBetween(`
  * `trait QueryPlanConstraints extends ConstraintHelper `
  * `trait ConstraintHelper `
  * `class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] `
  * `  .doc(\"The class used to write checkpoint files atomically. This class must be a subclass \" +`
  * `case class CachedRDDBuilder(`
  * `case class InMemoryRelation(`
  * `trait CheckpointFileManager `
  * `  sealed trait RenameHelperMethods `
  * `  abstract class CancellableFSDataOutputStream(protected val underlyingStream: OutputStream)`
  * `  sealed class RenameBasedFSDataOutputStream(`
  * `class FileSystemBasedCheckpointFileManager(path: Path, hadoopConf: Configuration)`
  * `class FileContextBasedCheckpointFileManager(path: Path, hadoopConf: Configuration)`
  * `case class WriteToContinuousDataSource(`
  * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`
  * `abstract class MemoryStreamBase[A : Encoder](sqlContext: SQLContext) extends BaseStreamingSource `
  * `class ContinuousMemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)`
  * `  case class GetRecord(offset: ContinuousMemoryStreamPartitionOffset)`
  * `class ContinuousMemoryStreamDataReaderFactory(`
  * `class ContinuousMemoryStreamDataReader(`
  * `case class ContinuousMemoryStreamOffset(partitionNums: Map[Int, Int])`
  * `case class ContinuousMemoryStreamPartitionOffset(partition: Int, numProcessed: Int)`


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21088
  
**[Test build #89945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89945/testReport)** for PR 21088 at commit [`a7f35f4`](https://github.com/apache/spark/commit/a7f35f4c782d76b78d26688ec9a593d2bbbf3c39).


---




[GitHub] spark issue #21170: [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala toString...

2018-04-27 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21170
  
cc @zsxwing


---




[GitHub] spark issue #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the same ex...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21184
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89937/
Test PASSed.


---




[GitHub] spark issue #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the same ex...

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21184
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the same ex...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21184
  
**[Test build #89937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89937/testReport)** for PR 21184 at commit [`d676b62`](https://github.com/apache/spark/commit/d676b6277a682894d409e314e64ece7857a97841).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21088: [SPARK-24003][CORE] Add support to provide spark....

2018-04-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21088#discussion_r184814790
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -914,7 +916,9 @@ private[spark] class Client(
 s"(was '$opts'). Use spark.yarn.am.memory instead."
   throw new SparkException(msg)
 }
-javaOpts ++= Utils.splitCommandString(opts).map(YarnSparkHadoopUtil.escapeForShell)
+javaOpts ++= Utils.splitCommandString(opts)
+.map(Utils.substituteAppId(_, appId.toString))
--- End diff --

nit: indentation


---




[GitHub] spark issue #21041: [SPARK-23962][SQL][TEST] Fix race in currentExecutionIds...

2018-04-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21041
  
Thank you, @squito !


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21073
  
**[Test build #89944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89944/testReport)** for PR 21073 at commit [`2e49b1e`](https://github.com/apache/spark/commit/2e49b1e01ba10d7baba9196d64af8db1cd7b2dd1).


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21119
  
**[Test build #89943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89943/testReport)** for PR 21119 at commit [`6d00f34`](https://github.com/apache/spark/commit/6d00f343f5c78fbe290793fe85cbc3deed53cf3e).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams,`


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89943/
Test FAILed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21119
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2730/
Test PASSed.


---




[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21119
  
**[Test build #89943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89943/testReport)** for PR 21119 at commit [`6d00f34`](https://github.com/apache/spark/commit/6d00f343f5c78fbe290793fe85cbc3deed53cf3e).


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21182
  
**[Test build #89942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89942/testReport)** for PR 21182 at commit [`8a8ff3f`](https://github.com/apache/spark/commit/8a8ff3f5bfdfaee7ec73e362cfa34261d199f407).


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21182
  
jenkins, retest this, please


---



