[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21028 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89949/
[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21028 **[Test build #89949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89949/testReport)** for PR 21028 at commit [`5925104`](https://github.com/apache/spark/commit/592510461622cd8eccd6f93af2e1fdbc0521fb98).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184841269 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason { * Task was killed intentionally and needs to be rescheduled. */ @DeveloperApi
-case class TaskKilled(reason: String) extends TaskFailedReason {
+case class TaskKilled(
+    reason: String,
+    accumUpdates: Seq[AccumulableInfo] = Seq.empty,
+    private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil)
--- End diff -- let's clean up ExceptionFailure at the same time, and use only `AccumulatorV2` in this PR.
[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/21178#discussion_r184841250 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java --- @@ -362,4 +371,34 @@ public static void verifyProxyAccess(String realUser, String proxyUser, String i } }
+  public static boolean needUgiLogin(UserGroupInformation ugi, String principal, String keytab) {
+    return null == ugi || !ugi.hasKerberosCredentials() || !ugi.getUserName().equals(principal) ||
+      !keytab.equals(getKeytabFromUgi());
+  }
+
+  private static String getKeytabFromUgi() {
+    Class clz = UserGroupInformation.class;
+    try {
+      synchronized (clz) {
+        Field field = clz.getDeclaredField("keytabFile");
+        field.setAccessible(true);
+        return (String) field.get(null);
+      }
+    } catch (NoSuchFieldException e) {
+      try {
+        synchronized (clz) {
+          // In Hadoop 3 we don't have "keytabFile" field, instead we should use private method
+          // getKeytab().
+          Method method = clz.getDeclaredMethod("getKeytab");
+          method.setAccessible(true);
+          return (String) method.invoke(UserGroupInformation.getCurrentUser());
--- End diff -- It is called twice right now; but as a util method which can be used in other places, let us not intentionally introduce known inefficiencies.
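A small Python sketch of the caching idea behind this comment: resolve the field-versus-method fallback once per class and reuse it, instead of paying the reflective lookup on every call. `FakeUgiOld` and `FakeUgiNew` are invented stand-ins for the two Hadoop `UserGroupInformation` shapes discussed above, not real APIs:

```python
import functools

class FakeUgiOld:
    """Hypothetical stand-in for Hadoop 2's UserGroupInformation (has a keytab field)."""
    _keytab_file = "/etc/krb5.keytab"

class FakeUgiNew:
    """Hypothetical stand-in for Hadoop 3's UserGroupInformation (keytab via a method)."""
    def _get_keytab(self):
        return "/etc/krb5.keytab"

@functools.lru_cache(maxsize=None)
def keytab_accessor(cls):
    """Resolve how to read the keytab once per class; later calls hit the cache."""
    if hasattr(cls, "_keytab_file"):
        return lambda ugi: type(ugi)._keytab_file
    return lambda ugi: ugi._get_keytab()

def keytab_of(ugi):
    # The fallback logic runs at most once per class, not on every call.
    return keytab_accessor(type(ugi))(ugi)
```

The same pattern in Java would cache the resolved `Field` or `Method` handle in a static final, avoiding repeated `getDeclaredField`/`getDeclaredMethod` calls.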
[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184840246 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason { * Task was killed intentionally and needs to be rescheduled. */ @DeveloperApi
-case class TaskKilled(reason: String) extends TaskFailedReason {
+case class TaskKilled(
+    reason: String,
+    accumUpdates: Seq[AccumulableInfo] = Seq.empty,
+    private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil)
--- End diff -- Yeah, I noticed `accumUpdates: Seq[AccumulableInfo]` is only used in JsonProtocol. Is that for a reason? The current impl is constructed to be in sync with existing TaskEndReason such as `ExceptionFailure`:
```
@DeveloperApi
case class ExceptionFailure(
    className: String,
    description: String,
    stackTrace: Array[StackTraceElement],
    fullStackTrace: String,
    private val exceptionWrapper: Option[ThrowableSerializationWrapper],
    accumUpdates: Seq[AccumulableInfo] = Seq.empty,
    private[spark] var accums: Seq[AccumulatorV2[_, _]] = Nil)
```
I'd prefer to keep them in sync, which leaves two options for cleanup: 1. Leave it as it is, then clean up together with ExceptionFailure. 2. Clean up ExceptionFailure first. @cloud-fan what do you think?
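The compatibility motivation for the default arguments in the diff above can be seen in a stripped-down sketch. The classes below are simplified Python stand-ins for Spark's Scala types, not the real definitions:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Simplified stand-in for Spark's AccumulableInfo.
@dataclass
class AccumulableInfo:
    id: int
    name: Optional[str]
    update: Any = None

# Simplified stand-in for TaskKilled: the new accumulator field gets a
# default, so existing call sites that only pass a reason keep working.
@dataclass
class TaskKilled:
    reason: str
    accum_updates: List[AccumulableInfo] = field(default_factory=list)

    def to_error_string(self) -> str:
        return f"TaskKilled ({self.reason})"

# An old-style call site: no accumulator argument, still valid.
killed = TaskKilled("preempted")
```

This mirrors why `accumUpdates` and `accums` are given defaults rather than being required constructor arguments.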
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Merged build finished. Test FAILed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89948/
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21021 **[Test build #89948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89948/testReport)** for PR 21021 at commit [`175d981`](https://github.com/apache/spark/commit/175d98195fc172655584b0dcf4087014e1377d12).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait ArraySortUtil extends ExpectsInputTypes `
  * `case class ArraySort(child: Expression) extends UnaryExpression with ArraySortUtil `
[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184839152 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault(self.keepLastCheckpoint) +@inherit_doc +class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams, + JavaMLReadable, JavaMLWritable): +""" +.. note:: Experimental +Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by +http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From the abstract: +PIC finds a very low-dimensional embedding of a dataset using truncated power +iteration on a normalized pair-wise similarity matrix of the data. + +PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix +is a symmetric matrix whose entries are non-negative similarities between items. +PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row +includes: + + - :py:class:`idCol`: vertex ID + - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol` + - :py:class:`similaritiesCol`: non-negative weights (similarities) of edges between the +vertex in :py:class:`idCol` and each neighbor in :py:class:`neighborsCol` + +PIC returns a cluster assignment for each input vertex. It appends a new column +:py:class:`predictionCol` containing the cluster assignment in :py:class:`[0,k)` for +each row (vertex). + +Notes: + + - [[PowerIterationClustering]] is a transformer with an expensive [[transform]] operation. +Transform runs the iterative PIC algorithm to cluster the whole input dataset. + - Input validation: This validates that similarities are non-negative but does NOT validate +that the input matrix is symmetric. + +@see http://en.wikipedia.org/wiki/Spectral_clustering> --- End diff -- Use `.. seealso::`? 
[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184839158 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault(self.keepLastCheckpoint) +@inherit_doc +class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams, + JavaMLReadable, JavaMLWritable): +""" +.. note:: Experimental +Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by +http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From the abstract: +PIC finds a very low-dimensional embedding of a dataset using truncated power +iteration on a normalized pair-wise similarity matrix of the data. + +PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix +is a symmetric matrix whose entries are non-negative similarities between items. +PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row +includes: + + - :py:class:`idCol`: vertex ID + - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol` + - :py:class:`similaritiesCol`: non-negative weights (similarities) of edges between the +vertex in :py:class:`idCol` and each neighbor in :py:class:`neighborsCol` + +PIC returns a cluster assignment for each input vertex. It appends a new column +:py:class:`predictionCol` containing the cluster assignment in :py:class:`[0,k)` for +each row (vertex). + +Notes: --- End diff -- Use `.. note::`?
[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184839128 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault(self.keepLastCheckpoint) +@inherit_doc +class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams, + JavaMLReadable, JavaMLWritable): +""" +.. note:: Experimental +Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by +http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From the abstract: +PIC finds a very low-dimensional embedding of a dataset using truncated power +iteration on a normalized pair-wise similarity matrix of the data. + +PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix +is a symmetric matrix whose entries are non-negative similarities between items. +PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row +includes: + + - :py:class:`idCol`: vertex ID --- End diff -- ```:py:attr:`idCol` ```? And also the below ```:py:class:`neighborsCol` ```, etc...
[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184838981 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault(self.keepLastCheckpoint) +@inherit_doc +class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams, + JavaMLReadable, JavaMLWritable): +""" +.. note:: Experimental +Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by +http://www.icml2010.org/papers/387.pdf>Lin and Cohen. From the abstract: +PIC finds a very low-dimensional embedding of a dataset using truncated power +iteration on a normalized pair-wise similarity matrix of the data. + +PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix +is a symmetric matrix whose entries are non-negative similarities between items. +PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row +includes: + + - :py:class:`idCol`: vertex ID + - :py:class:`neighborsCol`: neighbors of vertex in :py:class:`idCol` + - :py:class:`similaritiesCol`: non-negative weights (similarities) of edges between the +vertex in :py:class:`idCol` and each neighbor in :py:class:`neighborsCol` + +PIC returns a cluster assignment for each input vertex. It appends a new column +:py:class:`predictionCol` containing the cluster assignment in :py:class:`[0,k)` for +each row (vertex). + +Notes: + + - [[PowerIterationClustering]] is a transformer with an expensive [[transform]] operation. +Transform runs the iterative PIC algorithm to cluster the whole input dataset. + - Input validation: This validates that similarities are non-negative but does NOT validate +that the input matrix is symmetric. 
+ +@see http://en.wikipedia.org/wiki/Spectral_clustering> +Spectral clustering (Wikipedia) + +>>> from pyspark.sql.types import ArrayType, DoubleType, LongType, StructField, StructType +>>> similarities = [((long)(1), [0], [0.5]), ((long)(2), [0, 1], [0.7,0.5]), \ +((long)(3), [0, 1, 2], [0.9, 0.7, 0.5]), \ +((long)(4), [0, 1, 2, 3], [1.1, 0.9, 0.7,0.5]), \ +((long)(5), [0, 1, 2, 3, 4], [1.3, 1.1, 0.9, 0.7,0.5])] +>>> rdd = sc.parallelize(similarities, 2) +>>> schema = StructType([StructField("id", LongType(), False), \ + StructField("neighbors", ArrayType(LongType(), False), True), \ + StructField("similarities", ArrayType(DoubleType(), False), True)]) +>>> df = spark.createDataFrame(rdd, schema) +>>> pic = PowerIterationClustering() +>>> result = pic.setK(2).setMaxIter(10).transform(df) +>>> predictions = sorted(set([(i[0], i[1]) for i in result.select(result.id, result.prediction) +... .collect()]), key=lambda x: x[0]) +>>> predictions[0] +(1, 1) +>>> predictions[1] +(2, 1) +>>> predictions[2] +(3, 0) +>>> predictions[3] +(4, 0) +>>> predictions[4] +(5, 0) +>>> pic_path = temp_path + "/pic" +>>> pic.save(pic_path) +>>> pic2 = PowerIterationClustering.load(pic_path) +>>> pic2.getK() +2 +>>> pic2.getMaxIter() +10 +>>> pic3 = PowerIterationClustering(k=4, initMode="degree") +>>> pic3.getIdCol() +'id' +>>> pic3.getK() +4 +>>> pic3.getMaxIter() +20 +>>> pic3.getInitMode() +'degree' + +.. versionadded:: 2.4.0 +""" + +k = Param(Params._dummy(), "k", + "The number of clusters to create. Must be > 1.", + typeConverter=TypeConverters.toInt) +initMode = Param(Params._dummy(), "initMode", + "The initialization algorithm. This can be either " + + "'random' to use a random vector as vertex properties, or 'degree' to use " + + "a normalized sum of similarities with other vertices. 
Supported options: " + + "'random' and 'degree'.", + typeConverter=TypeConverters.toString) +idCol = Param(Params._dummy(), "idCol", + "Name of the input column for vertex IDs.", + typeConverter=TypeConverters.toString) +neighborsCol = Param(Params._dummy(), "neighborsCol", + "Name of the input column for neighbors in the adjacency list " + +
[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184838934 --- Diff: python/pyspark/ml/clustering.py --- (quotes the same `PowerIterationClustering` docstring and `Param` hunk as the previous comment)
[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184838848 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -97,13 +97,15 @@ private[clustering] trait PowerIterationClusteringParams extends Params with Has def getNeighborsCol: String = $(neighborsCol) /**
- * Param for the name of the input column for neighbors in the adjacency list representation.
+ * Param for the name of the input column for non-negative weights (similarities) of edges
+ * between the vertex in `idCol` and each neighbor in `neighborsCol`.
--- End diff -- Good catch!
[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21185 I think `SparkSession` is driver-only; how do we access it in an executor?
[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184838512 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason { * Task was killed intentionally and needs to be rescheduled. */ @DeveloperApi
-case class TaskKilled(reason: String) extends TaskFailedReason {
+case class TaskKilled(
+    reason: String,
+    accumUpdates: Seq[AccumulableInfo] = Seq.empty,
+    private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil)
--- End diff -- Previously we used `AccumulableInfo` to expose accumulator information to end users. Now that `AccumulatorV2` is already a public class, we don't need to do that anymore; I think we can just do
```
case class TaskKilled(reason: String, accums: Seq[AccumulatorV2[_, _]])
```
[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21133#discussion_r184838245 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala --- @@ -279,4 +282,11 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSQLContext { checkAnswer(query, expected) } }
+  test("SPARK-24013: unneeded compress can cause performance issues with sorted input") {
+    failAfter(30 seconds) {
--- End diff -- this test looks pretty weird. Can we add some kind of unit test, move this timing check to the PR description, and say the perf has improved a lot after this patch?
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21122 To add some more color: the current `ExternalCatalog` is an abstract class and can already be implemented outside of Spark. However, the problem is the listeners. Every time we want to listen to one more event, we need to break the API (`createDatabase` and `doCreateDatabase`). This is very bad for a stable interface. The main goal of this PR is to pull the listener stuff out of `ExternalCatalog`, and make `ExternalCatalog` a pure interface.
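The refactoring described here can be sketched as a pure interface plus an event-firing wrapper, so that new event types only ever touch the wrapper. All names below are illustrative Python stand-ins, not Spark's actual API:

```python
from abc import ABC, abstractmethod

# A pure interface: external implementations only provide catalog operations.
class ExternalCatalog(ABC):
    @abstractmethod
    def create_database(self, name: str) -> None: ...

# The listener machinery lives in a wrapper, so adding a new event never
# changes the ExternalCatalog interface itself.
class ExternalCatalogWithListener(ExternalCatalog):
    def __init__(self, delegate, listeners):
        self.delegate = delegate
        self.listeners = listeners

    def create_database(self, name: str) -> None:
        for notify in self.listeners:
            notify(f"CreateDatabasePreEvent({name})")
        self.delegate.create_database(name)
        for notify in self.listeners:
            notify(f"CreateDatabaseEvent({name})")

class InMemoryCatalog(ExternalCatalog):
    def __init__(self):
        self.databases = []

    def create_database(self, name: str) -> None:
        self.databases.append(name)

events = []
catalog = ExternalCatalogWithListener(InMemoryCatalog(), [events.append])
catalog.create_database("db1")
```

External implementations only ever see the pure interface; the event plumbing stays internal, which is the stability property the comment is after.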
[GitHub] spark pull request #21109: [SPARK-24020][SQL] Sort-merge join inner range op...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21109#discussion_r184837526 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala --- @@ -222,6 +222,61 @@ class JoinBenchmark extends BenchmarkBase { */ }
+  val expensiveFunc = (first: Int, second: Int) => {
+    for (i <- 1 to 2000) {
+      Math.sqrt(i * i * i)
+    }
+    Math.abs(first - second)
+  }
+
+  def innerRangeTest(N: Int, M: Int): Unit = {
+    import sparkSession.implicits._
+    val expUdf = sparkSession.udf.register("expensiveFunc", expensiveFunc)
+    val df1 = sparkSession.sparkContext.parallelize(1 to M).
+      cartesian(sparkSession.sparkContext.parallelize(1 to N)).
+      toDF("col1a", "col1b")
+    val df2 = sparkSession.sparkContext.parallelize(1 to M).
+      cartesian(sparkSession.sparkContext.parallelize(1 to N)).
+      toDF("col2a", "col2b")
+    val df = df1.join(df2, 'col1a === 'col2a and ('col1b < 'col2b + 3) and ('col1b > 'col2b - 3))
+    assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
+    df.where(expUdf('col1b, 'col2b) < 3).count()
+  }
+
+  ignore("sort merge inner range join") {
+    sparkSession.conf.set("spark.sql.join.smj.useInnerRangeOptimization", "false")
+    val N = 2 << 5
+    val M = 100
+    runBenchmark("sort merge inner range join", N * M) {
+      innerRangeTest(N, M)
+    }
+
+    /*
+     * AMD EPYC 7401 24-Core Processor
+     * sort merge join:                Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
+     * ---------------------------------------------------------------------------------------
+     * sort merge join wholestage off      13822 / 14068         0.0     2159662.3       1.0X
+     * sort merge join wholestage on         3863 / 4226         0.0      603547.0       3.6X
+     */
+  }
+
+  ignore("sort merge inner range join optimized") {
+    sparkSession.conf.set("spark.sql.join.smj.useInnerRangeOptimization", "true")
+    val N = 2 << 5
+    val M = 100
+    runBenchmark("sort merge inner range join optimized", N * M) {
+      innerRangeTest(N, M)
+    }
+
+    /*
+     * AMD EPYC 7401 24-Core Processor
+     * sort merge join:                Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
+     * ---------------------------------------------------------------------------------------
+     * sort merge join wholestage off      12723 / 12800         0.0     1988008.4       1.0X
+     */
+  }
--- End diff -- Why doesn't the wholestage-off case get as much improvement as the wholestage-on case?
[GitHub] spark issue #21109: [SPARK-24020][SQL] Sort-merge join inner range optimizat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21109 Thanks for working on this. Based on the description on JIRA, I think the main cause of the bad performance is re-calculating an expensive function on matched rows. With the added benchmark, I adjusted the order of conditions so the expensive UDF is put at the end of the predicate. Below are the results. The first one is the original benchmark; the second is the one with the UDF at the end of the predicate.
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Linux 4.9.87-linuxkit-aufs
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
sort merge inner range join:                 Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
sort merge inner range join wholestage off         6913 / 6964         0.0     1080112.4       1.0X
sort merge inner range join wholestage on          2094 / 2224         0.0      327217.4       3.3X
```
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Linux 4.9.87-linuxkit-aufs
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
sort merge inner range join:                 Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
sort merge inner range join wholestage off           675 / 704         0.0      105493.9       1.0X
sort merge inner range join wholestage on            374 / 398         0.0       58359.6       1.8X
```
It can be easily improved because of short-circuit evaluation of the predicate. This can also be applied to conditions other than just range comparisons. So I'm thinking whether we need a way to hint Spark to reorder expressions when one of them is expensive, like a UDF.
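The short-circuit effect described above is easy to reproduce in a toy model that simply counts how often the expensive predicate runs for each ordering of the conjunction (plain Python, no Spark involved):

```python
# Toy model of predicate ordering: count evaluations of the expensive
# predicate when it sits first versus last in `a and b`.
calls = {"expensive": 0}

def expensive(a, b):
    # Stands in for an expensive UDF.
    calls["expensive"] += 1
    return abs(a - b) < 3

def cheap(a, b):
    return a == b

pairs = [(a, b) for a in range(1, 101) for b in range(1, 101)]  # 10,000 pairs

calls["expensive"] = 0
sum(1 for a, b in pairs if expensive(a, b) and cheap(a, b))
expensive_first = calls["expensive"]   # evaluated for every pair

calls["expensive"] = 0
sum(1 for a, b in pairs if cheap(a, b) and expensive(a, b))
cheap_first = calls["expensive"]       # `and` short-circuits after the cheap filter
```

With the cheap equality condition first, the expensive predicate only runs on the 100 pairs that survive it, instead of on all 10,000.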
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 @jiangxb1987 @cloud-fan I think it's ready for review.
[GitHub] spark pull request #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21181
[GitHub] spark issue #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should contai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21181 Merged to master.
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r184837304 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -19,14 +19,41 @@ package org.apache.spark.sql.catalyst.expressions import java.util.Comparator import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData, TypeUtils} import org.apache.spark.sql.types._ import org.apache.spark.unsafe.Platform import org.apache.spark.unsafe.array.ByteArrayMethods import org.apache.spark.unsafe.types.{ByteArray, UTF8String} +/** + * Base trait for [[BinaryExpression]]s with two arrays of the same element type and implicit + * casting. + */ +trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression --- End diff -- As @ueshin pointed out [here](https://github.com/apache/spark/pull/21028#discussion_r184266872), `concat` is also a use case that has a different number of children. Am I wrong?
[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21187 Merged build finished. Test PASSed.
[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21187 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2735/
[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21187 **[Test build #89950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89950/testReport)** for PR 21187 at commit [`c486c6b`](https://github.com/apache/spark/commit/c486c6b15de49a519c728d037a8979791ea37e74).
[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot
GitHub user maryannxue opened a pull request: https://github.com/apache/spark/pull/21187 [SPARK-24035][SQL] SQL syntax for Pivot ## What changes were proposed in this pull request? Add SQL support for Pivot according to the Pivot grammar defined by Oracle (https://docs.oracle.com/database/121/SQLRF/img_text/pivot_clause.htm) with some simplifications, based on our existing functionality and limitations for Pivot at the backend: 1. For pivot_for_clause (https://docs.oracle.com/database/121/SQLRF/img_text/pivot_for_clause.htm), the column list form is not supported, which means the pivot column can only be a single column. 2. For pivot_in_clause (https://docs.oracle.com/database/121/SQLRF/img_text/pivot_in_clause.htm), the sub-query form and "ANY" are not supported (Oracle supports these only for XML anyway). 3. For pivot_in_clause, aliases for the constant values are not supported. The code changes are: 1. Add parser support for Pivot. Note that according to https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#i2076542, Pivot cannot be used together with lateral views in the FROM clause. This restriction has been implemented in the parser rule. 2. Infer group-by expressions: group-by expressions are not explicitly specified in the SQL Pivot clause and need to be deduced based on this rule: https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#CHDFAFIE, so we have to post-fix them at the query analysis stage. 3. Override Pivot.resolved as "false": for the reason mentioned in [2], and because the output attributes change after Pivot is replaced by Project or Aggregate, we avoid resolving references until after Pivot has been resolved and replaced. 4. Verify aggregate expressions: only aggregate expressions, with or without aliases, can appear in the first part of the Pivot clause, and this check is performed at the analysis stage. ## How was this patch tested? A new test suite, PivotSuite, is added.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/maryannxue/spark spark-24035 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21187.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21187 commit c486c6b15de49a519c728d037a8979791ea37e74 Author: maryannxue Date: 2018-04-28T01:17:52Z [SPARK-24035] SQL syntax for Pivot
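The group-by inference rule in change 2 of the PR description can be illustrated outside Spark. The toy Python sketch below is an illustration of the rule, not Spark's implementation: every input column that is neither the pivot column nor the aggregated column becomes a grouping column.

```python
from collections import defaultdict

def pivot(rows, pivot_col, pivot_values, value_col, agg=sum):
    """Toy pivot over a list of dicts, illustrating inferred group-by columns."""
    if not rows:
        return []
    # Inferred rule: group by everything except the pivot and value columns.
    group_cols = [c for c in rows[0] if c not in (pivot_col, value_col)]
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[c] for c in group_cols)].append(r)
    out = []
    for key, grp in groups.items():
        rec = dict(zip(group_cols, key))
        for v in pivot_values:
            vals = [r[value_col] for r in grp if r[pivot_col] == v]
            rec[v] = agg(vals) if vals else None  # missing pivot value -> null
        out.append(rec)
    return out

sales = [
    {"year": 2018, "month": "Jan", "amount": 10},
    {"year": 2018, "month": "Feb", "amount": 20},
    {"year": 2018, "month": "Jan", "amount": 5},
]
print(pivot(sales, "month", ["Jan", "Feb"], "amount"))
# [{'year': 2018, 'Jan': 15, 'Feb': 20}]
```

Here `year` is never named in the call, yet it becomes the grouping column, which is exactly why the analyzer must deduce group-by expressions after parsing.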
[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21088 Merged build finished. Test PASSed.
[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89945/ Test PASSed.
[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21088 **[Test build #89945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89945/testReport)** for PR 21088 at commit [`a7f35f4`](https://github.com/apache/spark/commit/a7f35f4c782d76b78d26688ec9a593d2bbbf3c39). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21028 Merged build finished. Test PASSed.
[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2734/ Test PASSed.
[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21028 **[Test build #89949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89949/testReport)** for PR 21028 at commit [`5925104`](https://github.com/apache/spark/commit/592510461622cd8eccd6f93af2e1fdbc0521fb98).
[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21028 retest this please
[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21152 @jerryshao Thanks for merging! My Apache JIRA ID is "kabhwan"
[GitHub] spark pull request #21185: [SPARK-23894][CORE][SQL] Defensively clear Active...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21185#discussion_r184813348 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -299,6 +316,9 @@ private[spark] class Executor( Thread.currentThread.setContextClassLoader(replClassLoader) val ser = env.closureSerializer.newInstance() logInfo(s"Running $taskName (TID $taskId)") + // When running in local mode, we might end up with the active session from the driver set on + // this thread, though we never should, so we defensively clear it. See SPARK-23894. + clearActiveSparkSessionMethod.foreach(_.invoke(null)) --- End diff -- Can this be done in the thread pool's thread factory instead?
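The thread-factory suggestion above can be sketched in Python with `ThreadPoolExecutor`'s `initializer` hook, which runs once per worker thread as the pool creates it. This is an analogy only: `_active` below is a stand-in, not Spark's SparkSession API, and Spark's actual leak involves an inheritable thread-local; the sketch only demonstrates the per-thread initialization hook vanzin is proposing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-thread "active session" slot (not Spark's API).
_active = threading.local()

def _clear_active_session():
    # Runs once in each worker thread at creation time -- the
    # "thread factory" approach, instead of clearing per task.
    _active.session = None

pool = ThreadPoolExecutor(max_workers=2, initializer=_clear_active_session)

def task():
    # Every worker thread starts with a cleared slot.
    return _active.session

print(pool.submit(task).result())  # None
```

The advantage of the factory approach is that the clearing happens exactly once per thread, rather than on every task launch.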
[GitHub] spark pull request #21185: [SPARK-23894][CORE][SQL] Defensively clear Active...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21185#discussion_r184813243 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -229,6 +229,23 @@ private[spark] class Executor( ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum } + /** + * Only in local mode, we have to prevent the driver from setting the active SparkSession + * in the executor threads. See SPARK-23894. + */ + lazy val clearActiveSparkSessionMethod = if (Utils.isLocalMaster(conf)) { --- End diff -- private?
[GitHub] spark pull request #21152: [SPARK-23688][SS] Refactor tests away from rate s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21152
[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21152 @HeartSaVioR what is your JIRA id?
[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21152 LGTM. Merging to master.
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184835757 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -16,12 +16,14 @@ */ package org.apache.spark.sql.catalyst.expressions +import java.util import java.util.Comparator import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, MapData, TypeUtils} +import org.apache.spark.sql.catalyst.util.ArrayBasedMapData --- End diff -- How about merging these two lines into one line `org.apache.spark.sql.catalyst.util._`?
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Merged build finished. Test PASSed.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21178 **[Test build #89947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89947/testReport)** for PR 21178 at commit [`77142c6`](https://github.com/apache/spark/commit/77142c6caf2bcc46defc19994613af76d872673b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89947/ Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2733/ Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Merged build finished. Test PASSed.
[GitHub] spark issue #21166: [SPARK-11334][CORE] clear idle executors in executorIdTo...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21166 1. We improved the DAGScheduler to always send the TaskEnd message, so the issue I found before may no longer be valid. 2. We refactored the LiveListenerQueue to make it more robust for internal listeners. We cannot guarantee that an event will never be lost, but the chance is quite small (SPARK-18838). IMHO you (as the PR submitter) should validate this issue against the latest code and make sure you can reproduce it there.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21021 **[Test build #89948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89948/testReport)** for PR 21021 at commit [`175d981`](https://github.com/apache/spark/commit/175d98195fc172655584b0dcf4087014e1377d12).
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21021 retest this please
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2732/ Test PASSed.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21178 Merged build finished. Test PASSed.
[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21178 **[Test build #89947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89947/testReport)** for PR 21178 at commit [`77142c6`](https://github.com/apache/spark/commit/77142c6caf2bcc46defc19994613af76d872673b).
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2731/ Test PASSed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Merged build finished. Test FAILed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89946/ Test FAILed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21119 **[Test build #89946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89946/testReport)** for PR 21119 at commit [`a6b1822`](https://github.com/apache/spark/commit/a6b18222b65e878e22ddf8f2d340aa3127c99e0c). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Merged build finished. Test PASSed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21119 **[Test build #89946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89946/testReport)** for PR 21119 at commit [`a6b1822`](https://github.com/apache/spark/commit/a6b18222b65e878e22ddf8f2d340aa3127c99e0c).
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89944/ Test PASSed.
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Merged build finished. Test PASSed.
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89944/testReport)** for PR 21073 at commit [`2e49b1e`](https://github.com/apache/spark/commit/2e49b1e01ba10d7baba9196d64af8db1cd7b2dd1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21178#discussion_r184833705 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java --- @@ -362,4 +371,34 @@ public static void verifyProxyAccess(String realUser, String proxyUser, String i } } + public static boolean needUgiLogin(UserGroupInformation ugi, String principal, String keytab) { +return null == ugi || !ugi.hasKerberosCredentials() || !ugi.getUserName().equals(principal) || + !keytab.equals(getKeytabFromUgi()); + } + + private static String getKeytabFromUgi() { +Class clz = UserGroupInformation.class; +try { + synchronized (clz) { +Field field = clz.getDeclaredField("keytabFile"); +field.setAccessible(true); +return (String) field.get(null); + } +} catch (NoSuchFieldException e) { + try { +synchronized (clz) { + // In Hadoop 3 we don't have "keytabFile" field, instead we should use private method + // getKeytab(). + Method method = clz.getDeclaredMethod("getKeytab"); + method.setAccessible(true); + return (String) method.invoke(UserGroupInformation.getCurrentUser()); --- End diff -- This will only be called twice in the initialization stage, so there should not be large overhead.
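The field-then-method reflection fallback in the diff above is a general version-compatibility pattern: probe for the old API shape first, and fall back to the new one when it is absent. A Python analogue is sketched below; the classes are hypothetical stand-ins for the two Hadoop generations, not real Hadoop APIs.

```python
class Hadoop2Like:
    # Exposes the keytab as a plain attribute, like the old "keytabFile" field.
    _keytab_file = "/etc/hs2.keytab"

class Hadoop3Like:
    # Exposes the keytab only via an accessor, like the private getKeytab().
    def _get_keytab(self):
        return "/etc/hs2.keytab"

def keytab_from(obj):
    """Try the attribute first; on AttributeError, fall back to the accessor."""
    try:
        return obj._keytab_file
    except AttributeError:
        return obj._get_keytab()

print(keytab_from(Hadoop2Like()))  # /etc/hs2.keytab
print(keytab_from(Hadoop3Like()))  # /etc/hs2.keytab
```

As jerryshao notes, the cost of the probe is irrelevant here because it only runs during initialization.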
[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21178#discussion_r184833443 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala --- @@ -52,8 +52,22 @@ private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLC if (UserGroupInformation.isSecurityEnabled) { try { -HiveAuthFactory.loginFromKeytab(hiveConf) -sparkServiceUGI = Utils.getUGI() +val principal = hiveConf.getVar(ConfVars.HIVE_SERVER2_KERBEROS_PRINCIPAL) +val keyTabFile = hiveConf.getVar(ConfVars.HIVE_SERVER2_KERBEROS_KEYTAB) +if (principal.isEmpty || keyTabFile.isEmpty) { + throw new IOException( +"HiveServer2 Kerberos principal or keytab is not correctly configured") +} + +val originalUgi = UserGroupInformation.getCurrentUser --- End diff -- I don't think there's any particular reason, we just copy what HS2 did before.
[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21178#discussion_r184833381 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java --- @@ -18,14 +18,11 @@ package org.apache.hive.service.auth; import java.io.IOException; +import java.lang.reflect.Field; +import java.lang.reflect.Method; import java.net.InetSocketAddress; import java.net.UnknownHostException; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.HashMap; -import java.util.List; -import java.util.Locale; -import java.util.Map; +import java.util.*; --- End diff -- This is automatically done by my IntelliJ IDEA, will revert back.
[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to ReadTask...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21145 I think `ReadTask` is fine. That name does not imply that you can use the object itself to read, but it does correctly show that it is one task in a larger operation. I think the name implies that it represents something to be read, which is correct, and it is reasonable to look at the API for that object to see how to read it. That can be clearly accomplished, so I don't think we need a different name.
[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21185 Merged build finished. Test PASSed.
[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89939/ Test PASSed.
[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21185 **[Test build #89939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89939/testReport)** for PR 21185 at commit [`2a4944f`](https://github.com/apache/spark/commit/2a4944ffe5836408b80f9aa06e9b28e57aa16649). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21182 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89942/ Test FAILed.
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21182 Merged build finished. Test FAILed.
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21182 **[Test build #89942 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89942/testReport)** for PR 21182 at commit [`8a8ff3f`](https://github.com/apache/spark/commit/8a8ff3f5bfdfaee7ec73e362cfa34261d199f407). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Merged build finished. Test PASSed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89940/ Test PASSed.
[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21173 oh, I'll update. Thanks!
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #89940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89940/testReport)** for PR 20894 at commit [`1fffc16`](https://github.com/apache/spark/commit/1fffc1614c5028fcbaf88bb07b9e75d56646aec1). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes ` * `case class ArrayJoin(` * `case class ArrayPosition(left: Expression, right: Expression)` * `case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil ` * `case class Concat(children: Seq[Expression]) extends Expression ` * `case class Flatten(child: Expression) extends UnaryExpression ` * `abstract class GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes ` * `case class GetMapValue(child: Expression, key: Expression)` * `case class MonthsBetween(` * `trait QueryPlanConstraints extends ConstraintHelper ` * `trait ConstraintHelper ` * `case class CachedRDDBuilder(` * `case class InMemoryRelation(` * `case class WriteToContinuousDataSource(` * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89941/ Test PASSed.
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21186 Merged build finished. Test PASSed.
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21186 **[Test build #89941 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89941/testReport)** for PR 21186 at commit [`5383299`](https://github.com/apache/spark/commit/5383299738877b76c46d603635520e77dad52fd9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21174: [SPARK-24085][SQL] Query returns UnsupportedOperationExc...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21174 @gatorsmile @maropu Thank you very much !!
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89938/ Test PASSed.
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test PASSed.
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89938/testReport)** for PR 20937 at commit [`e0cebf4`](https://github.com/apache/spark/commit/e0cebf4aa8bdec4d27ad9cd8d4296ebbb8ed9269).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class HasCollectSubModels(Params):`
* `class Summarizer(object):`
* `class SummaryBuilder(JavaWrapper):`
* `class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,`
* `class TrainValidationSplit(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,`
* `case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes`
* `case class ArrayJoin(`
* `case class ArrayMin(child: Expression) extends UnaryExpression with ImplicitCastInputTypes`
* `case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCastInputTypes`
* `case class ArrayPosition(left: Expression, right: Expression)`
* `case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil`
* `case class Concat(children: Seq[Expression]) extends Expression`
* `case class Flatten(child: Expression) extends UnaryExpression`
* `abstract class GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes`
* `case class GetMapValue(child: Expression, key: Expression)`
* `case class MonthsBetween(`
* `trait QueryPlanConstraints extends ConstraintHelper`
* `trait ConstraintHelper`
* `class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T]`
* `.doc("The class used to write checkpoint files atomically. This class must be a subclass " +`
* `case class CachedRDDBuilder(`
* `case class InMemoryRelation(`
* `trait CheckpointFileManager`
* `sealed trait RenameHelperMethods`
* `abstract class CancellableFSDataOutputStream(protected val underlyingStream: OutputStream)`
* `sealed class RenameBasedFSDataOutputStream(`
* `class FileSystemBasedCheckpointFileManager(path: Path, hadoopConf: Configuration)`
* `class FileContextBasedCheckpointFileManager(path: Path, hadoopConf: Configuration)`
* `case class WriteToContinuousDataSource(`
* `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`
* `abstract class MemoryStreamBase[A : Encoder](sqlContext: SQLContext) extends BaseStreamingSource`
* `class ContinuousMemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)`
* `case class GetRecord(offset: ContinuousMemoryStreamPartitionOffset)`
* `class ContinuousMemoryStreamDataReaderFactory(`
* `class ContinuousMemoryStreamDataReader(`
* `case class ContinuousMemoryStreamOffset(partitionNums: Map[Int, Int])`
* `case class ContinuousMemoryStreamPartitionOffset(partition: Int, numProcessed: Int)`
[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21088 **[Test build #89945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89945/testReport)** for PR 21088 at commit [`a7f35f4`](https://github.com/apache/spark/commit/a7f35f4c782d76b78d26688ec9a593d2bbbf3c39).
[GitHub] spark issue #21170: [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala toString...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21170 cc @zsxwing
[GitHub] spark issue #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the same ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21184 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89937/ Test PASSed.
[GitHub] spark issue #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the same ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21184 Merged build finished. Test PASSed.
[GitHub] spark issue #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the same ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21184 **[Test build #89937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89937/testReport)** for PR 21184 at commit [`d676b62`](https://github.com/apache/spark/commit/d676b6277a682894d409e314e64ece7857a97841). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21088: [SPARK-24003][CORE] Add support to provide spark....
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21088#discussion_r184814790
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -914,7 +916,9 @@ private[spark] class Client(
 s"(was '$opts'). Use spark.yarn.am.memory instead."
 throw new SparkException(msg)
 }
-javaOpts ++= Utils.splitCommandString(opts).map(YarnSparkHadoopUtil.escapeForShell)
+javaOpts ++= Utils.splitCommandString(opts)
+.map(Utils.substituteAppId(_, appId.toString))
--- End diff --
nit: indentation
[GitHub] spark issue #21041: [SPARK-23962][SQL][TEST] Fix race in currentExecutionIds...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21041 Thank you, @squito !
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89944/testReport)** for PR 21073 at commit [`2e49b1e`](https://github.com/apache/spark/commit/2e49b1e01ba10d7baba9196d64af8db1cd7b2dd1).
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Merged build finished. Test FAILed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21119 **[Test build #89943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89943/testReport)** for PR 21119 at commit [`6d00f34`](https://github.com/apache/spark/commit/6d00f343f5c78fbe290793fe85cbc3deed53cf3e). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams,`
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89943/ Test FAILed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Merged build finished. Test PASSed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2730/ Test PASSed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21119 **[Test build #89943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89943/testReport)** for PR 21119 at commit [`6d00f34`](https://github.com/apache/spark/commit/6d00f343f5c78fbe290793fe85cbc3deed53cf3e).
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21182 **[Test build #89942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89942/testReport)** for PR 21182 at commit [`8a8ff3f`](https://github.com/apache/spark/commit/8a8ff3f5bfdfaee7ec73e362cfa34261d199f407).
[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21182 jenkins, retest this, please