[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2405/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #89474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89474/testReport)** for PR 21061 at commit [`cf65616`](https://github.com/apache/spark/commit/cf65616d019ad21c6f498e2c856c3ee396e9dbd2).
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21061 retest this please
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21078 Thanks for thinking through the optional logging issue! I responded in the JIRA to preserve the design discussion there: https://issues.apache.org/jira/browse/SPARK-23990
[GitHub] spark pull request #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Github user aviv-ebates closed the pull request at: https://github.com/apache/spark/pull/21057
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21057 Right, let me try to cherry-pick and see if I can write a test. I will try to find some time and open a PR after cherry-picking your commit. I think you can close this PR.
[GitHub] spark issue #21063: [SPARK-23886][Structured Streaming] Update query status ...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21063 The approach looks good to me, but we probably want to add some tests to StreamingQueryStatusAndProgressSuite. (See test("basic") in ContinuousSuite for how to set up a continuous processing memory stream, and note that processAllAvailable() won't work properly for continuous execution - you'll want to use CheckAnswer to await the added data and Execute to do the test-specific progress checks)
[GitHub] spark pull request #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler ...
Github user susanxhuynh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20641#discussion_r182271196

    --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala ---
    @@ -199,6 +199,38 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
         })
       }

    +  test("properly wraps and escapes parameters passed to driver command") {
    --- End diff --

Sorry for the delay. I was going to test this in DC/OS and haven't gotten a chance to do so.
[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182269723

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -123,7 +128,21 @@ class KMeansModel private[ml] (
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val predictUDF = udf((vector: Vector) => predict(vector))
    +    // val predictUDF = udf((vector: Vector) => predict(vector))
    +    val predictUDF = if (dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
    +      udf((vector: Vector) => predict(vector))
    +    }
    +    else {
    +      udf((vector: Seq[_]) => {
    +        val featureArray = Array.fill[Double](vector.size)(0.0)
    --- End diff --

Here's what I meant:
```
val predictUDF = featuresDataType match {
  case _: VectorUDT =>
    udf((vector: Vector) => predict(vector))
  case fdt: ArrayType =>
    fdt.elementType match {
      case _: FloatType =>
        ???
      case _: DoubleType =>
        ???
    }
}
```
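As an illustration of the nested match suggested above, here is a minimal, self-contained sketch. It deliberately does not use Spark: `FeaturesType`, `ElemType`, `PredictDispatch`, and the toy `predict` are hypothetical stand-ins for the column data type, its element type, and the model's prediction function. The point it tries to show is that the conversion is chosen once from the column type, and each case converts directly to doubles with no string round trip.

```scala
// Stand-in types for illustration only; these are NOT Spark's DataType classes.
sealed trait ElemType
case object FloatElem extends ElemType
case object DoubleElem extends ElemType

sealed trait FeaturesType
case object VectorFeatures extends FeaturesType
final case class ArrayFeatures(elementType: ElemType) extends FeaturesType

object PredictDispatch {
  // Hypothetical predictor: returns the index of the largest coordinate.
  def predict(features: Array[Double]): Int = features.indexOf(features.max)

  // Choose the conversion once, based on the column type, instead of
  // inspecting every value at prediction time.
  def predictorFor(featuresType: FeaturesType): Any => Int = featuresType match {
    case VectorFeatures =>
      // Assumption: a dense vector is modelled here as Array[Double].
      (v: Any) => predict(v.asInstanceOf[Array[Double]])
    case ArrayFeatures(FloatElem) =>
      // Floats are widened to doubles directly.
      (v: Any) => predict(v.asInstanceOf[Seq[Float]].map(_.toDouble).toArray)
    case ArrayFeatures(DoubleElem) =>
      (v: Any) => predict(v.asInstanceOf[Seq[Double]].toArray)
  }
}
```

In a real UDF each branch would take a typed argument rather than `Any`; the `Any => Int` signature is only there so the three branches share one return type in this sketch.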
[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182269644

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -123,7 +128,21 @@ class KMeansModel private[ml] (
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val predictUDF = udf((vector: Vector) => predict(vector))
    +    // val predictUDF = udf((vector: Vector) => predict(vector))
    +    val predictUDF = if (dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
    +      udf((vector: Vector) => predict(vector))
    --- End diff --

Side note: I realized that "predict" will cause the whole model to be serialized and sent to workers. But that's actually OK since we do need to send most of the model data to make predictions and since there's not a clean way to just send the model weights. So I think my previous comment about copying "numClasses" to a local variable was not necessary. Don't bother reverting the change though.
[GitHub] spark pull request #21029: [SPARK-23952] remove type parameter in DataReader...
Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21029#discussion_r182267906

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -95,21 +77,29 @@ case class DataSourceV2ScanExec(
               sparkContext.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
               sparkContext.env)
             .askSync[Unit](SetReaderPartitions(readerFactories.size))
    -      new ContinuousDataSourceRDD(sparkContext, sqlContext, readerFactories)
    -        .asInstanceOf[RDD[InternalRow]]
    -
    -    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
    -      new DataSourceRDD(sparkContext, batchReaderFactories).asInstanceOf[RDD[InternalRow]]
    -
    +      if (readerFactories.exists(_.dataFormat() == DataFormat.COLUMNAR_BATCH)) {
    +        throw new IllegalArgumentException(
    +          "continuous stream reader does not support columnar read yet.")
    --- End diff --

I've thought about this further. Shouldn't it be trivial to write a wrapper that simply converts a DataReader[ColumnarBatch] to a DataReader[InternalRow]? If so then we can easily support it after the current PR.
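The wrapper idea in the comment above can be sketched roughly as follows. This is not Spark code: `Reader`, `Row`, `Batch`, `RowReaderAdapter`, and `InMemoryBatchReader` are hypothetical stand-ins, and the `next()`/`get()`/`close()` cursor shape is an assumption made for the example. It only shows the adapter pattern of draining one batch at a time and exposing its rows one by one.

```scala
// Hypothetical cursor-style reader interface, assumed for this sketch.
trait Reader[T] extends AutoCloseable {
  def next(): Boolean // advance; returns false when the input is exhausted
  def get(): T        // current element; valid only after next() returned true
}

// Stand-ins for a row and a columnar batch (NOT Spark's InternalRow / ColumnarBatch).
final case class Row(values: Vector[Any])
final case class Batch(rows: Vector[Row])

// Adapts a batch-at-a-time reader into a row-at-a-time reader.
final class RowReaderAdapter(underlying: Reader[Batch]) extends Reader[Row] {
  private var current: Iterator[Row] = Iterator.empty
  private var row: Row = null

  override def next(): Boolean = {
    // Skip empty batches until a row is available or the input runs out.
    while (!current.hasNext) {
      if (!underlying.next()) return false
      current = underlying.get().rows.iterator
    }
    row = current.next()
    true
  }

  override def get(): Row = row

  override def close(): Unit = underlying.close()
}

// In-memory batch reader used only to try the adapter out.
final class InMemoryBatchReader(batches: Seq[Batch]) extends Reader[Batch] {
  private val it = batches.iterator
  private var batch: Batch = null

  override def next(): Boolean =
    if (it.hasNext) { batch = it.next(); true } else false

  override def get(): Batch = batch

  override def close(): Unit = ()
}
```

A consumer drives the adapter with the usual `while (reader.next()) use(reader.get())` loop; empty batches in the middle of the stream are skipped rather than ending iteration early.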
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user henryr commented on the issue: https://github.com/apache/spark/pull/21070 @scottcarey I agree that's important. Perhaps it could be done as a follow-up PR?
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21089 Merged build finished. Test FAILed.
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21089 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89471/ Test FAILed.
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21089 **[Test build #89471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89471/testReport)** for PR 21089 at commit [`1458077`](https://github.com/apache/spark/commit/1458077cf6817701d74fcebd2e83ab6a62889fd8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...
Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21090#discussion_r182254888

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala ---
    @@ -0,0 +1,256 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.clustering
    +
    +import org.apache.spark.annotation.{Experimental, Since}
    +import org.apache.spark.ml.Transformer
    +import org.apache.spark.ml.param._
    +import org.apache.spark.ml.param.shared._
    +import org.apache.spark.ml.util._
    +import org.apache.spark.mllib.clustering.{PowerIterationClustering => MLlibPowerIterationClustering}
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.{DataFrame, Dataset, Row}
    +import org.apache.spark.sql.functions.col
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Common params for PowerIterationClustering
    + */
    +private[clustering] trait PowerIterationClusteringParams extends Params with HasMaxIter
    +  with HasPredictionCol {
    +
    +  /**
    +   * The number of clusters to create (k). Must be > 1. Default: 2.
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  final val k = new IntParam(this, "k", "The number of clusters to create. " +
    +    "Must be > 1.", ParamValidators.gt(1))
    +
    +  /** @group getParam */
    +  @Since("2.4.0")
    +  def getK: Int = $(k)
    +
    +  /**
    +   * Param for the initialization algorithm. This can be either "random" to use a random vector
    +   * as vertex properties, or "degree" to use a normalized sum of similarities with other vertices.
    +   * Default: random.
    +   * @group expertParam
    +   */
    +  @Since("2.4.0")
    +  final val initMode = {
    +    val allowedParams = ParamValidators.inArray(Array("random", "degree"))
    +    new Param[String](this, "initMode", "The initialization algorithm. This can be either " +
    +      "'random' to use a random vector as vertex properties, or 'degree' to use a normalized sum " +
    +      "of similarities with other vertices. Supported options: 'random' and 'degree'.",
    +      allowedParams)
    +  }
    +
    +  /** @group expertGetParam */
    +  @Since("2.4.0")
    +  def getInitMode: String = $(initMode)
    +
    +  /**
    +   * Param for the name of the input column for vertex IDs.
    +   * Default: "id"
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val idCol = new Param[String](this, "idCol", "Name of the input column for vertex IDs.",
    +    (value: String) => value.nonEmpty)
    +
    +  setDefault(idCol, "id")
    +
    +  /** @group getParam */
    +  @Since("2.4.0")
    +  def getIdCol: String = getOrDefault(idCol)
    +
    +  /**
    +   * Param for the name of the input column for neighbors in the adjacency list representation.
    +   * Default: "neighbors"
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val neighborsCol = new Param[String](this, "neighborsCol",
    +    "Name of the input column for neighbors in the adjacency list representation.",
    +    (value: String) => value.nonEmpty)
    +
    +  setDefault(neighborsCol, "neighbors")
    +
    +  /** @group getParam */
    +  @Since("2.4.0")
    +  def getNeighborsCol: String = $(neighborsCol)
    +
    +  /**
    +   * Param for the name of the input column for neighbors in the adjacency list representation.
    +   * Default: "similarities"
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val similaritiesCol = new Param[String](this, "similaritiesCol",
    +    "Name of the input column for neighbors in the adjacency list representation.",
    +    (value: String) => value.nonEmpty)
    +
    +  setDefault(similaritiesCol, "similarities")
    +
    +  /** @group getParam */
    +  @Since("2.4.0")
    +  def getSimilaritiesCol: String = $(similaritiesCol)
    +
    +  protected def validateAndTransformSchema(schema: StructType): StructType = {
    +    SchemaUtils.checkColumnTypes(schema, $(idCol), Seq(IntegerType, LongType))
    +    SchemaUtils.checkColumnTypes(schema,
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/21090 Took a quick look. Despite the style failure and a minor format issue, LGTM.
[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...
Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21090#discussion_r182243819

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala ---
    @@ -0,0 +1,239 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.clustering
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.ml.util.DefaultReadWriteTest
    +import org.apache.spark.mllib.util.MLlibTestSparkContext
    +import org.apache.spark.sql.functions.col
    +import org.apache.spark.sql.types._
    +import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
    +import org.apache.spark.{SparkException, SparkFunSuite}
    +
    +
    +class PowerIterationClusteringSuite extends SparkFunSuite
    +  with MLlibTestSparkContext with DefaultReadWriteTest {
    +
    +  @transient var data: Dataset[_] = _
    +  final val r1 = 1.0
    +  final val n1 = 10
    +  final val r2 = 4.0
    +  final val n2 = 40
    +
    +  override def beforeAll(): Unit = {
    +    super.beforeAll()
    +
    +    data = PowerIterationClusteringSuite.generatePICData(spark, r1, r2, n1, n2)
    +  }
    +
    +  test("default parameters") {
    +    val pic = new PowerIterationClustering()
    +
    +    assert(pic.getK === 2)
    +    assert(pic.getMaxIter === 20)
    +    assert(pic.getInitMode === "random")
    +    assert(pic.getPredictionCol === "prediction")
    +    assert(pic.getIdCol === "id")
    +    assert(pic.getNeighborsCol === "neighbors")
    +    assert(pic.getSimilaritiesCol === "similarities")
    +  }
    +
    +  test("parameter validation") {
    +    intercept[IllegalArgumentException] {
    +      new PowerIterationClustering().setK(1)
    +    }
    +    intercept[IllegalArgumentException] {
    +      new PowerIterationClustering().setInitMode("no_such_a_mode")
    +    }
    +    intercept[IllegalArgumentException] {
    +      new PowerIterationClustering().setIdCol("")
    +    }
    +    intercept[IllegalArgumentException] {
    +      new PowerIterationClustering().setNeighborsCol("")
    +    }
    +    intercept[IllegalArgumentException] {
    +      new PowerIterationClustering().setSimilaritiesCol("")
    +    }
    +  }
    +
    +  test("power iteration clustering") {
    +    val n = n1 + n2
    +
    +    val model = new PowerIterationClustering()
    +      .setK(2)
    +      .setMaxIter(40)
    +    val result = model.transform(data)
    +
    +    val predictions = Array.fill(2)(mutable.Set.empty[Long])
    +    result.select("id", "prediction").collect().foreach {
    +      case Row(id: Long, cluster: Integer) => predictions(cluster) += id
    +    }
    +    assert(predictions.toSet == Set((1 until n1).toSet, (n1 until n).toSet))
    +
    +    val result2 = new PowerIterationClustering()
    +      .setK(2)
    +      .setMaxIter(10)
    +      .setInitMode("degree")
    +      .transform(data)
    +    val predictions2 = Array.fill(2)(mutable.Set.empty[Long])
    +    result2.select("id", "prediction").collect().foreach {
    +      case Row(id: Long, cluster: Integer) => predictions2(cluster) += id
    +    }
    +    assert(predictions2.toSet == Set((1 until n1).toSet, (n1 until n).toSet))
    +  }
    +
    +  test("supported input types") {
    +    val model = new PowerIterationClustering()
    +      .setK(2)
    +      .setMaxIter(1)
    +
    +    def runTest(idType: DataType, neighborType: DataType, similarityType: DataType): Unit = {
    +      val typedData = data.select(
    +        col("id").cast(idType).alias("id"),
    +        col("neighbors").cast(ArrayType(neighborType, containsNull = false)).alias("neighbors"),
    +        col("similarities").cast(ArrayType(similarityType, containsNull = false))
    +          .alias("similarities")
    +      )
    +      model.transform(typedData).collect()
    +    }
    +
    +    for (idType <- Seq(IntegerType, LongType)) {
    +      runTest(idType, LongType, DoubleType)
    +    }
    +    for (neighborType <- Seq(IntegerType, LongType)) {
    +      runTest(LongType,
[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89467/ Test PASSed.
[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19881 Merged build finished. Test PASSed.
[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19881 **[Test build #89467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89467/testReport)** for PR 19881 at commit [`15732ab`](https://github.com/apache/spark/commit/15732ab7ee22a9cc4409b36812b5b2c930854723).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89470/ Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #89470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89470/testReport)** for PR 21061 at commit [`cf65616`](https://github.com/apache/spark/commit/cf65616d019ad21c6f498e2c856c3ee396e9dbd2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes`
  * `case class ArrayUnion(left: Expression, right: Expression) extends ArraySetUtils`
[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89473/testReport)** for PR 21073 at commit [`44137cc`](https://github.com/apache/spark/commit/44137cc9a9949b4218d973dc46d905d3ce301bcd).
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89468/ Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #89468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89468/testReport)** for PR 21061 at commit [`809621b`](https://github.com/apache/spark/commit/809621b9b73b67ebd8fb5ffcf1956fc6dc98be43).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Build finished. Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89466/ Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #89466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89466/testReport)** for PR 21061 at commit [`26c30b9`](https://github.com/apache/spark/commit/26c30b954ad65b5bb41633afb62a8953b5a6a31a).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #21085: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21085 known flaky test https://issues.apache.org/jira/browse/SPARK-23894 merging to branch 2.3
[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182216309

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params with HasMaxIter with HasFe
        * @return output schema
        */
       protected def validateAndTransformSchema(schema: StructType): StructType = {
    -    SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
    +    val typeCandidates = List( new VectorUDT,
    +      new ArrayType(DoubleType, true),
    --- End diff --

Thinking about this, let's actually disallow nullable columns. KMeans won't handle nulls properly.
[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182216415

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params with HasMaxIter with HasFe
        * @return output schema
        */
       protected def validateAndTransformSchema(schema: StructType): StructType = {
    -    SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
    +    val typeCandidates = List( new VectorUDT,
    +      new ArrayType(DoubleType, true),
    --- End diff --

Also, IntelliJ may warn you to pass boolean arguments as named arguments; that'd be nice to fix here.
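The two review points on this hunk (reject arrays that may contain nulls, and name the boolean argument) can be illustrated with a small sketch. It avoids Spark entirely: `ColumnType`, `VectorColumn`, `ArrayColumn`, and `FeatureSchemaCheck` are hypothetical stand-ins for the schema types and the column-type check, and including a float array among the candidates is an assumption based on the later suggestion to handle Seq of Float.

```scala
// Stand-in type descriptors for illustration; NOT Spark's DataType classes.
sealed trait ColumnType
case object VectorColumn extends ColumnType
final case class ArrayColumn(element: String, containsNull: Boolean) extends ColumnType

object FeatureSchemaCheck {
  // Accepted feature column types. The boolean is passed by name so the call
  // site says what `false` means, and nullable arrays are simply not listed.
  val accepted: Seq[ColumnType] = Seq(
    VectorColumn,
    ArrayColumn("double", containsNull = false),
    ArrayColumn("float", containsNull = false)
  )

  // Throws IllegalArgumentException when the actual type is not accepted.
  def requireAccepted(colName: String, actual: ColumnType): Unit =
    require(
      accepted.contains(actual),
      s"Column $colName must be one of ${accepted.mkString(", ")} but was $actual")
}
```

With this shape, `ArrayColumn("double", containsNull = true)` fails the check, which mirrors the request to disallow nullable element types.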
[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182215434

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -123,7 +128,21 @@ class KMeansModel private[ml] (
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val predictUDF = udf((vector: Vector) => predict(vector))
    +    // val predictUDF = udf((vector: Vector) => predict(vector))
    +    val predictUDF = if (dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
    +      udf((vector: Vector) => predict(vector))
    +    }
    --- End diff --

Scala style: } else {
[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182217722

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -123,7 +128,21 @@ class KMeansModel private[ml] (
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val predictUDF = udf((vector: Vector) => predict(vector))
    +    // val predictUDF = udf((vector: Vector) => predict(vector))
    +    val predictUDF = if (dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
    +      udf((vector: Vector) => predict(vector))
    +    }
    +    else {
    +      udf((vector: Seq[_]) => {
    +        val featureArray = Array.fill[Double](vector.size)(0.0)
    --- End diff --

You shouldn't have to do the conversion in this convoluted (and less efficient) way. I'd recommend doing a match-case statement on dataset.schema; I think that will be the most succinct. Then you can handle Vector, Seq of Float, and Seq of Double separately, without conversions to strings. Same for the similar cases below.
[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21081#discussion_r182215639

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -123,7 +128,21 @@ class KMeansModel private[ml] (
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val predictUDF = udf((vector: Vector) => predict(vector))
    +    // val predictUDF = udf((vector: Vector) => predict(vector))
    +    val predictUDF = if (dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
    +      udf((vector: Vector) => predict(vector))
    +    }
    +    else {
    +      udf((vector: Seq[_]) => {
    --- End diff --

scala style: remove unnecessary ```{``` at end of line (IntelliJ should warn you about this)
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/21068 Yes, we can create an abstract class from `YarnAllocatorBlacklistTracker` (like `AbstractAllocatorBlacklistTracker`) where the method `synchronizeBlacklistedNodes` can have different implementations. In this case the core and the messages can stay as they are. As I see it, this is the less risky and cheaper solution. On the other hand, having the complete blacklisting in the driver is a more centralized and clearer design. We just have to make up our minds about where to go from here. Any help and suggestions for the decision are welcome.
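The first option described above (an abstract tracker whose `synchronizeBlacklistedNodes` is supplied per cluster manager) might look roughly like the sketch below. Only the two names `AbstractAllocatorBlacklistTracker` and `synchronizeBlacklistedNodes` come from the comment; the method signature, the other members, and `RecordingBlacklistTracker` are assumptions invented for illustration and are not taken from the pull request.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: shared bookkeeping lives in the base class, and only
// the step that talks to the cluster manager is left abstract.
abstract class AbstractAllocatorBlacklistTracker {
  private var schedulerBlacklist: Set[String] = Set.empty
  private var allocatorBlacklist: Set[String] = Set.empty
  private var lastSynced: Set[String] = Set.empty

  // Nodes blacklisted by the scheduler (replaces the previous set).
  def setSchedulerBlacklistedNodes(nodes: Set[String]): Unit = {
    schedulerBlacklist = nodes
    refresh()
  }

  // A node blacklisted because allocation on it failed.
  def handleAllocationFailure(node: String): Unit = {
    allocatorBlacklist += node
    refresh()
  }

  // Compute the delta against what was last pushed and forward only changes.
  private def refresh(): Unit = {
    val all = schedulerBlacklist ++ allocatorBlacklist
    val additions = all -- lastSynced
    val removals = lastSynced -- all
    if (additions.nonEmpty || removals.nonEmpty) {
      synchronizeBlacklistedNodes(additions, removals)
    }
    lastSynced = all
  }

  // Resource-manager-specific part; each subclass decides how to apply it.
  protected def synchronizeBlacklistedNodes(
      additions: Set[String],
      removals: Set[String]): Unit
}

// Toy implementation that just records what it was asked to synchronize.
final class RecordingBlacklistTracker extends AbstractAllocatorBlacklistTracker {
  val calls: ArrayBuffer[(Set[String], Set[String])] = ArrayBuffer.empty

  override protected def synchronizeBlacklistedNodes(
      additions: Set[String],
      removals: Set[String]): Unit = {
    calls += ((additions, removals))
  }
}
```

The design trade-off named in the comment is visible here: the base class keeps the existing flow intact, while each cluster manager supplies only its own synchronization step.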
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 OK, sorry to push, @wangmiao1981! I just want to make sure this gets in before I no longer have bandwidth for it. If you have the time, would you mind checking the updates I made in the new PR? Thanks!
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21090 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2404/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21090 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89472/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21090 **[Test build #89472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89472/testReport)** for PR 21090 at commit [`d215748`](https://github.com/apache/spark/commit/d2157489770a79fe443d567bfc03d61f72fbe161). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21090 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21090 @wangmiao1981 and @WeichenXu123 would you mind taking a look? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21090 **[Test build #89472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89472/testReport)** for PR 21090 at commit [`d215748`](https://github.com/apache/spark/commit/d2157489770a79fe443d567bfc03d61f72fbe161). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21090 **To review this PR**: This was copied from https://github.com/apache/spark/pull/15770 with the following changes: * Addressed comments in original PR (See my review comments there) * Added Param validators for required input columns * Renamed “weights” column to “similarities” * Made algorithm take more types of inputs: Long/Int and Double/Float * Removed test("set parameters") since setters are already tested in the read/write test. If you saw the previous PR, you should be able to review this one based on the last 3 commits, viewable in this diff: https://github.com/jkbradley/spark/compare/5cb8ed6...wangmiao1981-pic --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/21090 [SPARK-15784][ML] Add Power Iteration Clustering to spark.ml ## What changes were proposed in this pull request? This PR adds PowerIterationClustering as a Transformer to spark.ml. In the transform method, it calls spark.mllib's PowerIterationClustering.run() method and transforms the return value assignments (the Kmeans output of the pseudo-eigenvector) as a DataFrame (id: LongType, cluster: IntegerType). This PR is copied and modified from https://github.com/apache/spark/pull/15770 The primary author is @wangmiao1981 ## How was this patch tested? This PR has 2 types of tests: * Copies of tests from spark.mllib's PIC tests * New tests specific to the spark.ml APIs You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkbradley/spark wangmiao1981-pic Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21090.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21090 commit e4492a64b74b0ccc2da8f13353d37bb9bb0c Author: wm...@hotmail.com Date: 2016-06-13T19:47:42Z add pic framework (model, class etc) commit 70862491e5b86ce4add500a0c96ae5220733b35d Author: wm...@hotmail.com Date: 2016-06-13T23:28:09Z change a comment commit b73d8a78fa69f83c278996feb1b19502ef871c5b Author: wm...@hotmail.com Date: 2016-06-17T17:27:55Z add missing functions fit predict load save etc. commit 022fe523f735c5519f948b175871489f79434fb5 Author: wm...@hotmail.com Date: 2016-06-18T01:12:41Z add unit test flie commit 552cf54fb03f88af023f080e60fa50f1f39060fc Author: wm...@hotmail.com Date: 2016-06-20T17:35:05Z add test cases part 1 commit 0b4954d55b4d344794d3c47366220c67f07d0d43 Author: wm...@hotmail.com Date: 2016-06-20T20:29:54Z add unit test part 2: test fit, parameters etc.
commit f22b01e06eaaf5951befcebdffc18c8e519183d2 Author: wm...@hotmail.com Date: 2016-06-20T21:22:59Z fix a type issue commit 305b194dae40eaff990c18837c3f2bc8d469e60c Author: wm...@hotmail.com Date: 2016-06-21T20:07:27Z add more unit tests commit 4b32cbf02965c5c1a0c094fa144836dab0dfd543 Author: wm...@hotmail.com Date: 2016-06-21T21:46:25Z delete unused import and add comments commit f6eda88a6c0af416b988a2c37f46c8b08e5e99cf Author: wm...@hotmail.com Date: 2016-10-25T21:28:12Z change version to 2.1.0 commit 45c4b1cd1afa28c775c666b57ecee614ed9a41d0 Author: wm...@hotmail.com Date: 2016-11-03T23:26:01Z change PIC as a Transformer commit e8d7ed37138909d010a812fba7d03ef30a4f6e05 Author: wm...@hotmail.com Date: 2016-11-04T17:28:26Z add LabelCol commit e4e1e055a9b3ab54b83331ac7dc56d6b792dcf7b Author: wm...@hotmail.com Date: 2016-11-04T18:36:09Z change col implementation commit 8384422ec0e7192cc8ce53df02ddb4ae0401fd0b Author: wm...@hotmail.com Date: 2017-02-17T22:20:00Z address some of the comments commit d6a199c48ff940861d80caf275da29d99375ce33 Author: wm...@hotmail.com Date: 2017-02-21T22:37:51Z add additional test with dataset having more data commit b0c3aff4a76ace99c104c2b2c10c9485a028bfd6 Author: wm...@hotmail.com Date: 2017-03-14T23:13:45Z change input data format commit 091225dd2f1c353edc28dc4299034a018a92bc81 Author: wm...@hotmail.com Date: 2017-03-15T22:49:45Z resolve warnings commit 8bb99567556ce29c75d5f395157d0161dff695bc Author: wm...@hotmail.com Date: 2017-03-16T18:33:47Z add neighbor and weight cols commit 8ba82e8392e6d607ab750ed8eb3caaf386e1352a Author: wangmiao1981 Date: 2017-08-15T21:13:55Z address review comments 1 commit 468a94741efe6530c9acfbb1af4f46499550ee1f Author: wangmiao1981 Date: 2017-08-15T21:23:39Z fix style commit ec10f24336ff51354a1657c7ceadb9ada8cd1484 Author: wangmiao1981 Date: 2017-08-15T22:30:28Z remove unused comments commit 5710cfcf2e3596c95f353ce043f7358a030d70a0 Author: wangmiao1981 Date: 2017-08-15T23:43:14Z add Since commit 
88654b3055ebd863e3b3c5774abdce28f3cda184 Author: wangmiao1981 Date: 2017-08-17T00:12:12Z fix missing > commit 804adc6fece91e7264f315ee965faa40c5e334c5 Author: wangmiao1981 Date: 2017-08-17T17:26:40Z fix doc commit 4a6dd79a9c37f71ea4378692438f19b3247b7913 Author: wangmiao1981 Date: 2017-10-25T23:16:55Z address review comments commit 5cb8ed6de3865f58719b3b30888b3bc4542905d4 Author: wangmiao1981 Date: 2017-10-30T21:44:24Z fix unit test commit
[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/21083 Thank you for your reply, @cloud-fan! It was not clear to me when you became aware of the effort on SPARK-21479, so it might be a misunderstanding on my side, and I apologize. Anyway, if you had taken a closer look at the PR, you would probably have seen that it takes basically the same approach as what you have here, only that you have covered more join types. Here's another note. There are two types of constraint-to-filter inference for joins going on here: 1. Infer from the Join node constraints, which is covered by the `PushPredicateThroughJoin` rule; 2. Infer from the sibling child node combined with the join condition, which is what you've added here. That said, the InnerLike joins should already be covered by 1 and might not be worth considering again in this optimization rule. Not sure about LeftSemi joins, so it would be nice if we could have a test case that proves this optimization does something that has not yet been covered by the `PushPredicateThroughJoin` rule. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
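To make the second kind of inference concrete (sibling child filter plus join condition), here is a toy model in plain Scala. It is only a sketch under simplifying assumptions: `Pred`, `GreaterThan`, `EqualTo` and `inferSiblingFilters` are illustrative names, not Catalyst classes, and only equi-join keys with a single comparison predicate are handled.

```scala
// Toy predicate model; these are NOT Catalyst expressions.
sealed trait Pred
case class GreaterThan(column: String, value: Int) extends Pred
case class EqualTo(left: String, right: String) extends Pred

// Given equi-join conditions `l = r` and filters already known on the
// left child, derive the equivalent filters for the right child.
def inferSiblingFilters(
    joinCondition: Seq[EqualTo],
    leftFilters: Seq[GreaterThan]): Seq[GreaterThan] =
  for {
    EqualTo(l, r) <- joinCondition
    GreaterThan(c, v) <- leftFilters
    if c == l // the filter must be on the join key itself
  } yield GreaterThan(r, v)
```

For example, with the join condition `t1.a = t2.b` and the filter `t1.a > 1` on the left side, the function yields `t2.b > 1` for the right side. Whether this is sound for a given join type is exactly the question raised in the comment above, so the sketch should be read with inner or left-semi semantics in mind.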
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2403/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21089 **[Test build #89471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89471/testReport)** for PR 21089 at commit [`1458077`](https://github.com/apache/spark/commit/1458077cf6817701d74fcebd2e83ab6a62889fd8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r182200622 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -417,3 +418,156 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast override def prettyName: String = "array_max" } + +abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes { + val kindUnion = 1 + def typeId: Int + + def array1: Expression + def array2: Expression + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +val r = super.checkInputDataTypes() +if ((r == TypeCheckResult.TypeCheckSuccess) && + (array1.dataType.asInstanceOf[ArrayType].elementType != +array2.dataType.asInstanceOf[ArrayType].elementType)) { + TypeCheckResult.TypeCheckFailure("Element type in both arrays must be the same") +} else { + r +} + } + + override def dataType: DataType = array1.dataType + + private def elementType = dataType.asInstanceOf[ArrayType].elementType + private def cn1 = array1.dataType.asInstanceOf[ArrayType].containsNull + private def cn2 = array2.dataType.asInstanceOf[ArrayType].containsNull + + override def nullSafeEval(input1: Any, input2: Any): Any = { +val ary1 = input1.asInstanceOf[ArrayData] +val ary2 = input2.asInstanceOf[ArrayData] + +if (!cn1 && !cn2) { + elementType match { +case IntegerType => + // avoid boxing of primitive int array elements + val hs = new OpenHashSet[Int] + var i = 0 + while (i < ary1.numElements()) { +hs.add(ary1.getInt(i)) +i += 1 + } + i = 0 + while (i < ary2.numElements()) { --- End diff -- We can also support `array_union` and `array_except` by changing this 2nd loop with small other changes. This is why we introduced `ArraySetUtils` in this PR. Other PRs will update `ArraySetUtils` appropriately. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
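The two-loop structure discussed in the review comment can be sketched outside of Spark as follows. This is an assumption-laden illustration: `unionInts` is a hypothetical helper, and the standard-library `mutable.HashSet` stands in for Spark's `OpenHashSet`, so unlike the code under review it does box the `Int` elements.

```scala
import scala.collection.mutable

// Union of two int arrays, keeping first-seen order and dropping duplicates.
def unionInts(a: Array[Int], b: Array[Int]): Array[Int] = {
  val seen = mutable.HashSet.empty[Int]
  val out = mutable.ArrayBuffer.empty[Int]
  var i = 0
  // First loop: register every element of the first array.
  while (i < a.length) {
    if (seen.add(a(i))) out += a(i)
    i += 1
  }
  i = 0
  // Second loop: for a union, add whatever was not seen yet.
  // Other set operations would mainly differ in this loop's body.
  while (i < b.length) {
    if (seen.add(b(i))) out += b(i)
    i += 1
  }
  out.toArray
}
```

`seen.add` returns `true` only when the element was not already present, which is what lets a single call both test membership and record the element.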
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #89470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89470/testReport)** for PR 21061 at commit [`cf65616`](https://github.com/apache/spark/commit/cf65616d019ad21c6f498e2c856c3ee396e9dbd2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21089 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89469/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21089 **[Test build #89469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89469/testReport)** for PR 21089 at commit [`4391b0a`](https://github.com/apache/spark/commit/4391b0a94e7ddbf53043e6723b7e6e63655f5759). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21089 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21089 **[Test build #89469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89469/testReport)** for PR 21089 at commit [`4391b0a`](https://github.com/apache/spark/commit/4391b0a94e7ddbf53043e6723b7e6e63655f5759). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21089 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21089: [SPARK-24004] Test of from_json for non-root MapT...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21089 [SPARK-24004] Test of from_json for non-root MapType ## What changes were proposed in this pull request? The new test checks that from_json is able to parse JSON that contains MapType as a value type of struct fields. The test required adding the equals and hashCode methods to ArrayBasedMapData to compare the returned result to the expected value. ## How was this patch tested? Added comparison tests for ArrayBasedMapData and a test for from_json. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MaxGekk/spark-1 from_json-map-tests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21089.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21089 commit d91d93936ce1c12765dffdbf2fe773585dd39eae Author: Maxim Gekk Date: 2018-04-17T18:15:44Z Added a test for checking from_json: json -> struct of map commit 428c6ba93202bf3235667794f9f8055c432733e1 Author: Maxim Gekk Date: 2018-04-17T18:16:54Z Added a test for comparison of ArrayBasedMapData commit 58d1f7e41883b5e8924eca83bcf72deedadaf0ef Author: Maxim Gekk Date: 2018-04-17T19:01:00Z Implemented the equals and hashCode methods of ArrayBasedMapData commit 4391b0a94e7ddbf53043e6723b7e6e63655f5759 Author: Maxim Gekk Date: 2018-04-17T19:21:49Z Improving test title --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
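For readers wondering why equals and hashCode are needed at all: a class that wraps two arrays inherits reference equality, so two maps with identical contents compare as different. The sketch below shows the general idea on a stand-in class. `ArrayBackedMap` is hypothetical and is not the `ArrayBasedMapData` implementation from the PR, whose code is not shown in this thread; the comparison here is also order-sensitive, which may or may not match what the PR does.

```scala
// Hypothetical stand-in for a map stored as parallel key and value arrays.
final class ArrayBackedMap(val keys: Array[Any], val values: Array[Any]) {
  require(keys.length == values.length, "keys and values must have the same length")

  // Element-wise (and order-sensitive) comparison instead of reference equality.
  override def equals(other: Any): Boolean = other match {
    case that: ArrayBackedMap =>
      keys.sameElements(that.keys) && values.sameElements(that.values)
    case _ => false
  }

  // Must agree with equals: equal contents give equal hash codes.
  override def hashCode(): Int =
    31 * keys.toSeq.hashCode() + values.toSeq.hashCode()
}
```

With this in place, a test can compare a parsed map against an expected one with `==` rather than unpacking both arrays by hand.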
[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89464/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21086 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21086 **[Test build #89464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89464/testReport)** for PR 21086 at commit [`3bb4824`](https://github.com/apache/spark/commit/3bb4824225b53f0ee7900835bfc99b9bd01f7d4f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20998: [SPARK-23888][CORE] correct the comment of hasAttemptOnH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89460/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20998: [SPARK-23888][CORE] correct the comment of hasAttemptOnH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20998 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20998: [SPARK-23888][CORE] correct the comment of hasAttemptOnH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20998 **[Test build #89460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89460/testReport)** for PR 20998 at commit [`0c6f305`](https://github.com/apache/spark/commit/0c6f3058a5c0af4a6e9cd1a90d43230805305df5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/15770 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @jkbradley I am closing this one now. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @jkbradley Sorry for missing your comments. Anyway, I will close it now. I will choose another one to work on. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21088 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21088: [SPARK-24003][CORE] Add support to provide spark....
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/21088 [SPARK-24003][CORE] Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's ## What changes were proposed in this pull request? Added support for specifying the 'spark.executor.extraJavaOptions' value in terms of `{{APP_ID}}` and/or `{{EXECUTOR_ID}}`. `{{APP_ID}}` will be replaced by the Application Id and `{{EXECUTOR_ID}}` will be replaced by the Executor Id when the executor is started. ## How was this patch tested? I have verified this by checking the executor process command and GC logs. I verified the same in different deployment modes (Standalone, YARN, Mesos), in both client and cluster modes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-24003 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21088.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21088 commit ff6e79315c0ea26b45207753d3e79f96f2395329 Author: Devaraj K Date: 2018-04-17T18:37:11Z [SPARK-24003][CORE] Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
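The substitution described in the PR text amounts to a literal placeholder replacement. A minimal sketch, assuming nothing about where Spark actually performs it: `substituteAppAndExecutorId` is a hypothetical helper name, and the GC log path in the usage note is made up for illustration.

```scala
// Replace the two placeholders named in the PR description with concrete ids.
// String.replace does literal (non-regex) replacement, so the braces need no escaping.
def substituteAppAndExecutorId(
    opts: String,
    appId: String,
    executorId: String): String =
  opts
    .replace("{{APP_ID}}", appId)
    .replace("{{EXECUTOR_ID}}", executorId)
```

For instance, an option string such as `-Xloggc:/tmp/{{APP_ID}}-{{EXECUTOR_ID}}.gc` would give each executor its own GC log file, which is presumably the use case behind checking the "executor process command and gc logs" in the test notes.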
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python][WIP] Enable grouped aggregate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python][WIP] Enable grouped aggregate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89462/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python][WIP] Enable grouped aggregate...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21082 **[Test build #89462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89462/testReport)** for PR 21082 at commit [`dfdb03f`](https://github.com/apache/spark/commit/dfdb03fc9c19b120a046b45b0864f8f405b0ead5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89465/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #89465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89465/testReport)** for PR 20787 at commit [`913eed8`](https://github.com/apache/spark/commit/913eed849d218a34d6e28a2500301320ba21cecc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/21071 @gatorsmile we need to have this for K8S as well; I will include it in the SPIP. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89459/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21074 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21074 **[Test build #89459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89459/testReport)** for PR 21074 at commit [`571912f`](https://github.com/apache/spark/commit/571912f3ed21cd3753fa76225f88d0f6d8298989). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2402/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user scottcarey commented on the issue: https://github.com/apache/spark/pull/21070 This PR should include changes to `ParquetOptions.scala` to expose the `LZ4`, `ZSTD` and `BROTLI` parquet compression codecs, or else spark users won't be able to leverage those parquet 1.10.0 features. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
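As an illustration of what "exposing" the codecs could look like, here is a sketch of a short-name lookup table in plain Scala. It is a guess at the shape of the change, not the contents of `ParquetOptions.scala`, which are not shown in this thread: `shortCodecNames` and `resolveCodec` are hypothetical names, and the codec names are plain strings rather than Parquet enum values.

```scala
import java.util.Locale

// Hypothetical lookup table from user-facing short names to codec names.
// The last three entries are the codecs the comment asks to expose.
val shortCodecNames: Map[String, String] = Map(
  "none" -> "UNCOMPRESSED",
  "uncompressed" -> "UNCOMPRESSED",
  "snappy" -> "SNAPPY",
  "gzip" -> "GZIP",
  "lzo" -> "LZO",
  "lz4" -> "LZ4",
  "brotli" -> "BROTLI",
  "zstd" -> "ZSTD")

// Case-insensitive lookup that fails loudly on unknown codec names.
def resolveCodec(name: String): String =
  shortCodecNames.getOrElse(
    name.toLowerCase(Locale.ROOT),
    throw new IllegalArgumentException(
      s"Codec [$name] is not available. Known codecs: " +
        shortCodecNames.keys.toSeq.sorted.mkString(", ")))
```

Without entries like the last three, a user-supplied codec name would be rejected before it ever reaches Parquet, which is the concern raised in the comment.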
[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21071 @devaraj-kavali How about K8S? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #89468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89468/testReport)** for PR 21061 at commit [`809621b`](https://github.com/apache/spark/commit/809621b9b73b67ebd8fb5ffcf1956fc6dc98be43). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21071 cc @jiangxb1987 @JoshRosen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/21071 Thanks @rxin and @markhamstra for your comments. I will come up with an SPIP design draft and start the discussion. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21063: [SPARK-23886][Structured Streaming] Update query status ...
Github user efimpoberezkin commented on the issue: https://github.com/apache/spark/pull/21063 Hi @jose-torres, I made some changes to this PR according to your comment. Could you please review it?
[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20858 Merged build finished. Test PASSed.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89456/ Test PASSed.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21029 Merged build finished. Test FAILed.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89461/ Test FAILed.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21029 **[Test build #89461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89461/testReport)** for PR 21029 at commit [`1ae4b6d`](https://github.com/apache/spark/commit/1ae4b6dd8d07fa3b095dbbc6c435c337f71bd0bd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20858 **[Test build #89456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89456/testReport)** for PR 20858 at commit [`f2a67e8`](https://github.com/apache/spark/commit/f2a67e82880896bf7c09d3067f7d1699c43d2505). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15113: [SPARK-17508][PYSPARK][ML] PySpark treat Param values No...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15113 I still think this makes sense, but maybe I'm in the minority. I'll go ahead and close it unless anyone else thinks it should be changed.
[GitHub] spark pull request #15113: [SPARK-17508][PYSPARK][ML] PySpark treat Param va...
Github user BryanCutler closed the pull request at: https://github.com/apache/spark/pull/15113
[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19881 Thanks @jcuquemelle