date:20180417

[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2405/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89474/testReport)** for PR 21061 at commit [`cf65616`](https://github.com/apache/spark/commit/cf65616d019ad21c6f498e2c856c3ee396e9dbd2).


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21061
  
retest this please


---


[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/21078
  
Thanks for thinking through the optional logging issue!  I responded in the 
JIRA to preserve the design discussion there: 
https://issues.apache.org/jira/browse/SPARK-23990


---


[GitHub] spark pull request #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

2018-04-17 Thread aviv-ebates

Github user aviv-ebates closed the pull request at:

https://github.com/apache/spark/pull/21057


---


[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

2018-04-17 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21057
  
Right, let me try to cherry-pick and see if I can write a test. I'll try to 
find some time and open a PR after cherry-picking your commit. I think you can 
close this PR. 


---


[GitHub] spark issue #21063: [SPARK-23886][Structured Streaming] Update query status ...

2018-04-17 Thread jose-torres

Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/21063
  
The approach looks good to me, but we probably want to add some tests to 
StreamingQueryStatusAndProgressSuite. (See test("basic") in ContinuousSuite for 
how to set up a continuous processing memory stream, and note that 
processAllAvailable() won't work properly for continuous execution - you'll 
want to use CheckAnswer to await the added data and Execute to do the 
test-specific progress checks)


---


[GitHub] spark pull request #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler ...

2018-04-17 Thread susanxhuynh

Github user susanxhuynh commented on a diff in the pull request:

https://github.com/apache/spark/pull/20641#discussion_r182271196
  
--- Diff: 
resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala
 ---
@@ -199,6 +199,38 @@ class MesosClusterSchedulerSuite extends SparkFunSuite 
with LocalSparkContext wi
 })
   }
 
+  test("properly wraps and escapes parameters passed to driver command") {
--- End diff --

Sorry for the delay. I was going to test this in DC/OS and haven't gotten a 
chance to do so.


---


[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21081#discussion_r182269723
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -123,7 +128,21 @@ class KMeansModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val predictUDF = udf((vector: Vector) => predict(vector))
+// val predictUDF = udf((vector: Vector) => predict(vector))
+val predictUDF = if 
(dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
+  udf((vector: Vector) => predict(vector))
+}
+else {
+  udf((vector: Seq[_]) => {
+val featureArray = Array.fill[Double](vector.size)(0.0)
--- End diff --

Here's what I meant:
```
val predictUDF = featuresDataType match {
  case _: VectorUDT =>
    udf((vector: Vector) => predict(vector))
  case fdt: ArrayType => fdt.elementType match {
    case _: FloatType =>
      ???
    case _: DoubleType =>
      ???
  }
}
```


---


[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21081#discussion_r182269644
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -123,7 +128,21 @@ class KMeansModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val predictUDF = udf((vector: Vector) => predict(vector))
+// val predictUDF = udf((vector: Vector) => predict(vector))
+val predictUDF = if 
(dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
+  udf((vector: Vector) => predict(vector))
--- End diff --

Side note: I realized that "predict" will cause the whole model to be 
serialized and sent to workers.  But that's actually OK since we do need to 
send most of the model data to make predictions and since there's not a clean 
way to just send the model weights.  So I think my previous comment about 
copying "numClasses" to a local variable was not necessary.  Don't bother 
reverting the change though.


---


[GitHub] spark pull request #21029: [SPARK-23952] remove type parameter in DataReader...

2018-04-17 Thread jose-torres

Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/21029#discussion_r182267906
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala
 ---
@@ -95,21 +77,29 @@ case class DataSourceV2ScanExec(
   
sparkContext.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
   sparkContext.env)
 .askSync[Unit](SetReaderPartitions(readerFactories.size))
-  new ContinuousDataSourceRDD(sparkContext, sqlContext, 
readerFactories)
-.asInstanceOf[RDD[InternalRow]]
-
-case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
-  new DataSourceRDD(sparkContext, 
batchReaderFactories).asInstanceOf[RDD[InternalRow]]
-
+  if (readerFactories.exists(_.dataFormat() == 
DataFormat.COLUMNAR_BATCH)) {
+throw new IllegalArgumentException(
+  "continuous stream reader does not support columnar read yet.")
--- End diff --

I've thought about this further. Shouldn't it be trivial to write a wrapper 
that simply converts a DataReader[ColumnarBatch] to a DataReader[InternalRow]? 
If so then we can easily support it after the current PR.
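
A minimal sketch of what such a wrapper might look like, assuming a simplified reader interface. `Reader`, `Row`, `Batch`, `RowReaderAdapter`, `InMemoryBatchReader` and `drain` are hypothetical stand-ins written for illustration; they are not the Spark `DataReader`, `InternalRow` or `ColumnarBatch` types, and the real conversion may need more care (for example around row reuse).

```scala
object BatchToRowSketch {
  // Hypothetical, simplified reader interface: next() advances, get() returns the current item.
  trait Reader[T] {
    def next(): Boolean
    def get(): T
    def close(): Unit
  }

  type Row = Seq[Any]    // stand-in for InternalRow
  type Batch = Seq[Row]  // stand-in for ColumnarBatch

  // Wraps a batch reader and exposes its rows one at a time.
  final class RowReaderAdapter(underlying: Reader[Batch]) extends Reader[Row] {
    private var rows: Iterator[Row] = Iterator.empty
    private var current: Row = null

    override def next(): Boolean = {
      // Pull batches until one has rows or the input is exhausted.
      while (!rows.hasNext && underlying.next()) {
        rows = underlying.get().iterator
      }
      if (rows.hasNext) { current = rows.next(); true } else false
    }

    override def get(): Row = current
    override def close(): Unit = underlying.close()
  }

  // Small in-memory batch reader, used only to exercise the adapter.
  final class InMemoryBatchReader(batches: Seq[Batch]) extends Reader[Batch] {
    private val it = batches.iterator
    private var current: Batch = null
    override def next(): Boolean = if (it.hasNext) { current = it.next(); true } else false
    override def get(): Batch = current
    override def close(): Unit = ()
  }

  // Drains any reader into a list, preserving order, then closes it.
  def drain[T](reader: Reader[T]): List[T] = {
    val out = List.newBuilder[T]
    while (reader.next()) out += reader.get()
    reader.close()
    out.result()
  }
}
```

The adapter skips empty batches, so callers only ever see rows; whether that is the right behavior for a continuous reader is a separate question.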


---


[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-04-17 Thread henryr

Github user henryr commented on the issue:

https://github.com/apache/spark/pull/21070
  
@scottcarey I agree that's important. Perhaps it could be done as a 
follow-up PR?


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21089
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21089
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89471/
Test FAILed.


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21089
  
**[Test build #89471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89471/testReport)** for PR 21089 at commit [`1458077`](https://github.com/apache/spark/commit/1458077cf6817701d74fcebd2e83ab6a62889fd8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...

2018-04-17 Thread wangmiao1981

Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21090#discussion_r182254888
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.clustering
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.clustering.{PowerIterationClustering => 
MLlibPowerIterationClustering}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.types._
+
+/**
+ * Common params for PowerIterationClustering
+ */
+private[clustering] trait PowerIterationClusteringParams extends Params 
with HasMaxIter
+  with HasPredictionCol {
+
+  /**
+   * The number of clusters to create (k). Must be > 1. Default: 2.
+   * @group param
+   */
+  @Since("2.4.0")
+  final val k = new IntParam(this, "k", "The number of clusters to create. 
" +
+"Must be > 1.", ParamValidators.gt(1))
+
+  /** @group getParam */
+  @Since("2.4.0")
+  def getK: Int = $(k)
+
+  /**
+   * Param for the initialization algorithm. This can be either "random" 
to use a random vector
+   * as vertex properties, or "degree" to use a normalized sum of 
similarities with other vertices.
+   * Default: random.
+   * @group expertParam
+   */
+  @Since("2.4.0")
+  final val initMode = {
+val allowedParams = ParamValidators.inArray(Array("random", "degree"))
+new Param[String](this, "initMode", "The initialization algorithm. 
This can be either " +
+  "'random' to use a random vector as vertex properties, or 'degree' 
to use a normalized sum " +
+  "of similarities with other vertices.  Supported options: 'random' 
and 'degree'.",
+  allowedParams)
+  }
+
+  /** @group expertGetParam */
+  @Since("2.4.0")
+  def getInitMode: String = $(initMode)
+
+  /**
+   * Param for the name of the input column for vertex IDs.
+   * Default: "id"
+   * @group param
+   */
+  @Since("2.4.0")
+  val idCol = new Param[String](this, "idCol", "Name of the input column 
for vertex IDs.",
+(value: String) => value.nonEmpty)
+
+  setDefault(idCol, "id")
+
+  /** @group getParam */
+  @Since("2.4.0")
+  def getIdCol: String = getOrDefault(idCol)
+
+  /**
+   * Param for the name of the input column for neighbors in the adjacency 
list representation.
+   * Default: "neighbors"
+   * @group param
+   */
+  @Since("2.4.0")
+  val neighborsCol = new Param[String](this, "neighborsCol",
+"Name of the input column for neighbors in the adjacency list 
representation.",
+(value: String) => value.nonEmpty)
+
+  setDefault(neighborsCol, "neighbors")
+
+  /** @group getParam */
+  @Since("2.4.0")
+  def getNeighborsCol: String = $(neighborsCol)
+
+  /**
+   * Param for the name of the input column for neighbors in the adjacency 
list representation.
+   * Default: "similarities"
+   * @group param
+   */
+  @Since("2.4.0")
+  val similaritiesCol = new Param[String](this, "similaritiesCol",
+"Name of the input column for neighbors in the adjacency list 
representation.",
+(value: String) => value.nonEmpty)
+
+  setDefault(similaritiesCol, "similarities")
+
+  /** @group getParam */
+  @Since("2.4.0")
+  def getSimilaritiesCol: String = $(similaritiesCol)
+
+  protected def validateAndTransformSchema(schema: StructType): StructType 
= {
+SchemaUtils.checkColumnTypes(schema, $(idCol), Seq(IntegerType, 
LongType))
+SchemaUtils.checkColumnTypes(schema,

[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread wangmiao1981

Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/21090
  
Took a quick look. Apart from the style failure and a minor format issue, 
LGTM.


---


[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...

2018-04-17 Thread wangmiao1981

Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21090#discussion_r182243819
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
 ---
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.types._
+import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
+import org.apache.spark.{SparkException, SparkFunSuite}
+
+
+class PowerIterationClusteringSuite extends SparkFunSuite
+  with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  @transient var data: Dataset[_] = _
+  final val r1 = 1.0
+  final val n1 = 10
+  final val r2 = 4.0
+  final val n2 = 40
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+
+data = PowerIterationClusteringSuite.generatePICData(spark, r1, r2, 
n1, n2)
+  }
+
+  test("default parameters") {
+val pic = new PowerIterationClustering()
+
+assert(pic.getK === 2)
+assert(pic.getMaxIter === 20)
+assert(pic.getInitMode === "random")
+assert(pic.getPredictionCol === "prediction")
+assert(pic.getIdCol === "id")
+assert(pic.getNeighborsCol === "neighbors")
+assert(pic.getSimilaritiesCol === "similarities")
+  }
+
+  test("parameter validation") {
+intercept[IllegalArgumentException] {
+  new PowerIterationClustering().setK(1)
+}
+intercept[IllegalArgumentException] {
+  new PowerIterationClustering().setInitMode("no_such_a_mode")
+}
+intercept[IllegalArgumentException] {
+  new PowerIterationClustering().setIdCol("")
+}
+intercept[IllegalArgumentException] {
+  new PowerIterationClustering().setNeighborsCol("")
+}
+intercept[IllegalArgumentException] {
+  new PowerIterationClustering().setSimilaritiesCol("")
+}
+  }
+
+  test("power iteration clustering") {
+val n = n1 + n2
+
+val model = new PowerIterationClustering()
+  .setK(2)
+  .setMaxIter(40)
+val result = model.transform(data)
+
+val predictions = Array.fill(2)(mutable.Set.empty[Long])
+result.select("id", "prediction").collect().foreach {
+  case Row(id: Long, cluster: Integer) => predictions(cluster) += id
+}
+assert(predictions.toSet == Set((1 until n1).toSet, (n1 until 
n).toSet))
+
+val result2 = new PowerIterationClustering()
+  .setK(2)
+  .setMaxIter(10)
+  .setInitMode("degree")
+  .transform(data)
+val predictions2 = Array.fill(2)(mutable.Set.empty[Long])
+result2.select("id", "prediction").collect().foreach {
+  case Row(id: Long, cluster: Integer) => predictions2(cluster) += id
+}
+assert(predictions2.toSet == Set((1 until n1).toSet, (n1 until 
n).toSet))
+  }
+
+  test("supported input types") {
+val model = new PowerIterationClustering()
+  .setK(2)
+  .setMaxIter(1)
+
+def runTest(idType: DataType, neighborType: DataType, similarityType: 
DataType): Unit = {
+  val typedData = data.select(
+col("id").cast(idType).alias("id"),
+col("neighbors").cast(ArrayType(neighborType, containsNull = 
false)).alias("neighbors"),
+col("similarities").cast(ArrayType(similarityType, containsNull = 
false))
+  .alias("similarities")
+  )
+  model.transform(typedData).collect()
+}
+
+for (idType <- Seq(IntegerType, LongType)) {
+  runTest(idType, LongType, DoubleType)
+}
+for (neighborType <- Seq(IntegerType, LongType)) {
+  runTest(LongType,

[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19881
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89467/
Test PASSed.


---


[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19881
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19881
  
**[Test build #89467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89467/testReport)** for PR 19881 at commit [`15732ab`](https://github.com/apache/spark/commit/15732ab7ee22a9cc4409b36812b5b2c930854723).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89470/
Test FAILed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89470/testReport)** for PR 21061 at commit [`cf65616`](https://github.com/apache/spark/commit/cf65616d019ad21c6f498e2c856c3ee396e9dbd2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes `
  * `case class ArrayUnion(left: Expression, right: Expression) extends ArraySetUtils `


---


[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21073
  
**[Test build #89473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89473/testReport)** for PR 21073 at commit [`44137cc`](https://github.com/apache/spark/commit/44137cc9a9949b4218d973dc46d905d3ce301bcd).


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89468/
Test FAILed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89468/testReport)** for PR 21061 at commit [`809621b`](https://github.com/apache/spark/commit/809621b9b73b67ebd8fb5ffcf1956fc6dc98be43).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Build finished. Test FAILed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89466/
Test FAILed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89466/testReport)** for PR 21061 at commit [`26c30b9`](https://github.com/apache/spark/commit/26c30b954ad65b5bb41633afb62a8953b5a6a31a).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---


[GitHub] spark issue #21085: [SPARK-23948] Trigger mapstage's job listener in submitM...

2018-04-17 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/21085
  
known flaky test https://issues.apache.org/jira/browse/SPARK-23894

merging to branch 2.3




---


[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21081#discussion_r182216309
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params 
with HasMaxIter with HasFe
* @return output schema
*/
   protected def validateAndTransformSchema(schema: StructType): StructType 
= {
-SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
+val typeCandidates = List( new VectorUDT,
+  new ArrayType(DoubleType, true),
--- End diff --

Thinking about this, let's actually disallow nullable columns.  KMeans 
won't handle nulls properly.


---


[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21081#discussion_r182216415
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params 
with HasMaxIter with HasFe
* @return output schema
*/
   protected def validateAndTransformSchema(schema: StructType): StructType 
= {
-SchemaUtils.checkColumnType(schema, $(featuresCol), new VectorUDT)
+val typeCandidates = List( new VectorUDT,
+  new ArrayType(DoubleType, true),
--- End diff --

Also, IntelliJ may warn you that boolean arguments should be passed as named 
arguments; that'd be nice to fix here.
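
For illustration only, here is a small sketch of the difference between a positional and a named boolean argument. `ArrayT` is a hypothetical stand-in that mirrors the two-field shape of Spark's `ArrayType(elementType, containsNull)`; it is not the real class, and the element type is a plain string here to keep the example self-contained.

```scala
object NamedBooleanSketch {
  // Hypothetical stand-in with the same field names as Spark's ArrayType.
  final case class ArrayT(elementType: String, containsNull: Boolean)

  // Positional: the reader has to look up what `true` means.
  val positional = ArrayT("double", true)

  // Named: the intent is visible at the call site, and disallowing nulls is explicit.
  val named = ArrayT("double", containsNull = false)
}
```

With the named form, a reviewer can see at a glance whether the declared array type admits null elements.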


---


[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21081#discussion_r182215434
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -123,7 +128,21 @@ class KMeansModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val predictUDF = udf((vector: Vector) => predict(vector))
+// val predictUDF = udf((vector: Vector) => predict(vector))
+val predictUDF = if 
(dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
+  udf((vector: Vector) => predict(vector))
+}
--- End diff --

Scala style: } else {


---


[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21081#discussion_r182217722
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -123,7 +128,21 @@ class KMeansModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val predictUDF = udf((vector: Vector) => predict(vector))
+// val predictUDF = udf((vector: Vector) => predict(vector))
+val predictUDF = if 
(dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
+  udf((vector: Vector) => predict(vector))
+}
+else {
+  udf((vector: Seq[_]) => {
+val featureArray = Array.fill[Double](vector.size)(0.0)
--- End diff --

You shouldn't have to do the conversion in this convoluted (and less 
efficient) way.  I'd recommend doing a match-case statement on dataset.schema; 
I think that will be the most succinct.  Then you can handle Vector, Seq of 
Float, and Seq of Double separately, without conversions to strings.

Same for the similar cases below.
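
A rough sketch of that idea in plain Scala, without Spark on the classpath. `ColType`, `VectorType`, `FloatT`, `DoubleT`, `ArrayT` and `toDoubles` are hypothetical stand-ins for the column types and the UDF selection, not Spark APIs; the point is only that the conversion is chosen once from the declared type rather than per value.

```scala
object FeatureDispatchSketch {
  // Hypothetical stand-ins for the column data types.
  sealed trait ColType
  case object VectorType extends ColType  // plays the role of VectorUDT
  case object FloatT extends ColType
  case object DoubleT extends ColType
  final case class ArrayT(elementType: ColType) extends ColType

  // Choose the conversion from the declared column type; no string round-trips.
  def toDoubles(colType: ColType): Any => Array[Double] = colType match {
    case VectorType =>
      (v: Any) => v.asInstanceOf[Array[Double]]
    case ArrayT(FloatT) =>
      (v: Any) => v.asInstanceOf[Seq[Float]].iterator.map(_.toDouble).toArray
    case ArrayT(DoubleT) =>
      (v: Any) => v.asInstanceOf[Seq[Double]].toArray
    case other =>
      throw new IllegalArgumentException(s"Unsupported feature column type: $other")
  }
}
```

Unsupported types fail up front when the converter is built, which is usually easier to diagnose than a cast error in the middle of a job.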


---


[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21081#discussion_r182215639
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -123,7 +128,21 @@ class KMeansModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val predictUDF = udf((vector: Vector) => predict(vector))
+// val predictUDF = udf((vector: Vector) => predict(vector))
+val predictUDF = if 
(dataset.schema($(featuresCol)).dataType.equals(new VectorUDT)) {
+  udf((vector: Vector) => predict(vector))
+}
+else {
+  udf((vector: Seq[_]) => {
--- End diff --

scala style: remove unnecessary ```{``` at end of line (IntelliJ should 
warn you about this)


---


[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-17 Thread attilapiros

Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/21068
  
Yes, we can create an abstract class from `YarnAllocatorBlacklistTracker` 
(like `AbstractAllocatorBlacklistTracker`) where the method 
`synchronizeBlacklistedNodes` can have different implementations. In this case 
the core and the messages can stay as they are. As I see it, this is the less 
risky and cheaper solution. On the other hand, having the complete blacklisting 
in the driver gives a more centralized and clearer design. 

We just have to make up our minds about where to go from here. Any help and 
suggestions for the decision are welcome.
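
The abstract-class option could look roughly like the sketch below. Only the 
names `AbstractAllocatorBlacklistTracker`, `YarnAllocatorBlacklistTracker`, and 
`synchronizeBlacklistedNodes` come from this thread; the fields, the 
`blacklist` method, and the signatures are assumptions made for illustration:

```scala
abstract class AbstractAllocatorBlacklistTracker {
  // Shared bookkeeping lives in the base class (hypothetical field).
  private var blacklisted = Set.empty[String]

  // Hypothetical entry point: record the host, then publish the new state.
  def blacklist(host: String): Unit = {
    blacklisted += host
    synchronizeBlacklistedNodes(blacklisted)
  }

  def currentBlacklist: Set[String] = blacklisted

  // Each cluster manager supplies its own way of publishing the blacklist.
  protected def synchronizeBlacklistedNodes(nodes: Set[String]): Unit
}

class YarnAllocatorBlacklistTracker extends AbstractAllocatorBlacklistTracker {
  // Records what would be sent; a real implementation would talk to YARN here.
  var lastSent: Set[String] = Set.empty

  override protected def synchronizeBlacklistedNodes(nodes: Set[String]): Unit = {
    lastSent = nodes
  }
}
```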


---


[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-17 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15770
  
OK, sorry to push, @wangmiao1981!  I just want to make sure this gets in 
before I no longer have bandwidth for it.  If you have the time, would you mind 
checking the updates I made in the new PR?  Thanks!


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21090
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21090
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2404/
Test PASSed.


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21090
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89472/
Test FAILed.


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21090
  
**[Test build #89472 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89472/testReport)**
 for PR 21090 at commit 
[`d215748`](https://github.com/apache/spark/commit/d2157489770a79fe443d567bfc03d61f72fbe161).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21090
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/21090
  
@wangmiao1981 and @WeichenXu123 would you mind taking a look?  Thanks!


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21090
  
**[Test build #89472 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89472/testReport)**
 for PR 21090 at commit 
[`d215748`](https://github.com/apache/spark/commit/d2157489770a79fe443d567bfc03d61f72fbe161).


---


[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/21090
  
**To review this PR**: This was copied from 
https://github.com/apache/spark/pull/15770 with the following changes:
* Addressed comments in original PR  (See my review comments there)
* Added Param validators for required input columns
* Renamed "weights" column to "similarities"
* Made algorithm take more types of inputs: Long/Int and Double/Float
* Removed test("set parameters") since setters are already tested in the 
read/write test.

If you saw the previous PR, you should be able to review this one based on 
the last 3 commits, viewable in this diff: 
https://github.com/jkbradley/spark/compare/5cb8ed6...wangmiao1981-pic


---


[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...

2018-04-17 Thread jkbradley

GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/21090

[SPARK-15784][ML] Add Power Iteration Clustering to spark.ml

## What changes were proposed in this pull request?

This PR adds PowerIterationClustering as a Transformer to spark.ml.  In the 
transform method, it calls spark.mllib's PowerIterationClustering.run() method 
and transforms the returned assignments (the k-means output for the 
pseudo-eigenvector) into a DataFrame (id: LongType, cluster: IntegerType).

This PR is copied and modified from 
https://github.com/apache/spark/pull/15770. The primary author is @wangmiao1981.

## How was this patch tested?

This PR has 2 types of tests:
* Copies of tests from spark.mllib's PIC tests
* New tests specific to the spark.ml APIs


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark wangmiao1981-pic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21090


commit e4492a64b74b0ccc2da8f13353d37bb9bb0c
Author: wm...@hotmail.com 
Date:   2016-06-13T19:47:42Z

add pic framework (model, class etc)

commit 70862491e5b86ce4add500a0c96ae5220733b35d
Author: wm...@hotmail.com 
Date:   2016-06-13T23:28:09Z

change a comment

commit b73d8a78fa69f83c278996feb1b19502ef871c5b
Author: wm...@hotmail.com 
Date:   2016-06-17T17:27:55Z

add missing functions fit predict load save etc.

commit 022fe523f735c5519f948b175871489f79434fb5
Author: wm...@hotmail.com 
Date:   2016-06-18T01:12:41Z

add unit test flie

commit 552cf54fb03f88af023f080e60fa50f1f39060fc
Author: wm...@hotmail.com 
Date:   2016-06-20T17:35:05Z

add test cases part 1

commit 0b4954d55b4d344794d3c47366220c67f07d0d43
Author: wm...@hotmail.com 
Date:   2016-06-20T20:29:54Z

add unit test part 2: test fit, parameters etc.

commit f22b01e06eaaf5951befcebdffc18c8e519183d2
Author: wm...@hotmail.com 
Date:   2016-06-20T21:22:59Z

fix a type issue

commit 305b194dae40eaff990c18837c3f2bc8d469e60c
Author: wm...@hotmail.com 
Date:   2016-06-21T20:07:27Z

add more unit tests

commit 4b32cbf02965c5c1a0c094fa144836dab0dfd543
Author: wm...@hotmail.com 
Date:   2016-06-21T21:46:25Z

delete unused import and add comments

commit f6eda88a6c0af416b988a2c37f46c8b08e5e99cf
Author: wm...@hotmail.com 
Date:   2016-10-25T21:28:12Z

change version to 2.1.0

commit 45c4b1cd1afa28c775c666b57ecee614ed9a41d0
Author: wm...@hotmail.com 
Date:   2016-11-03T23:26:01Z

change PIC as a Transformer

commit e8d7ed37138909d010a812fba7d03ef30a4f6e05
Author: wm...@hotmail.com 
Date:   2016-11-04T17:28:26Z

add LabelCol

commit e4e1e055a9b3ab54b83331ac7dc56d6b792dcf7b
Author: wm...@hotmail.com 
Date:   2016-11-04T18:36:09Z

change col implementation

commit 8384422ec0e7192cc8ce53df02ddb4ae0401fd0b
Author: wm...@hotmail.com 
Date:   2017-02-17T22:20:00Z

address some of the comments

commit d6a199c48ff940861d80caf275da29d99375ce33
Author: wm...@hotmail.com 
Date:   2017-02-21T22:37:51Z

add additional test with dataset having more data

commit b0c3aff4a76ace99c104c2b2c10c9485a028bfd6
Author: wm...@hotmail.com 
Date:   2017-03-14T23:13:45Z

change input data format

commit 091225dd2f1c353edc28dc4299034a018a92bc81
Author: wm...@hotmail.com 
Date:   2017-03-15T22:49:45Z

resolve warnings

commit 8bb99567556ce29c75d5f395157d0161dff695bc
Author: wm...@hotmail.com 
Date:   2017-03-16T18:33:47Z

add neighbor and weight cols

commit 8ba82e8392e6d607ab750ed8eb3caaf386e1352a
Author: wangmiao1981 
Date:   2017-08-15T21:13:55Z

address review comments 1

commit 468a94741efe6530c9acfbb1af4f46499550ee1f
Author: wangmiao1981 
Date:   2017-08-15T21:23:39Z

fix style

commit ec10f24336ff51354a1657c7ceadb9ada8cd1484
Author: wangmiao1981 
Date:   2017-08-15T22:30:28Z

remove unused comments

commit 5710cfcf2e3596c95f353ce043f7358a030d70a0
Author: wangmiao1981 
Date:   2017-08-15T23:43:14Z

add Since

commit 88654b3055ebd863e3b3c5774abdce28f3cda184
Author: wangmiao1981 
Date:   2017-08-17T00:12:12Z

fix missing >

commit 804adc6fece91e7264f315ee965faa40c5e334c5
Author: wangmiao1981 
Date:   2017-08-17T17:26:40Z

fix doc

commit 4a6dd79a9c37f71ea4378692438f19b3247b7913
Author: wangmiao1981 
Date:   2017-10-25T23:16:55Z

address review comments

commit 5cb8ed6de3865f58719b3b30888b3bc4542905d4
Author: wangmiao1981 
Date:   2017-10-30T21:44:24Z

fix unit test

commit

[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...

2018-04-17 Thread maryannxue

Github user maryannxue commented on the issue:

https://github.com/apache/spark/pull/21083
  
Thank you for your reply, @cloud-fan! It was not clear to me when you had 
become aware of the effort on SPARK-21479, so it might be a misunderstanding on 
my side, and I apologize. Anyway, if you had had a closer look at the PR, you 
would probably have seen that it's basically the same approach as what you have 
here, only that you have covered more join types.
Here's another note. There are two types of constraint-to-filter inference 
for joins going on here:
1. Infer from the Join node constraints, which is covered by the 
`PushPredicateThroughJoin` rule;
2. Infer from the sibling child node combined with the join condition, 
which is what you've added here.
That said, the InnerLike joins should already be covered by 1 and might not 
be worth considering again in this optimization rule. I'm not sure about 
LeftSemi joins, so it would be nice to have a test case proving that this 
optimization does something not yet covered by the 
`PushPredicateThroughJoin` rule.
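
To make the second kind of inference concrete, here is a toy model in plain 
Scala. `Predicate`, `EquiJoin`, and `inferForRight` are hypothetical and far 
simpler than Catalyst expressions; the sketch assumes a join type in which 
unmatched right-side rows are dropped anyway (for example inner or left semi):

```scala
object FilterInference {
  // A constraint on a single column, e.g. Predicate("a", "> 5").
  final case class Predicate(column: String, condition: String)

  // An equi-join condition of the form left = right.
  final case class EquiJoin(left: String, right: String)

  // Any constraint known for the left join key also holds for the right join
  // key on every matching row, so it can be applied as a filter on the right child.
  def inferForRight(leftConstraints: Seq[Predicate], join: EquiJoin): Seq[Predicate] =
    leftConstraints.collect {
      case Predicate(col, cond) if col == join.left => Predicate(join.right, cond)
    }
}
```

With a left-side constraint `a > 5` and join condition `a = b`, this yields 
the filter `b > 5` for the right child; constraints on other columns are 
ignored.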


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2403/
Test PASSed.


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21089
  
**[Test build #89471 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89471/testReport)**
 for PR 21089 at commit 
[`1458077`](https://github.com/apache/spark/commit/1458077cf6817701d74fcebd2e83ab6a62889fd8).


---


[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r182200622
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -417,3 +418,156 @@ case class ArrayMax(child: Expression) extends 
UnaryExpression with ImplicitCast
 
   override def prettyName: String = "array_max"
 }
+
+abstract class ArraySetUtils extends BinaryExpression with 
ExpectsInputTypes {
+  val kindUnion = 1
+  def typeId: Int
+
+  def array1: Expression
+  def array2: Expression
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, 
ArrayType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val r = super.checkInputDataTypes()
+if ((r == TypeCheckResult.TypeCheckSuccess) &&
+  (array1.dataType.asInstanceOf[ArrayType].elementType !=
+array2.dataType.asInstanceOf[ArrayType].elementType)) {
+  TypeCheckResult.TypeCheckFailure("Element type in both arrays must 
be the same")
+} else {
+  r
+}
+  }
+
+  override def dataType: DataType = array1.dataType
+
+  private def elementType = dataType.asInstanceOf[ArrayType].elementType
+  private def cn1 = array1.dataType.asInstanceOf[ArrayType].containsNull
+  private def cn2 = array2.dataType.asInstanceOf[ArrayType].containsNull
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+val ary1 = input1.asInstanceOf[ArrayData]
+val ary2 = input2.asInstanceOf[ArrayData]
+
+if (!cn1 && !cn2) {
+  elementType match {
+case IntegerType =>
+  // avoid boxing of primitive int array elements
+  val hs = new OpenHashSet[Int]
+  var i = 0
+  while (i < ary1.numElements()) {
+hs.add(ary1.getInt(i))
+i += 1
+  }
+  i = 0
+  while (i < ary2.numElements()) {
--- End diff --

We can also support `array_union` and `array_except` by changing this second 
loop, along with a few other small changes. This is why we introduced 
`ArraySetUtils` in this PR.

Other PRs will update `ArraySetUtils` appropriately.
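
A rough sketch of how the two operations can share the first loop and differ 
only in the second. This is not the code from the PR: `ArraySetSketch` and 
`distinctSet` are hypothetical names, and `mutable.LinkedHashSet` stands in 
for Spark's `OpenHashSet` so that the result order is deterministic:

```scala
import scala.collection.mutable

object ArraySetSketch {
  // Shared first loop: collect the distinct elements of the first array.
  private def distinctSet(arr: Array[Int]): mutable.LinkedHashSet[Int] = {
    val hs = mutable.LinkedHashSet.empty[Int]
    var i = 0
    while (i < arr.length) {
      hs.add(arr(i))
      i += 1
    }
    hs
  }

  def union(a1: Array[Int], a2: Array[Int]): Array[Int] = {
    val hs = distinctSet(a1)
    var i = 0
    while (i < a2.length) { // second loop: add the other array's elements
      hs.add(a2(i))
      i += 1
    }
    hs.toArray
  }

  def except(a1: Array[Int], a2: Array[Int]): Array[Int] = {
    val hs = distinctSet(a1)
    var i = 0
    while (i < a2.length) { // second loop: remove instead of add
      hs.remove(a2(i))
      i += 1
    }
    hs.toArray
  }
}
```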


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89470 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89470/testReport)**
 for PR 21061 at commit 
[`cf65616`](https://github.com/apache/spark/commit/cf65616d019ad21c6f498e2c856c3ee396e9dbd2).


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21089
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89469/
Test FAILed.


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21089
  
**[Test build #89469 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89469/testReport)**
 for PR 21089 at commit 
[`4391b0a`](https://github.com/apache/spark/commit/4391b0a94e7ddbf53043e6723b7e6e63655f5759).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21089
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21089
  
**[Test build #89469 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89469/testReport)**
 for PR 21089 at commit 
[`4391b0a`](https://github.com/apache/spark/commit/4391b0a94e7ddbf53043e6723b7e6e63655f5759).


---


[GitHub] spark issue #21089: [SPARK-24004] Test of from_json for non-root MapType

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21089
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #21089: [SPARK-24004] Test of from_json for non-root MapT...

2018-04-17 Thread MaxGekk

GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/21089

[SPARK-24004] Test of from_json for non-root MapType

## What changes were proposed in this pull request?

The new test checks that from_json can parse JSON containing MapType as the 
value type of struct fields. The test required adding the equals and hashCode 
methods to ArrayBasedMapData so that the returned result can be compared to the 
expected value.
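
As an illustration of why these methods are needed for such a comparison, the 
sketch below uses a hypothetical, simplified `SimpleMapData` with parallel key 
and value arrays. It is not the ArrayBasedMapData implementation from this PR, 
and it assumes order-sensitive equality:

```scala
// Hypothetical simplified stand-in: parallel arrays of keys and values.
final class SimpleMapData(val keys: Array[Any], val values: Array[Any]) {
  // Arrays compare by reference by default, so compare their contents instead.
  override def equals(other: Any): Boolean = other match {
    case that: SimpleMapData =>
      keys.toSeq == that.keys.toSeq && values.toSeq == that.values.toSeq
    case _ => false
  }

  // Must agree with equals: hash the contents, not the array identities.
  override def hashCode(): Int =
    31 * keys.toSeq.hashCode() + values.toSeq.hashCode()
}
```

Without the overrides, two instances holding the same entries would not 
compare equal, and an assertion on the parsed result would fail.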

## How was this patch tested?

Added comparison tests for ArrayBasedMapData and a test for from_json.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 from_json-map-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21089


commit d91d93936ce1c12765dffdbf2fe773585dd39eae
Author: Maxim Gekk 
Date:   2018-04-17T18:15:44Z

Added a test for checking from_json: json -> struct of map

commit 428c6ba93202bf3235667794f9f8055c432733e1
Author: Maxim Gekk 
Date:   2018-04-17T18:16:54Z

Added a test for comparison of ArrayBasedMapData

commit 58d1f7e41883b5e8924eca83bcf72deedadaf0ef
Author: Maxim Gekk 
Date:   2018-04-17T19:01:00Z

Implemented the equals and hashCode methods of ArrayBasedMapData

commit 4391b0a94e7ddbf53043e6723b7e6e63655f5759
Author: Maxim Gekk 
Date:   2018-04-17T19:21:49Z

Improving test title




---


[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21086
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89464/
Test PASSed.


---


[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21086
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21086
  
**[Test build #89464 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89464/testReport)**
 for PR 21086 at commit 
[`3bb4824`](https://github.com/apache/spark/commit/3bb4824225b53f0ee7900835bfc99b9bd01f7d4f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20998: [SPARK-23888][CORE] correct the comment of hasAttemptOnH...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20998
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89460/
Test PASSed.


---


[GitHub] spark issue #20998: [SPARK-23888][CORE] correct the comment of hasAttemptOnH...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20998
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20998: [SPARK-23888][CORE] correct the comment of hasAttemptOnH...

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20998
  
**[Test build #89460 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89460/testReport)**
 for PR 20998 at commit 
[`0c6f305`](https://github.com/apache/spark/commit/0c6f3058a5c0af4a6e9cd1a90d43230805305df5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-17 Thread wangmiao1981

Github user wangmiao1981 closed the pull request at:

https://github.com/apache/spark/pull/15770


---


[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-17 Thread wangmiao1981

Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15770
  
@jkbradley I am closing this one now. Thanks!


---


[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-17 Thread wangmiao1981

Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15770
  
@jkbradley Sorry for missing your comments. Anyway, I will close it now. I 
will choose another one to work on. Thanks! 


---


[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #21088: [SPARK-24003][CORE] Add support to provide spark....

2018-04-17 Thread devaraj-kavali

GitHub user devaraj-kavali opened a pull request:

https://github.com/apache/spark/pull/21088

[SPARK-24003][CORE] Add support to provide spark.executor.extraJavaOptions 
in terms of App Id and/or Executor Id's

## What changes were proposed in this pull request?

Added support for specifying the 'spark.executor.extraJavaOptions' value in 
terms of `{{APP_ID}}` and/or `{{EXECUTOR_ID}}`. `{{APP_ID}}` will be replaced 
by the application ID and `{{EXECUTOR_ID}}` by the executor ID when the 
executor is started.
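
The substitution itself can be sketched as a plain string replacement. 
`ExecutorOptions.substitute` is a hypothetical helper, not the code in this 
PR; only the two placeholder names are taken from the description above:

```scala
object ExecutorOptions {
  // Literal replacement of the two placeholders; other text is left untouched.
  def substitute(opts: String, appId: String, executorId: String): String =
    opts
      .replace("{{APP_ID}}", appId)
      .replace("{{EXECUTOR_ID}}", executorId)
}
```

For example, `substitute("-Xloggc:/tmp/gc-{{APP_ID}}-{{EXECUTOR_ID}}.log", 
"app-1", "7")` gives `-Xloggc:/tmp/gc-app-1-7.log`, so each executor can write 
to its own GC log file.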

## How was this patch tested?

I have verified this by checking the executor process command and GC logs. 
I verified the same in the different deployment modes (Standalone, YARN, 
Mesos), in both client and cluster modes.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/devaraj-kavali/spark SPARK-24003

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21088


commit ff6e79315c0ea26b45207753d3e79f96f2395329
Author: Devaraj K 
Date:   2018-04-17T18:37:11Z

[SPARK-24003][CORE] Add support to provide spark.executor.extraJavaOptions
in terms of App Id and/or Executor Id's




---


[GitHub] spark issue #21082: [SPARK-22239][SQL][Python][WIP] Enable grouped aggregate...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21082
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21082: [SPARK-22239][SQL][Python][WIP] Enable grouped aggregate...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21082
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89462/
Test FAILed.


---


[GitHub] spark issue #21082: [SPARK-22239][SQL][Python][WIP] Enable grouped aggregate...

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21082
  
**[Test build #89462 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89462/testReport)**
 for PR 21082 at commit 
[`dfdb03f`](https://github.com/apache/spark/commit/dfdb03fc9c19b120a046b45b0864f8f405b0ead5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20787
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89465/
Test FAILed.


---


[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20787
  
**[Test build #89465 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89465/testReport)**
 for PR 20787 at commit 
[`913eed8`](https://github.com/apache/spark/commit/913eed849d218a34d6e28a2500301320ba21cecc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-17 Thread devaraj-kavali

Github user devaraj-kavali commented on the issue:

https://github.com/apache/spark/pull/21071
  
@gatorsmile we need to have this for K8S as well, will include it in SPIP.


---


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21074
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89459/
Test PASSed.


---


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21074
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21074
  
**[Test build #89459 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89459/testReport)**
 for PR 21074 at commit 
[`571912f`](https://github.com/apache/spark/commit/571912f3ed21cd3753fa76225f88d0f6d8298989).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2402/
Test PASSed.


---


[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-04-17 Thread scottcarey

Github user scottcarey commented on the issue:

https://github.com/apache/spark/pull/21070
  
This PR should include changes to `ParquetOptions.scala` to expose the 
`LZ4`, `ZSTD` and `BROTLI` Parquet compression codecs, or else Spark users 
won't be able to leverage those Parquet 1.10.0 features.
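
To make the request concrete, here is a minimal, language-neutral sketch of the kind of name-to-codec lookup the comment is about. It is not taken from `ParquetOptions.scala`: the table names, the `resolve_codec` helper, and the set of previously exposed short names are all assumptions made for illustration.

```python
# Hypothetical sketch only: these tables and this helper are illustrative
# and are NOT copied from Spark's ParquetOptions.scala.

# Short names assumed (not confirmed here) to be exposed before the change.
EXISTING_CODECS = {
    "none": "UNCOMPRESSED",
    "uncompressed": "UNCOMPRESSED",
    "snappy": "SNAPPY",
    "gzip": "GZIP",
    "lzo": "LZO",
}

# The three codecs named in the comment above.
REQUESTED_CODECS = {
    "lz4": "LZ4",
    "zstd": "ZSTD",
    "brotli": "BROTLI",
}

# The table after the requested change: old entries plus the new ones.
EXTENDED_CODECS = {**EXISTING_CODECS, **REQUESTED_CODECS}


def resolve_codec(name, table):
    """Map a user-supplied codec name to a codec constant, case-insensitively.

    Raises ValueError listing the available names when the codec is not
    present in the given table.
    """
    key = name.lower()
    if key not in table:
        available = ", ".join(sorted(table))
        raise ValueError(
            f"Codec [{key}] is not available. Available codecs are {available}."
        )
    return table[key]
```

With only the assumed existing table, `resolve_codec("zstd", EXISTING_CODECS)` raises `ValueError`, which is the user-facing gap the comment describes; with `EXTENDED_CODECS` the same call resolves.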


---


[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21071
  
@devaraj-kavali How about K8S?


---


[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89468 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89468/testReport)**
 for PR 21061 at commit 
[`809621b`](https://github.com/apache/spark/commit/809621b9b73b67ebd8fb5ffcf1956fc6dc98be43).


---


[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21071
  
cc @jiangxb1987 @JoshRosen 


---


[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-17 Thread devaraj-kavali

Github user devaraj-kavali commented on the issue:

https://github.com/apache/spark/pull/21071
  
Thanks @rxin and @markhamstra for your comments. I will come up with an SPIP 
design draft and start the discussion.


---


[GitHub] spark issue #21063: [SPARK-23886][Structured Streaming] Update query status ...

2018-04-17 Thread efimpoberezkin

Github user efimpoberezkin commented on the issue:

https://github.com/apache/spark/pull/21063
  
Hi @jose-torres, I made some changes to this PR according to your comment. 
Could you review it, please?


---


[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89456/
Test PASSed.


---


[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21029
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory

2018-04-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21029
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89461/
Test FAILed.


---


[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21029
  
**[Test build #89461 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89461/testReport)**
 for PR 21029 at commit 
[`1ae4b6d`](https://github.com/apache/spark/commit/1ae4b6dd8d07fa3b095dbbc6c435c337f71bd0bd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20858
  
**[Test build #89456 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89456/testReport)**
 for PR 20858 at commit 
[`f2a67e8`](https://github.com/apache/spark/commit/f2a67e82880896bf7c09d3067f7d1699c43d2505).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15113: [SPARK-17508][PYSPARK][ML] PySpark treat Param values No...

2018-04-17 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15113
  
I still think this makes sense, but maybe I'm in the minority.  I'll go ahead 
and close it unless anyone else thinks it should be changed.


---


[GitHub] spark pull request #15113: [SPARK-17508][PYSPARK][ML] PySpark treat Param va...

2018-04-17 Thread BryanCutler

Github user BryanCutler closed the pull request at:

https://github.com/apache/spark/pull/15113


---


[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

2018-04-17 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/19881
  
Thanks @jcuquemelle 


---

