[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21165
  
I think we can just update MimaExcludes, since it's a developer API. cc @JoshRosen


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21114: [SPARK-22371][CORE] Return None instead of throwing an e...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21114
  
can we do this?
```
var acc = ...
... // launch a long running job
val accId = acc.getId
acc = null
gc
... // job finished
```

Accumulators are created by users, so we have to be prepared for any situation. That's why we used a weak reference in the first place.
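To make the scenario concrete, here is a minimal, self-contained sketch (not from the PR) of the weak-reference behavior the registry has to tolerate:

```
import java.lang.ref.WeakReference

object WeakRefDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for a user-created accumulator; the registry only holds a weak reference.
    var acc: Array[Long] = Array.fill(1024)(0L)
    val weakAcc = new WeakReference(acc)

    acc = null      // the user drops the last strong reference
    System.gc()     // after a GC the weak reference may be cleared
    Thread.sleep(100)

    // A lookup by id therefore has to handle the "already collected" case,
    // e.g. by returning None instead of throwing.
    Option(weakAcc.get()) match {
      case Some(_) => println("accumulator still reachable")
      case None    => println("accumulator was garbage collected")
    }
  }
}
```

Whether the reference is actually cleared after `System.gc()` is not guaranteed, which is the point: callers cannot assume the accumulator is still there.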


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21322
  
I think users are responsible for calling `Broadcast#destroy`, which unpersists broadcast blocks from the block manager and runs user-defined driver-side cleanup.

It is a valid use case to allow users to define some executor-side cleanup via `AutoCloseable`. However, I don't think we should always detect `AutoCloseable` when removing a block, as that may break existing programs and cause a performance regression. We should only do it for broadcast blocks.

A good place to do it seems to be `BlockManager.removeBroadcast`.
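As a rough illustration of the executor-side cleanup being discussed (a sketch only; `releaseBroadcastValues` and its wiring are hypothetical, not existing Spark APIs):

```
// Hypothetical helper sketching the idea: when broadcast blocks are dropped,
// close any stored values that implement AutoCloseable, and do so only for
// broadcast blocks so other block types keep their current behavior and cost.
object BroadcastCleanup {
  def releaseBroadcastValues(values: Iterator[Any]): Unit = {
    values.foreach {
      case closeable: AutoCloseable =>
        try closeable.close()
        catch {
          case e: Exception =>
            // A cleanup failure should not fail block removal itself.
            println(s"Failed to close broadcast value: ${e.getMessage}")
        }
      case _ => // non-closeable values need no extra work
    }
  }
}
```

Per the comment above, this kind of logic would hang off `BlockManager.removeBroadcast` rather than the generic block-removal path.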


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2018-05-14 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19840
  
@vanzin I am not very familiar with the Python part ([context.py#L191](https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191)), so I handled it in `api/python/PythonRunner` as I did in this PR.

Maybe someone else could help; sorry for the delay.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3214/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21267
  
**[Test build #90616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)**
 for PR 21267 at commit 
[`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3213/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21267
  
**[Test build #90614 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)**
 for PR 21267 at commit 
[`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20800
  
**[Test build #90615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90615/testReport)**
 for PR 20800 at commit 
[`f30d3ec`](https://github.com/apache/spark/commit/f30d3ec95c0d00f409f6536d10710b2f65fad787).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r188144573
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,22 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --

Likewise, I checked the warning manually:

```
.../pyspark/context.py:229: RuntimeWarning: Failed to add file 
[/home/spark/tmp.py] speficied in 'spark.submit.pyFiles' to Python path:

...
  /usr/lib64/python27.zip
  /usr/lib64/python2.7
... 
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20800
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20800
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-14 Thread sohama4
Github user sohama4 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21316#discussion_r188143976
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --

Thanks, that makes sense after looking at the code for `foreach` and `foreachPartition`; I put up a new version with this change. However, it wasn't immediately clear how the new function `withNewRDDExecutionId` would be beneficial over `withNewExecutionId`; can you elaborate a little when you get the chance?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r188143390
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -529,6 +567,239 @@ case class ArrayContains(left: Expression, right: Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+       true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match {
+    case TypeCheckResult.TypeCheckSuccess =>
+      if (RowOrdering.isOrderable(elementType)) {
+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.")
+      }
+    case failure => failure
+  }
+
+  @transient private lazy val ordering: Ordering[Any] =
+    TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient private lazy val elementTypeSupportEquals = elementType match {
+    case BinaryType => false
+    case _: AtomicType => true
+    case _ => false
+  }
+
+  @transient private lazy val doEvaluation = if (elementTypeSupportEquals) {
+    fastEval _
+  } else {
+    bruteForceEval _
+  }
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = {
+    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull ||
+      right.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(a1: Any, a2: Any): Any = {
+    doEvaluation(a1.asInstanceOf[ArrayData], a2.asInstanceOf[ArrayData])
+  }
+
+  /**
+   * A fast implementation which puts all the elements from the smaller array in a set
+   * and then performs a lookup on it for each element of the bigger one.
+   * This eval mode works only for data types which implements properly the equals method.
+   */
+  private def fastEval(arr1: ArrayData, arr2: ArrayData): Any = {
+    var hasNull = false
+    val (bigger, smaller) = if (arr1.numElements() > arr2.numElements()) {
+      (arr1, arr2)
+    } else {
+      (arr2, arr1)
+    }
+    if (smaller.numElements() > 0) {
+      val smallestSet = new mutable.HashSet[Any]
+      smaller.foreach(elementType, (_, v) =>
+        if (v == null) {
+          hasNull = true
+        } else {
+          smallestSet += v
+        })
+      bigger.foreach(elementType, (_, v1) =>
+        if (v1 == null) {
+          hasNull = true
+        } else if (smallestSet.contains(v1)) {
+          return true
+        }
+      )
+    }
+    if (hasNull) {
+      null
+    } else {
+      false
+    }
+  }
+
+  /**
+   * A slower evaluation which performs a nested loop and supports all the data types.
+   */
+  private def bruteForceEval(arr1: ArrayData, arr2: ArrayData): Any = {
+    var hasNull = false
+    if (arr1.numElements() > 0) {
+      arr1.foreach(elementType, (_, v1) =>
+        if (v1 == null) {
+          hasNull = true
+        } else {
+          arr2.foreach(elementType, (_, v2) =>
+            if (v1 == null) {
+              hasNull = true
+            } else if (ordering.equiv(v1, v2)) {
+              return true
+            }
+          )
+        })
+    }
+    if (hasNull) {
+      null
+    } else {
+      false
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(ctx, ev, (a1, a2) => {
+      val smaller = ctx.freshName("smallerArray")
+      val bigger = ctx.freshName("biggerArray")
+      val comparisonCode = if (elementTypeSupportEquals) {
+        fastCodegen(ctx, ev, smaller, bigger)
+      } else {
+        bruteForceCodegen(ctx, ev, smaller, bigger)
+      }
+      s"""
+         |ArrayData $smaller;
+         |ArrayData $bigger;
+         |if ($a1.numElements() > $a2.numElements()) {

[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21183
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90611/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21183
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21183
  
**[Test build #90611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90611/testReport)**
 for PR 21183 at commit 
[`7ee0ebf`](https://github.com/apache/spark/commit/7ee0ebf028e41719514c0588d378cb515aea744a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90608/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21323
  
**[Test build #90608 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90608/testReport)**
 for PR 21323 at commit 
[`56437da`](https://github.com/apache/spark/commit/56437da708fc12d2c9216a1365a8afd6f81af845).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #90613 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90613/testReport)**
 for PR 21221 at commit 
[`10ed328`](https://github.com/apache/spark/commit/10ed328bfcf160711e7619aac23472f97bf1c976).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #90612 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90612/testReport)**
 for PR 21153 at commit 
[`dc59375`](https://github.com/apache/spark/commit/dc593754c62d2daf89331ea21d9250af9b9febfd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90612/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-14 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r188136553
  
--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Creates a heartbeat thread which will call the specified 
reportHeartbeat function at
+ * intervals of intervalMs.
+ *
+ * @param reportHeartbeat the heartbeat reporting function to call.
+ * @param intervalMs the interval between heartbeats.
+ */
+private[spark] class Heartbeater(reportHeartbeat: () => Unit, intervalMs: 
Long) {
+  // Executor for the heartbeat task
+  private val heartbeater = 
ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")
--- End diff --

Changed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-14 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r188136532
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1753,9 +1766,21 @@ class DAGScheduler(
 messageScheduler.shutdownNow()
 eventProcessLoop.stop()
 taskScheduler.stop()
+heartbeater.stop()
+  }
+
+  /** Reports heartbeat metrics for the driver. */
+  private def reportHeartBeat(): Unit = {
--- End diff --

It's a bit redundant for fields that aren't used by the driver -- for the 
driver, execution memory gets set to 0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21312
  
It looks like the `ListVector` also needs `setLastSet` to be called with 0, which is only in `ListVector`. This is fine though, since `ListVector` is the only vector extending `BaseRepeatedValueVector`:
```
case listVector: ListVector =>
  val buffers = listVector.getBuffers(false)
  buffers.foreach(buf => buf.setByte(0, buf.capacity()))
  listVector.setValueCount(0)
  listVector.setLastSet(0)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3212/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21312
  
OK, I will use a manual reset for now and leave a TODO comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21312
  
I'm okay with either way.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3211/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread ludatabricks
Github user ludatabricks commented on the issue:

https://github.com/apache/spark/pull/21183
  
I tested loading old saved models from Spark 2.3, and they load fine with this change.

For the tests in LDASuite, I do see failures sometimes without this fix, though not every time. I can remove it if you think it is not necessary.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21183
  
**[Test build #90611 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90611/testReport)**
 for PR 21183 at commit 
[`7ee0ebf`](https://github.com/apache/spark/commit/7ee0ebf028e41719514c0588d378cb515aea744a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #90612 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90612/testReport)**
 for PR 21153 at commit 
[`dc59375`](https://github.com/apache/spark/commit/dc593754c62d2daf89331ea21d9250af9b9febfd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21291
  
**[Test build #90610 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90610/testReport)**
 for PR 21291 at commit 
[`f93738b`](https://github.com/apache/spark/commit/f93738be3a7509d70568b3060a0cc4dd3ff23da0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r188132284
  
--- Diff: python/pyspark/ml/util.py ---
@@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, 
extraMetadata=None, paramMap=None):
 - sparkVersion
 - uid
 - paramMap
+- defalutParamMap (since 2.4.0)
--- End diff --

Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r187129517
  
--- Diff: python/pyspark/ml/util.py ---
@@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, 
extraMetadata=None, paramMap=None):
 - sparkVersion
 - uid
 - paramMap
+- defalutParamMap (since 2.4.0)
--- End diff --

typo: default


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...

2018-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21291#discussion_r188131133
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
@@ -39,7 +39,9 @@ class ConfigBehaviorSuite extends QueryTest with 
SharedSQLContext {
 def computeChiSquareTest(): Double = {
   val n = 1
   // Trigger a sort
-  val data = spark.range(0, n, 1, 1).sort('id.desc)
+  // Range has range partitioning in its output now. To have a range 
shuffle, we
+  // need to run a repartition first.
+  val data = spark.range(0, n, 1, 1).repartition(10).sort('id.desc)
--- End diff --

This test requires a range shuffle. Previously `range` had unknown output partitioning/ordering, so a range shuffle was inserted before `sort`.

Now `range` has an ordered output, so the planner doesn't insert the shuffle we need here.
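For anyone curious, a quick way to see the effect (a sketch, not part of the PR; exact plans depend on the Spark version and configs):

```
// Compare the physical plans: with RangeExec reporting its output ordering and
// partitioning, the bare sort over a single-partition range may be planned without
// a range shuffle, while the repartition forces a redistribution before the sort.
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

spark.range(0, 10000, 1, 1).sort('id.desc).explain()
spark.range(0, 10000, 1, 1).repartition(10).sort('id.desc).explain()
```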



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...

2018-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21291#discussion_r188130563
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -621,6 +621,25 @@ class PlannerSuite extends SharedSQLContext {
   requiredOrdering = Seq(orderingA, orderingB),
   shouldHaveSort = true)
   }
+
+  test("SPARK-24242: RangeExec should have correct output ordering and 
partitioning") {
+val df = spark.range(10)
+val rangeExec = df.queryExecution.executedPlan.collect {
+  case r: RangeExec => r
+}
+val range = df.queryExecution.optimizedPlan.collect {
+  case r: Range => r
+}
+assert(rangeExec.head.outputOrdering == range.head.outputOrdering)
+assert(rangeExec.head.outputPartitioning ==
+  RangePartitioning(rangeExec.head.outputOrdering, 
df.rdd.getNumPartitions))
+
+val rangeInOnePartition = spark.range(1, 10, 1, 1)
+val rangeExecInOnePartition = 
rangeInOnePartition.queryExecution.executedPlan.collect {
+  case r: RangeExec => r
+}
+assert(rangeExecInOnePartition.head.outputPartitioning == 
SinglePartition)
--- End diff --

Ok.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21239
  
**[Test build #90609 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90609/testReport)**
 for PR 21239 at commit 
[`41577c3`](https://github.com/apache/spark/commit/41577c35a7c59ffcf48225fbc30b0dc3c8cab674).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...

2018-05-14 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/21239
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...

2018-05-14 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/21239
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21322: [SPARK-24225][CORE] Support closing AutoClosable ...

2018-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21322#discussion_r188128515
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -384,15 +385,36 @@ private[spark] class MemoryStore(
 }
   }
 
+  private def maybeReleaseResources(entry: MemoryEntry[_]): Unit = {
+entry match {
+  case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
+  case DeserializedMemoryEntry(objs: Array[Any], _, _) => 
maybeCloseValues(objs)
+  case _ =>
+}
+  }
+
+  private def maybeCloseValues(objs: Array[Any]): Unit = {
+objs.foreach {
+case closable: AutoCloseable =>
--- End diff --

indent style: two spaces.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21322: [SPARK-24225][CORE] Support closing AutoClosable ...

2018-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21322#discussion_r188128177
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -384,15 +385,36 @@ private[spark] class MemoryStore(
 }
   }
 
+  private def maybeReleaseResources(entry: MemoryEntry[_]): Unit = {
+entry match {
+  case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
+  case DeserializedMemoryEntry(objs: Array[Any], _, _) => 
maybeCloseValues(objs)
--- End diff --

As I know, broadcasted variables can be serialized on disk too 
(`BlockManager.doPutIterator`). In the case, seems `AutoCloseable` broadcasted 
variables won't hit this release logic.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21312
  
@BryanCutler I had the same thought but wondered whether it is a good idea. If you, @HyukjinKwon, and @icexelloss also agree on a manual reset like this, I'm fine with it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #4179 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4179/testReport)**
 for PR 21153 at commit 
[`ce84137`](https://github.com/apache/spark/commit/ce841372b76fe3263462b1f51ebfda26e098f8f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21199
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90603/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21199
  
**[Test build #90603 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90603/testReport)**
 for PR 21199 at commit 
[`b3a42f0`](https://github.com/apache/spark/commit/b3a42f08cba85b9bec11aaa3f75de298aa869204).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `case class ContinuousRecordPartitionOffset(partitionId: Int, offset: 
Int) extends PartitionOffset`
  * `case class GetRecord(offset: ContinuousRecordPartitionOffset)`
  * `class ContinuousRecordEndpoint(buckets: Seq[Seq[Any]], lock: Object)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21324: [SPARK-24035][SQL] SQL syntax for Pivot - fix antlr warn...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21324
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90607/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21324: [SPARK-24035][SQL] SQL syntax for Pivot - fix antlr warn...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21324
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21324: [SPARK-24035][SQL] SQL syntax for Pivot - fix antlr warn...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21324
  
**[Test build #90607 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90607/testReport)**
 for PR 21324 at commit 
[`ecd3792`](https://github.com/apache/spark/commit/ecd37927ef122a75bf87f1de16d6afc80fd0bf61).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2018-05-14 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19840
  
@yaooqinn do you plan to update this PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #4179 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4179/testReport)**
 for PR 21153 at commit 
[`ce84137`](https://github.com/apache/spark/commit/ce841372b76fe3263462b1f51ebfda26e098f8f3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21199
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90606/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21199
  
**[Test build #90606 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90606/testReport)**
 for PR 21199 at commit 
[`b962c3d`](https://github.com/apache/spark/commit/b962c3dbd1715b2d4fa03e65731e36697cf37ff1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90604/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21208
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21208
  
**[Test build #90604 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90604/testReport)**
 for PR 21208 at commit 
[`c2ce328`](https://github.com/apache/spark/commit/c2ce328eda03f01b58ef9c52084e671cc6720802).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-14 Thread pepinoflo
Github user pepinoflo commented on the issue:

https://github.com/apache/spark/pull/21208
  
Any idea about the test failure? Test name is 
`org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a 
sbt.testing.SuiteSelector)`, and error message is 
`java.lang.reflect.InvocationTargetException: null`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-14 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/21316#discussion_r188107132
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --

this method should use `withNewRDDExecutionId`
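For context, a rough sketch (not the merged code) of what the suggested change could look like, assuming `withNewRDDExecutionId` is the same private helper that `foreach` and `foreachPartition` already use:

```
// Sketch only: wrap the physical RDD action the same way foreach does,
// so the reduce job is tracked under a new execution id.
def reduce(func: (T, T) => T): T = withNewRDDExecutionId {
  rdd.reduce(func)
}
```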


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test c...

2018-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21323


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21323
  
Merging to master / 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90602/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3210/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21323
  
**[Test build #90602 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90602/testReport)**
 for PR 21323 at commit 
[`fdcacd8`](https://github.com/apache/spark/commit/fdcacd8868de0aca3d13ae5ca5a9e323f114fab9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withNewExecu...

2018-05-14 Thread sohama4
Github user sohama4 commented on the issue:

https://github.com/apache/spark/pull/21316
  
Thanks for the approval @jaceklaskowski! Can you leave a comment so that 
Jenkins can get testing underway?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21323
  
**[Test build #90608 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90608/testReport)**
 for PR 21323 at commit 
[`56437da`](https://github.com/apache/spark/commit/56437da708fc12d2c9216a1365a8afd6f81af845).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-14 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21316#discussion_r188104204
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --

Why would we want to deprecate it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23582][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21323
  
@henryr looks like the bug number is wrong (should be SPARK-23852).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21322
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90599/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21322
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-14 Thread henryr
Github user henryr closed the pull request at:

https://github.com/apache/spark/pull/21302


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21322
  
**[Test build #90599 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90599/testReport)**
 for PR 21322 at commit 
[`f254f94`](https://github.com/apache/spark/commit/f254f94fdc5e2648d7c1104bf5ec2355de7c6055).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21108: [SPARK-24027][SQL] Support MapType with StringTyp...

2018-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21108


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-14 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21302
  
Also, please close the PR manually (github doesn't do that for branches).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-14 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21302
  
Merging to 2.3. In the unlikely event of issues, we can address them later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...

2018-05-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21108
  
Thanks! Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-...

2018-05-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21300
  
Thanks for your confirmation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-...

2018-05-14 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/21300
  
this is ok to me since it's turned off by default


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21208
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21208
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90605/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21208
  
**[Test build #90605 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90605/testReport)**
 for PR 21208 at commit 
[`703d254`](https://github.com/apache/spark/commit/703d2547cf715419c1d2eafc3d440e4eb0e7132c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21218


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90600/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21108
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21108
  
**[Test build #90600 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90600/testReport)**
 for PR 21108 at commit 
[`768ef5e`](https://github.com/apache/spark/commit/768ef5ee46973d0f578437e489fc9bc622d77831).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21218: [SPARK-24155][ML] Instrumentation improvements for clust...

2018-05-14 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/21218
  
LGTM. Merged into master. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-05-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21183#discussion_r188081089
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -473,7 +475,8 @@ final class OnlineLDAOptimizer extends LDAOptimizer with Logging {
         None
       }
 
-    val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitions { docs =>
+    val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitionsWithIndex
--- End diff --

fix scala style:
```
val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitionsWithIndex {
  (index, docs) =>
    val nonEmptyDocs = docs.filter(_._2.numNonzeros > 0)

    val stat = BDM.zeros[Double](k, vocabSize)
    val logphatPartOption = logphatPartOptionBase()
    var nonEmptyDocCount: Long = 0L
    nonEmptyDocs.foreach { case (_, termCounts: Vector) =>
      nonEmptyDocCount += 1
      val (gammad, sstats, ids) = OnlineLDAOptimizer.variationalTopicInference(
        termCounts, expElogbetaBc.value, alpha, gammaShape, k, seed + index)
      stat(::, ids) := stat(::, ids) + sstats
      logphatPartOption.foreach(_ += LDAUtils.dirichletExpectation(gammad))
    }
    Iterator((stat, logphatPartOption, nonEmptyDocCount))
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should...

2018-05-14 Thread fangshil
Github user fangshil commented on the issue:

https://github.com/apache/spark/pull/21310
  
I will investigate how we can add a test for this. Thoughts are welcome.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple for pick...

2018-05-14 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21180
  
Hey @HyukjinKwon and @felixcheung, do you think the PR is good to be merged 
as-is, or would you like me to think further about how to make it more robust? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...

2018-05-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21291#discussion_r188082738
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
@@ -39,7 +39,9 @@ class ConfigBehaviorSuite extends QueryTest with 
SharedSQLContext {
 def computeChiSquareTest(): Double = {
   val n = 1
   // Trigger a sort
-  val data = spark.range(0, n, 1, 1).sort('id.desc)
+  // Range has range partitioning in its output now. To have a range 
shuffle, we
+  // need to run a repartition first.
+  val data = spark.range(0, n, 1, 1).repartition(10).sort('id.desc)
--- End diff --

sorry, I am just curious, why is `sort('id.desc)` not causing a shuffle? 
Shouldn't it be ordered by `'id.asc` without the sort?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...

2018-05-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21291#discussion_r188081824
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -621,6 +621,25 @@ class PlannerSuite extends SharedSQLContext {
   requiredOrdering = Seq(orderingA, orderingB),
   shouldHaveSort = true)
   }
+
+  test("SPARK-24242: RangeExec should have correct output ordering and 
partitioning") {
+val df = spark.range(10)
+val rangeExec = df.queryExecution.executedPlan.collect {
+  case r: RangeExec => r
+}
+val range = df.queryExecution.optimizedPlan.collect {
+  case r: Range => r
+}
+assert(rangeExec.head.outputOrdering == range.head.outputOrdering)
+assert(rangeExec.head.outputPartitioning ==
+  RangePartitioning(rangeExec.head.outputOrdering, 
df.rdd.getNumPartitions))
+
+val rangeInOnePartition = spark.range(1, 10, 1, 1)
+val rangeExecInOnePartition = 
rangeInOnePartition.queryExecution.executedPlan.collect {
+  case r: RangeExec => r
+}
+assert(rangeExecInOnePartition.head.outputPartitioning == 
SinglePartition)
--- End diff --

should we also add a test case for the 0 partition case?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


