[GitHub] spark issue #21114: [SPARK-22371][CORE] Return None instead of throwing an e...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21114
  
**[Test build #90623 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90623/testReport)**
 for PR 21114 at commit 
[`1b1b1fa`](https://github.com/apache/spark/commit/1b1b1fa6c8c89759c4353c843fb67282c2baac6f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should...

2018-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21310
  
I will take a look later today.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17335: [SPARK-19995][YARN] Register tokens to current UGI to av...

2018-05-14 Thread rajeshcode
Github user rajeshcode commented on the issue:

https://github.com/apache/spark/pull/17335
  
Is this patch will work for spark-sql --master local mode as well.
 
In our environment  localmode is not supporting proxy user where as yarn 
mode looks ok. Do we have a solution for proxy user support on localmode


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should...

2018-05-14 Thread fangshil
Github user fangshil commented on the issue:

https://github.com/apache/spark/pull/21310
  
@viirya  @cloud-fan before I add test, could you guys take a look and 
advise if the approach taken in this patch is acceptable?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21114: [SPARK-22371][CORE] Return None instead of throwi...

2018-05-14 Thread artemrd
Github user artemrd commented on a diff in the pull request:

https://github.com/apache/spark/pull/21114#discussion_r188170023
  
--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -237,6 +236,65 @@ class AccumulatorSuite extends SparkFunSuite with 
Matchers with LocalSparkContex
 acc.merge("kindness")
 assert(acc.value === "kindness")
   }
+
+  test("updating garbage collected accumulators") {
--- End diff --

I agree this test is quite ugly. Let me just revert the last commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20800
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20800
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90615/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20800
  
**[Test build #90615 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90615/testReport)**
 for PR 20800 at commit 
[`f30d3ec`](https://github.com/apache/spark/commit/f30d3ec95c0d00f409f6536d10710b2f65fad787).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-05-14 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/21260
  
@er0sin PVCs can be mounted similarly to the example below:

```
spark.kubernetes.driver.volumes.persistentVolumeClaim.pv1.mount.path=/mnt

spark.kubernetes.driver.volumes.persistentVolumeClaim.pv1.mount.readOnly=false

spark.kubernetes.driver.volumes.persistentVolumeClaim.pv1.options.claimName=clm-1

```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90613/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #90613 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90613/testReport)**
 for PR 21221 at commit 
[`10ed328`](https://github.com/apache/spark/commit/10ed328bfcf160711e7619aac23472f97bf1c976).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21114: [SPARK-22371][CORE] Return None instead of throwi...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21114#discussion_r188164036
  
--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -237,6 +236,65 @@ class AccumulatorSuite extends SparkFunSuite with 
Matchers with LocalSparkContex
 acc.merge("kindness")
 assert(acc.value === "kindness")
   }
+
+  test("updating garbage collected accumulators") {
--- End diff --

tests also have a maintenance cost. Sometimes we change something and break 
the test, we need to look into the test and see if the test is wrong or the 
change is wrong. Or sometimes the test becomes flaky and we need to investigate.

This test seems to prove a thing can happen while it's already proved by 
other tests, and I think this test is not worth the maintenance cost given its 
complexity.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.writeFul...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21327
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3219/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.writeFul...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21327
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.writeFul...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21327
  
**[Test build #90622 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90622/testReport)**
 for PR 21327 at commit 
[`cd2f0e3`](https://github.com/apache/spark/commit/cd2f0e3658964818b076e6de150f15db32f3c455).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.w...

2018-05-14 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/21327

[SPARK-24107][CORE][followup] ChunkedByteBuffer.writeFully method has not 
reset the limit value

## What changes were proposed in this pull request?

According to the discussion in https://github.com/apache/spark/pull/21175 , 
this PR proposes 2 improvements:
1. add comments to explain why we call `limit` to write out `ByteBuffer` 
with slices.
2. remove the `try ... finally`

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21327


commit cd2f0e3658964818b076e6de150f15db32f3c455
Author: Wenchen Fan 
Date:   2018-05-15T04:29:56Z

improve




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.writeFul...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21327
  
cc @manbuyun @JoshRosen @jiangxb1987 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r188161280
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
-val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
-bytes.limit(bytes.position() + ioSize)
-channel.write(bytes)
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
+try {
+  val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
+  bytes.limit(bytes.position() + ioSize)
+  channel.write(bytes)
+} finally {
--- End diff --

I get your point. if there is an exception, there is no next loop and we 
don't need to restore the limit. so try finally is not needed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r188160813
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
-val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
-bytes.limit(bytes.position() + ioSize)
-channel.write(bytes)
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
+try {
+  val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
+  bytes.limit(bytes.position() + ioSize)
+  channel.write(bytes)
+} finally {
--- End diff --

I think the problem is, `bytes.limit(bytes.position() + ioSize)` will 
change the result of `bytes.hasRemaining`, so we have to restore the limit in 
each loop.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully method ...

2018-05-14 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/21175
  
No, I mean that the code here can simply follow the write call as straight
through code. We don't need to guard against exceptions here because the
duplicate of the buffer is used only by a single thread, so you can omit
the try block and just concatenate the try contents to the finally
contents. Minor bit but I wanted to comment because I initially was
confused about when errors could occur and thread safety / sharing until I
realized that the modified state does not escape this method.
On Mon, May 14, 2018 at 9:03 PM Wenchen Fan 
wrote:

> *@cloud-fan* commented on this pull request.
> --
>
> In core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala
> :
>
> > @@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
> */
>def writeFully(channel: WritableByteChannel): Unit = {
>  for (bytes <- getChunks()) {
> -  while (bytes.remaining() > 0) {
> -val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
> -bytes.limit(bytes.position() + ioSize)
> -channel.write(bytes)
> +  val curChunkLimit = bytes.limit()
> +  while (bytes.hasRemaining) {
> +try {
> +  val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
> +  bytes.limit(bytes.position() + ioSize)
> +  channel.write(bytes)
> +} finally {
>
> Do you mean this is not a real bug that can cause real workload to fail?
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r188160044
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
-val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
-bytes.limit(bytes.position() + ioSize)
-channel.write(bytes)
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
+try {
+  val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
+  bytes.limit(bytes.position() + ioSize)
+  channel.write(bytes)
+} finally {
--- End diff --

Do you mean this is not a real bug that can cause real workload to fail?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21312
  
Not sure why, but previously calling `ListVector.clear`, I must change the 
reset order to reset element writer first to pass the test:
```scala
 override def reset(): Unit = {
+elementWriter.reset()
 super.reset()
-elementWriter.reset()
   }
```

Now with manual reset, this order doesn't affect test result anymore. I 
respect original order and restore it back.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21312
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21312
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3218/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21312
  
**[Test build #90621 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90621/testReport)**
 for PR 21312 at commit 
[`400f605`](https://github.com/apache/spark/commit/400f605141246f8ded825fcd4955bdc9043e71a0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21239
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21239
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90609/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21239
  
**[Test build #90609 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90609/testReport)**
 for PR 21239 at commit 
[`41577c3`](https://github.com/apache/spark/commit/41577c35a7c59ffcf48225fbc30b0dc3c8cab674).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-05-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r188154900
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
-val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
-bytes.limit(bytes.position() + ioSize)
-channel.write(bytes)
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
+try {
+  val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
+  bytes.limit(bytes.position() + ioSize)
--- End diff --

The rationale for the `limit()` isn't super-clear, but that was a problem 
in the original PR which introduced the bug (#18730). I'm commenting here only 
for cross-reference reference for folks who come across this patch in the 
future. I believe that the original motivation was 
http://www.evanjones.ca/java-bytebuffer-leak.html


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21325: [R][backport-2.2] backport lint fix

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21325
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90610/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21325: [R][backport-2.2] backport lint fix

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21325
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90617/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21325: [R][backport-2.2] backport lint fix

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21325
  
**[Test build #90617 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90617/testReport)**
 for PR 21325 at commit 
[`367d122`](https://github.com/apache/spark/commit/367d1228980ef3401fc1546d6a737d495590360a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21326: [SPARK-24275][SQL] Revise doc comments in InputPartition

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21326
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21326: [SPARK-24275][SQL] Revise doc comments in InputPartition

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21326
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3217/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21291
  
**[Test build #90610 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90610/testReport)**
 for PR 21291 at commit 
[`f93738b`](https://github.com/apache/spark/commit/f93738be3a7509d70568b3060a0cc4dd3ff23da0).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-05-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r188154716
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
-val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
-bytes.limit(bytes.position() + ioSize)
-channel.write(bytes)
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
+try {
+  val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
+  bytes.limit(bytes.position() + ioSize)
+  channel.write(bytes)
+} finally {
--- End diff --

I don't think we need the `try` and `finally` here because `getChunks()` 
returns duplicated ByteBuffers which have their own position and limit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21326: [SPARK-24275][SQL] Revise doc comments in InputPartition

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21326
  
**[Test build #90620 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90620/testReport)**
 for PR 21326 at commit 
[`76a116a`](https://github.com/apache/spark/commit/76a116a93bd76b1f9fddec4f14515cf3a116a54f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21326: [SPARK-24275][SQL] Revise doc comments in InputPartition

2018-05-14 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/21326
  
@rdblue @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21326: [SPARK-24275][SQL] Revise doc comments in InputPa...

2018-05-14 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/21326

[SPARK-24275][SQL] Revise doc comments in InputPartition

## What changes were proposed in this pull request?

In #21145,  DataReaderFactory is renamed to InputPartition. 

This PR is to revise wording in the comments to make it more clear.

## How was this patch tested?

None

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark revise_reader_comments

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21326.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21326


commit 76a116a93bd76b1f9fddec4f14515cf3a116a54f
Author: Gengliang Wang 
Date:   2018-05-15T02:47:03Z

revise the comments in read path




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...

2018-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21322
  
One concern I have is, now we keep broadcasted variables in 
`BroadcastManager.cachedValues` by using weak reference. So if a broadcasted 
variable with `AutoCloseable` is released before we call `Broadcast#destroy`, 
we still can't properly release the resources.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21312
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3216/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21312
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwrite a p...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21257
  
**[Test build #90619 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90619/testReport)**
 for PR 21257 at commit 
[`6821795`](https://github.com/apache/spark/commit/68217952e25c2eef0064c433fe78e1a2240cb659).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21312
  
**[Test build #90618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90618/testReport)**
 for PR 21312 at commit 
[`a2d1444`](https://github.com/apache/spark/commit/a2d1444e11db5c3d6d04724e972b2470d2cd616c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21114: [SPARK-22371][CORE] Return None instead of throwi...

2018-05-14 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21114#discussion_r188152980
  
--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -237,6 +236,65 @@ class AccumulatorSuite extends SparkFunSuite with 
Matchers with LocalSparkContex
 acc.merge("kindness")
 assert(acc.value === "kindness")
   }
+
+  test("updating garbage collected accumulators") {
--- End diff --

This test can reproduce the crush scenario in original code base and 
successful ended after this patch. I think @cloud-fan is worrying about this 
test shouldn't commit in code base because it complexity?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-05-14 Thread er0sin
Github user er0sin commented on the issue:

https://github.com/apache/spark/pull/21260
  
How does one configure a PV/PVC with this change?
`spark.kubernetes.executor.volumes=pvName:containerPath` ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21114: [SPARK-22371][CORE] Return None instead of throwing an e...

2018-05-14 Thread artemrd
Github user artemrd commented on the issue:

https://github.com/apache/spark/pull/21114
  
There's "get accum" test which does this, it was updated for new behavior.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21325: [R][backport-2.2] backport lint fix

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21325
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3215/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21325: [R][backport-2.2] backport lint fix

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21325
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21325: [R][backport-2.2] backport lint fix

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21325
  
**[Test build #90617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90617/testReport)**
 for PR 21325 at commit 
[`367d122`](https://github.com/apache/spark/commit/367d1228980ef3401fc1546d6a737d495590360a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21153
  
OK. Looks like AppVeyor build is ok now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21325: [R] backport lint fix

2018-05-14 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21325
  
also need this in branch-2.1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21325: [R] backport lint fix

2018-05-14 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/21325

[R] backport lint fix

## What changes were proposed in this pull request?

backport part of the commit that addresses lintr issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rlintfix22

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21325


commit 367d1228980ef3401fc1546d6a737d495590360a
Author: Felix Cheung 
Date:   2018-05-15T02:24:43Z

fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21315: [SPARK-23780][R] Failed to use googleVis library ...

2018-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21315


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20872: [SPARK-23264][SQL] Fix scala.MatchError in literals.sql....

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20872
  
actually this PR linked to a wrong JIRA, @maropu can you create a new jira 
and fix it? thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90616/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21267
  
**[Test build #90616 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)**
 for PR 21267 at commit 
[`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21315: [SPARK-23780][R] Failed to use googleVis library with ne...

2018-05-14 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21315
  
merged to master/2.3


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90614/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21267
  
**[Test build #90614 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)**
 for PR 21267 at commit 
[`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-05-14 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
ping @hvanhovell


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21165
  
I think we can just update MimaExcludes, since it's developer API. cc 
@JoshRosen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21114: [SPARK-22371][CORE] Return None instead of throwing an e...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21114
  
can we do this?
```
var acc = ...
... // launch a long running job
val accId = acc.getId
acc = null
gc
... // job finished
```

accumulator is created by users so we have to be prepared for any 
situations. That's why we use weak reference at the first place. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21322
  
I think users are responsible to call `Broadcast#destroy`, which unpersist 
broadcast blocks from block manager and run user-defined driver side cleanup.

It is a valid use case to allow users to define some executor side cleanup 
via `AutoCloseable`. However, I don't think we should always detect 
`AutoCloseable` when removing a block, as it may break existing program and 
cause perf regression. We should only do it for broadcast blocks.

A good place to do it seems to be `BlockManager.removeBroadcast`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2018-05-14 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19840
  
@vanzin I am not very familiar with python part 
[context.py#L191](https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191),
 so handle it at `api/python/PythonRunner` as I did in this pr. 

Maybe someone else could help, sorry for the delay.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3214/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21267
  
**[Test build #90616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)**
 for PR 21267 at commit 
[`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21267
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3213/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21267
  
**[Test build #90614 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)**
 for PR 21267 at commit 
[`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20800
  
**[Test build #90615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90615/testReport)**
 for PR 20800 at commit 
[`f30d3ec`](https://github.com/apache/spark/commit/f30d3ec95c0d00f409f6536d10710b2f65fad787).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r188144573
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,22 @@ def _do_init(self, master, appName, sparkHome, 
pyFiles, environment, batchSize,
 for path in self._conf.get("spark.submit.pyFiles", "").split(","):
 if path != "":
 (dirname, filename) = os.path.split(path)
-if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-self._python_includes.append(filename)
-sys.path.insert(1, 
os.path.join(SparkFiles.getRootDirectory(), filename))
+try:
+filepath = os.path.join(SparkFiles.getRootDirectory(), 
filename)
+if not os.path.exists(filepath):
+# In case of YARN with shell mode, 
'spark.submit.pyFiles' files are
+# not added via SparkContext.addFile. Here we 
check if the file exists,
+# try to copy and then add it to the path. See 
SPARK-21945.
+shutil.copyfile(path, filepath)
+if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+self._python_includes.append(filename)
+sys.path.insert(1, filepath)
+except Exception:
+from pyspark import util
+warnings.warn(
--- End diff --

Likewise, I checked the warning manually:

```
.../pyspark/context.py:229: RuntimeWarning: Failed to add file 
[/home/spark/tmp.py] speficied in 'spark.submit.pyFiles' to Python path:

...
  /usr/lib64/python27.zip
  /usr/lib64/python2.7
... 
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20800
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20800
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-14 Thread sohama4
Github user sohama4 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21316#discussion_r188143976
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --

Thanks, that makes sense when I looked at the code for `foreach` and 
`foreachPartition`; I put up a new version with this change. It however wasn't 
clear immediately how the new function `withNewRDDExecutionId` would be 
beneficial over `withNewExecutionId`, can you elaborate a little when you get 
the chance?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r188143390
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -529,6 +567,239 @@ case class ArrayContains(left: Expression, right: 
Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least a 
non-null element present also in a2. If the arrays have no common element and 
they are both non-empty and either of them contains a null element null is 
returned, false otherwise.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def checkInputDataTypes(): TypeCheckResult = 
super.checkInputDataTypes() match {
+case TypeCheckResult.TypeCheckSuccess =>
+  if (RowOrdering.isOrderable(elementType)) {
+TypeCheckResult.TypeCheckSuccess
+  } else {
+TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} 
cannot be used in comparison.")
+  }
+case failure => failure
+  }
+
+  @transient private lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient private lazy val elementTypeSupportEquals = elementType match 
{
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+
+  @transient private lazy val doEvaluation = if (elementTypeSupportEquals) 
{
+fastEval _
+  } else {
+bruteForceEval _
+  }
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull ||
+  right.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(a1: Any, a2: Any): Any = {
+doEvaluation(a1.asInstanceOf[ArrayData], a2.asInstanceOf[ArrayData])
+  }
+
+  /**
+   * A fast implementation which puts all the elements from the smaller 
array in a set
+   * and then performs a lookup on it for each element of the bigger one.
+   * This eval mode works only for data types which implements properly 
the equals method.
+   */
+  private def fastEval(arr1: ArrayData, arr2: ArrayData): Any = {
+var hasNull = false
+val (bigger, smaller) = if (arr1.numElements() > arr2.numElements()) {
+  (arr1, arr2)
+} else {
+  (arr2, arr1)
+}
+if (smaller.numElements() > 0) {
+  val smallestSet = new mutable.HashSet[Any]
+  smaller.foreach(elementType, (_, v) =>
+if (v == null) {
+  hasNull = true
+} else {
+  smallestSet += v
+})
+  bigger.foreach(elementType, (_, v1) =>
+if (v1 == null) {
+  hasNull = true
+} else if (smallestSet.contains(v1)) {
+  return true
+}
+  )
+}
+if (hasNull) {
+  null
+} else {
+  false
+}
+  }
+
+  /**
+   * A slower evaluation which performs a nested loop and supports all the 
data types.
+   */
+  private def bruteForceEval(arr1: ArrayData, arr2: ArrayData): Any = {
+var hasNull = false
+if (arr1.numElements() > 0) {
+  arr1.foreach(elementType, (_, v1) =>
+if (v1 == null) {
+  hasNull = true
+} else {
+  arr2.foreach(elementType, (_, v2) =>
+if (v1 == null) {
+  hasNull = true
+} else if (ordering.equiv(v1, v2)) {
+  return true
+}
+  )
+})
+}
+if (hasNull) {
+  null
+} else {
+  false
+}
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, (a1, a2) => {
+  val smaller = ctx.freshName("smallerArray")
+  val bigger = ctx.freshName("biggerArray")
+  val comparisonCode = if (elementTypeSupportEquals) {
+fastCodegen(ctx, ev, smaller, bigger)
+  } else {
+bruteForceCodegen(ctx, ev, smaller, bigger)
+  }
+  s"""
+ |ArrayData $smaller;
+ |ArrayData $bigger;
+ |if ($a1.numElements() > $a2.numElements()) {
+ 

[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21183
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90611/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21183
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21183
  
**[Test build #90611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90611/testReport)**
 for PR 21183 at commit 
[`7ee0ebf`](https://github.com/apache/spark/commit/7ee0ebf028e41719514c0588d378cb515aea744a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90608/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21323
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21323
  
**[Test build #90608 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90608/testReport)**
 for PR 21323 at commit 
[`56437da`](https://github.com/apache/spark/commit/56437da708fc12d2c9216a1365a8afd6f81af845).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #90613 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90613/testReport)**
 for PR 21221 at commit 
[`10ed328`](https://github.com/apache/spark/commit/10ed328bfcf160711e7619aac23472f97bf1c976).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90612/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #90612 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90612/testReport)**
 for PR 21153 at commit 
[`dc59375`](https://github.com/apache/spark/commit/dc593754c62d2daf89331ea21d9250af9b9febfd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-14 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r188136532
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1753,9 +1766,21 @@ class DAGScheduler(
 messageScheduler.shutdownNow()
 eventProcessLoop.stop()
 taskScheduler.stop()
+heartbeater.stop()
+  }
+
+  /** Reports heartbeat metrics for the driver. */
+  private def reportHeartBeat(): Unit = {
--- End diff --

It's a bit redundant for fields that aren't used by the driver -- for the 
driver, execution memory gets set to 0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-05-14 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r188136553
  
--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Creates a heartbeat thread which will call the specified 
reportHeartbeat function at
+ * intervals of intervalMs.
+ *
+ * @param reportHeartbeat the heartbeat reporting function to call.
+ * @param intervalMs the interval between heartbeats.
+ */
+private[spark] class Heartbeater(reportHeartbeat: () => Unit, intervalMs: 
Long) {
+  // Executor for the heartbeat task
+  private val heartbeater = 
ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")
--- End diff --

Changed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21312
  
It looks like the `ListVector` also needs `setLastSet` to be called with 0, 
which is only in `ListVector`.  This is fine though, since `ListVector` is the 
only vector extending `BaseRepeatedValueVector`
```
case listVector: ListVector =>
val buffers = listVector.getBuffers(false)
buffers.foreach(buf => buf.setByte(0, buf.capacity()))
listVector.setValueCount(0)
listVector.setLastSet(0)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3212/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21312
  
Ok. I will use manual reset for now and leave a TODO comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21312
  
I'm okay with either way.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21291
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3211/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   >