[GitHub] spark pull request #21570: [SPARK-24564][TEST] Add test suite for RecordBina...

2018-06-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21570#discussion_r198710399
  
--- Diff: 
core/src/test/java/org/apache/spark/memory/TestMemoryConsumer.java ---
@@ -43,6 +47,12 @@ void free(long size) {
 used -= size;
 taskMemoryManager.releaseExecutionMemory(size, this);
   }
+
+  @VisibleForTesting
--- End diff --

It's already in the test package, so we don't need this annotation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21597
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92405/
Test PASSed.


---




[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21597
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21597
  
**[Test build #92405 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92405/testReport)**
 for PR 21597 at commit 
[`9a65366`](https://github.com/apache/spark/commit/9a65366a0c9d9e7e57ecdaa0d437af01cbc0d006).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21546
  
Hey @BryanCutler, btw, mind if I ask you to move the benchmarks into the PR 
description?


---




[GitHub] spark pull request #21474: [SPARK-24297][CORE] Fetch-to-disk by default for ...

2018-06-27 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21474#discussion_r198709276
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -429,7 +429,11 @@ package object config {
 "external shuffle service, this feature can only be worked when 
external shuffle" +
 "service is newer than Spark 2.2.")
   .bytesConf(ByteUnit.BYTE)
-  .createWithDefault(Long.MaxValue)
+  // fetch-to-mem is guaranteed to fail if the message is bigger than 
2 GB, so we might
+  // as well use fetch-to-disk in that case.  The message includes 
some metadata in addition
+  // to the block data itself (in particular UploadBlock has a lot of 
metadata), so we leave
+  // extra room.
+  .createWithDefault(Int.MaxValue - 500)
--- End diff --

Actually I prefer 512 to 500  :)
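
For reference, the arithmetic under discussion can be sketched as below. This is only an illustration: the constant and function names are hypothetical, and the strict `>` comparison is an assumption rather than something shown in the diff.

```python
# Largest value of a signed 32-bit integer (Scala's Int.MaxValue).
INT_MAX_VALUE = 2**31 - 1

# Headroom left for message metadata: 500 in the diff, 512 suggested in review.
METADATA_HEADROOM_BYTES = 512

# Hypothetical name for the default threshold computed in the diff.
DEFAULT_FETCH_TO_DISK_THRESHOLD = INT_MAX_VALUE - METADATA_HEADROOM_BYTES


def should_fetch_to_disk(block_size_bytes: int,
                         threshold: int = DEFAULT_FETCH_TO_DISK_THRESHOLD) -> bool:
    """Return True when a block is too large to be fetched into memory.

    The strict comparison is an assumption made for this sketch.
    """
    return block_size_bytes > threshold
```

With 512 bytes of headroom the threshold stays just under the 2 GB limit mentioned in the code comment, so any block at or above 2 GB is routed to disk.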


---




[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21589
  
Would you be able to manually test this on other cluster managers, such as 
standalone or YARN, too?


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198708461
  
--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
 """
 return self._jsc.sc().defaultMinPartitions()
 
+@property
+def numCores(self):
+"""
+Total number of CPU cores of all executors registered in the 
cluster at the moment.
+The number reflects current status of the cluster and can change 
in the future.
+"""
--- End diff --

Let's add version information here too. The docstring should have a `versionadded` entry.
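
As an illustration of the request, a docstring carrying version information might look like the sketch below. The class is a hypothetical stand-in (a plain list replaces the JVM call), and the version number `2.4.0` is an assumption, not something stated in this thread.

```python
class ContextSketch:
    """Hypothetical stand-in for the class that owns ``numCores``."""

    def __init__(self, executor_cores):
        # Cores per registered executor; a plain list replaces the JVM call.
        self._executor_cores = list(executor_cores)

    @property
    def numCores(self):
        """
        Total number of CPU cores of all executors registered in the
        cluster at the moment. The number reflects the current status of
        the cluster and can change in the future.

        .. versionadded:: 2.4.0
        """
        # The version above is an assumed placeholder for illustration.
        return sum(self._executor_cores)
```

The `.. versionadded::` directive is standard Sphinx markup, so the generated API docs would show when the property first appeared.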


---




[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...

2018-06-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21542
  
From the error log, it seems we need to include the test tag module in the 
pom.xml somewhere.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21533
  
**[Test build #92410 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92410/testReport)**
 for PR 21533 at commit 
[`eb46ccf`](https://github.com/apache/spark/commit/eb46ccfec084c2439a26eee38015381f091fe164).


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/534/
Test PASSed.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r198705671
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends 
Logging {
   def addFile(path: String, recursive: Boolean): Unit = {
 val uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null => new File(path).getCanonicalFile.toURI.toString
+  case "local" =>
+logWarning("We do not support add a local file here because file 
with local scheme is " +
+  "already existed on every node, there is no need to call addFile 
to add it again. " +
+  "(See more discussion about this in SPARK-24195.)")
--- End diff --

Got it, rephrase done.
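
The scheme handling in the diff above can be sketched roughly as follows. This is a simplified illustration, not the Scala implementation: the function name and logger are hypothetical, and the return values for the `local` branch and for other schemes are assumptions, since the diff is cut off before those lines.

```python
import logging
from pathlib import Path
from urllib.parse import urlparse

logger = logging.getLogger("addfile_sketch")


def scheme_corrected_path(path: str) -> str:
    """Mirror the scheme match from the diff (branches not shown are assumed)."""
    scheme = urlparse(path).scheme
    if scheme == "":
        # No scheme: treat as a local file and canonicalize it to a file: URI.
        return Path(path).resolve().as_uri()
    if scheme == "local":
        # Already present on every node, so only warn. Returning the path
        # unchanged is an assumption, since the diff is cut off here.
        logger.warning("local-scheme file %s does not need addFile", path)
        return path
    # Any other scheme (hdfs, http, ...) is assumed to pass through untouched.
    return path
```

Splitting the `null` and `"local"` cases is the point of the change: a schemeless path is still canonicalized, while a `local:` path no longer goes through `File` handling.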


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21653
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-06-27 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21653
  
cc @tgravescs 


---




[GitHub] spark pull request #21653: [SPARK-13343] speculative tasks that didn't commi...

2018-06-27 Thread hthuynh2
GitHub user hthuynh2 opened a pull request:

https://github.com/apache/spark/pull/21653

[SPARK-13343] speculative tasks that didn't commit shouldn't be marked as 
success

**Description**
Currently, speculative tasks that didn't commit can show up as successes or 
failures (depending on the timing of the commit). This is a bit confusing because 
such a task didn't really succeed, in the sense that it didn't write anything.
I think these tasks should be marked as KILLED, or something that makes it more 
obvious to the user exactly what happened. If a task happens to hit the timing 
where it gets a commit denied exception, it shows up as failed and counts 
against your task failures. It shouldn't count against task failures, since that 
failure really doesn't matter.
MapReduce handles this situation, so perhaps we can look there for a model.

https://user-images.githubusercontent.com/15680678/42013170-99db48c2-7a61-11e8-8c7b-ef94c84e36ea.png

**How can this issue happen?**
When both attempts of a task finish before the driver sends the command to kill 
one of them, both send the status update FINISHED to the driver. The 
driver calls TaskSchedulerImpl to handle one successful task at a time. When it 
handles the first successful task, it sends the command to kill the other copy 
of the task; however, because that task has already finished, the executor 
ignores the command. After handling the first attempt, the driver processes 
the second one. Although all actions on the result of this task are skipped, 
this copy of the task is still marked as SUCCESS. As a result, even though this 
issue does not affect the result of the job, it might confuse users 
because both attempts appear to be successful.

**How does this PR fix the issue?**
The simple fix is that when TaskSetManager handles a successful task, it 
checks whether any other attempt has already succeeded. If so, 
it calls handleFailedTask with state==KILLED and 
reason==TaskKilled(“another attempt succeeded”) to handle this task as 
being killed.
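
The check described above can be sketched in a few lines. This is a simplified model for illustration only, not the Scala code from the PR: the class, its fields, and the method signatures are hypothetical stand-ins for TaskSetManager and its handlers.

```python
class TaskSetSketch:
    """Hypothetical, simplified stand-in for a task set manager."""

    def __init__(self, num_tasks: int):
        # successful[i] becomes True once any attempt of task i has succeeded.
        self.successful = [False] * num_tasks
        # Final state and kill reason recorded per attempt id.
        self.attempt_state = {}
        self.kill_reason = {}

    def handle_successful_task(self, task_index: int, attempt_id: int) -> None:
        if self.successful[task_index]:
            # Another attempt of this task already succeeded: record this
            # one as killed instead of reporting a second success.
            self.handle_failed_task(attempt_id, "KILLED",
                                    "another attempt succeeded")
            return
        self.successful[task_index] = True
        self.attempt_state[attempt_id] = "SUCCESS"

    def handle_failed_task(self, attempt_id: int, state: str, reason: str) -> None:
        self.attempt_state[attempt_id] = state
        self.kill_reason[attempt_id] = reason
```

Handling two FINISHED updates for the same task in sequence then leaves the first attempt as SUCCESS and the second as KILLED, which matches the behaviour the unit test in the PR is described as checking.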

**How was this patch tested?**
I tested this manually by running applications that previously triggered the 
issue a few times, and observed that the issue no longer occurs. I also 
added a unit test in TaskSetManagerSuite to check that if we call 
handleSuccessfulTask to handle status updates for 2 copies of a task, only the 
one that is handled first is marked as SUCCESS.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hthuynh2/spark SPARK_13343

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21653


commit 8f7d98177816e11659cf79a2b28f96bd4b7173d5
Author: Hieu Huynh <“hieu.huynh@...>
Date:   2018-06-28T04:19:14Z

Fixed issue and added unit test




---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/533/



---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/533/
Test FAILed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92409/
Test PASSed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
**[Test build #92409 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92409/testReport)**
 for PR 21652 at commit 
[`7602dbc`](https://github.com/apache/spark/commit/7602dbc40779bc1972f5387eb2524e093b2c7a5e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/533/



---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92408/
Test PASSed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
**[Test build #92408 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92408/testReport)**
 for PR 21652 at commit 
[`f0d59cc`](https://github.com/apache/spark/commit/f0d59cc2f6cd966e28e9dfe37922ecba69445c83).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/532/



---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/532/
Test FAILed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21652
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/532/



---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
**[Test build #92409 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92409/testReport)**
 for PR 21652 at commit 
[`7602dbc`](https://github.com/apache/spark/commit/7602dbc40779bc1972f5387eb2524e093b2c7a5e).


---




[GitHub] spark issue #21652: [SPARK-24551][K8s] Add integration tests for secrets

2018-06-27 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21652
  
@foxish @liyinan926 pls review.


---




[GitHub] spark issue #21652: [SPARK-24551][K8s] Add integration tests for secrets

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21652
  
**[Test build #92408 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92408/testReport)**
 for PR 21652 at commit 
[`f0d59cc`](https://github.com/apache/spark/commit/f0d59cc2f6cd966e28e9dfe37922ecba69445c83).


---




[GitHub] spark pull request #21652: [SPARK-24551][K8s] Add integration tests for secr...

2018-06-27 Thread skonto
GitHub user skonto opened a pull request:

https://github.com/apache/spark/pull/21652

[SPARK-24551][K8s] Add integration tests for secrets

## What changes were proposed in this pull request?

- Adds integration tests for env and mount secrets.

## How was this patch tested?

Manually by checking that secrets were added to the containers and by 
tuning the tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/skonto/spark add-secret-its

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21652


commit 9814eefe7f5a02e24b4750d8bf522e0e711db28f
Author: Stavros Kontopoulos 
Date:   2018-06-28T03:49:32Z

add secret tests




---




[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21650
  
Build finished. Test PASSed.


---




[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92400/
Test PASSed.


---




[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21650
  
**[Test build #92400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92400/testReport)**
 for PR 21650 at commit 
[`6b47b69`](https://github.com/apache/spark/commit/6b47b69305257e9ee9f5135968913a4f92731ef5).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21589
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21589
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92402/
Test PASSed.


---




[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21589
  
**[Test build #92402 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92402/testReport)**
 for PR 21589 at commit 
[`1405daf`](https://github.com/apache/spark/commit/1405daf18f9ae907f36c64e426bf65a3a9e567e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20739: [SPARK-23603][SQL]When the length of the json is ...

2018-06-27 Thread cxzl25
Github user cxzl25 closed the pull request at:

https://github.com/apache/spark/pull/20739


---




[GitHub] spark pull request #20738: [SPARK-23603][SQL]When the length of the json is ...

2018-06-27 Thread cxzl25
Github user cxzl25 closed the pull request at:

https://github.com/apache/spark/pull/20738


---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21451
  
**[Test build #92407 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92407/testReport)**
 for PR 21451 at commit 
[`fa1928a`](https://github.com/apache/spark/commit/fa1928aa48655ca2fb036759260cfa71324ed37c).


---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/531/
Test PASSed.


---




[GitHub] spark pull request #21167: [SPARK-24100][PYSPARK]Add the CompressionCodec to...

2018-06-27 Thread WzRaCai
Github user WzRaCai closed the pull request at:

https://github.com/apache/spark/pull/21167


---




[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21440
  
**[Test build #92406 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92406/testReport)**
 for PR 21440 at commit 
[`4b53667`](https://github.com/apache/spark/commit/4b5366794acc7ef792ecf1a06e9697db79268a67).


---




[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21440
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/530/
Test PASSed.


---




[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21440
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21650
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92401/
Test PASSed.


---




[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21650
  
**[Test build #92401 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92401/testReport)**
 for PR 21650 at commit 
[`be3b99c`](https://github.com/apache/spark/commit/be3b99c951c3df77eace0a6a124f8f9a94ac804c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21611
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21611
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92403/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21611
  
**[Test build #92403 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92403/testReport)**
 for PR 21611 at commit 
[`f04efa4`](https://github.com/apache/spark/commit/f04efa484e7b5dfbe709f65845bea58e53611604).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21440
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21440
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92399/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21440
  
**[Test build #92399 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92399/testReport)**
 for PR 21440 at commit 
[`6c57e4d`](https://github.com/apache/spark/commit/6c57e4d35d76d5f2b618a24bd56d83899eea567e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21597: [SPARK-24603][SQL] Fix findTightestCommonType ref...

2018-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21597


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21597
  
Merged to master, branch-2.3 and branch-2.2.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21557
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92404/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21557
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21557
  
**[Test build #92404 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)**
 for PR 21557 at commit 
[`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21596#discussion_r198686913
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala
 ---
@@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils}
 
 /**
  * The benchmarks aims to measure performance of JSON parsing when 
encoding is set and isn't.
- * To run this:
- *  spark-submit --class  --jars 
+ * To run:
+ *  mvn clean package -pl sql/core -DskipTests
+ *  ./dev/make-distribution.sh --name local-dist
+ *  cd dist/
+ *  ./bin/spark-submit --class 
org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \
+ *  ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > 
/tmp/output.txt
--- End diff --

Let's take out the other lines, like `make-distribution.sh` and `cd dist/`, 
too, since they vary depending on how Spark is built.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92398/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21451
  
**[Test build #92398 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92398/testReport)**
 for PR 21451 at commit 
[`1cc0f3f`](https://github.com/apache/spark/commit/1cc0f3ffa2b563c54771a38c4dd9f2598b29f0db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class UploadBlockStream extends BlockTransferMessage `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21597
  
**[Test build #92405 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92405/testReport)**
 for PR 21597 at commit 
[`9a65366`](https://github.com/apache/spark/commit/9a65366a0c9d9e7e57ecdaa0d437af01cbc0d006).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21597
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceRe...

2018-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21625


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPru...

2018-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21631


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBench...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21625
  
LGTM too

Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21631
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21631
  
LGTM.

@MaxGekk please take a follow-up action. I will help and check if it's needed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-27 Thread tedyu
Github user tedyu commented on the issue:

https://github.com/apache/spark/pull/21651
  
cc @tdas 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-27 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21533
  
I think maybe we could:

1) either ignore files with the "local" scheme and let the user decide how 
to fetch them, as the current fix does.
2) or copy the 'local' scheme files to the `SparkFiles#getRootDirectory` 
on both the driver and the executors. The change would be in `Utils#fetchFile`.

@jiangxb1987 @vanzin what's your opinion?
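
For illustration only, the two options can be sketched in Python rather than in Spark's Scala. This is not the code from the PR: `resolve_added_file`, `root_dir` and `copy_local` are hypothetical names, and `root_dir` merely stands in for whatever `SparkFiles#getRootDirectory` returns on a node.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse


def resolve_added_file(path, root_dir, copy_local=False):
    """Return the location a task should read from, or None if nothing is distributed."""
    uri = urlparse(path)
    if uri.scheme == "":
        # No scheme: treat it as a driver-side file and canonicalize the path.
        return Path(path).resolve().as_uri()
    if uri.scheme == "local":
        if not copy_local:
            # Option 1: skip the file; the user reads the path directly on each node.
            return None
        # Option 2: copy the file under the per-node root directory.
        target = Path(root_dir) / Path(uri.path).name
        shutil.copy(uri.path, target)
        return target.as_uri()
    # Any other scheme (hdfs, http, ...) is passed through unchanged.
    return path
```

Under option 1 the caller gets `None` back and has to open the `local:` path itself; under option 2 the file is found under the root directory like any other added file, at the cost of an extra copy per node.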



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21557
  
**[Test build #92404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)**
 for PR 21557 at commit 
[`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21557
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/529/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21557
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198684121
  
--- Diff: project/MimaExcludes.scala ---
@@ -89,7 +89,13 @@ object MimaExcludes {
 
ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol"),
 
ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.getValidationIndicatorCol"),
 
ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.org$apache$spark$ml$param$shared$HasValidationIndicatorCol$_setter_$validationIndicatorCol_="),
-
ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol")
+
ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol"),
+
+// [SPARK-23429][CORE] Add executor memory metrics to heartbeat and 
expose in executors REST API
+
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate.apply"),
+
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate.copy"),
+
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate.this"),
+
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate$")
--- End diff --

Will move up.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...

2018-06-27 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21557#discussion_r198684081
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -622,10 +621,10 @@ def __init__(self, featuresCol="features", 
predictionCol="prediction", maxIter=2
 @keyword_only
 @since("2.0.0")
 def setParams(self, featuresCol="features", 
predictionCol="prediction", maxIter=20,
-  seed=None, k=4, minDivisibleClusterSize=1.0):
+  seed=None, k=4, minDivisibleClusterSize=1.0, 
distanceMeasure="euclidean"):
 """
 setParams(self, featuresCol="features", 
predictionCol="prediction", maxIter=20, \
-  seed=None, k=4, minDivisibleClusterSize=1.0)
+  seed=None, k=4, minDivisibleClusterSize=1.0, 
distanceMeasure="euclidean")
 Sets params for BisectingKMeans.
--- End diff --

@BryanCutler Thank you very much for your review. I will make the change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198683846
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala 
---
@@ -251,6 +261,217 @@ class EventLoggingListenerSuite extends SparkFunSuite 
with LocalSparkContext wit
 }
   }
 
+  /**
+   * Test executor metrics update logging functionality. This checks that a
+   * SparkListenerExecutorMetricsUpdate event is added to the Spark history
+   * log if one of the executor metrics is larger than any previously
+   * recorded value for the metric, per executor per stage. The task 
metrics
--- End diff --

Woops, that was left over from when it was ExecutorMetricsUpdated.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92395/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198683408
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +102,48 @@ class ExecutorSummary private[spark](
 val removeReason: Option[String],
 val executorLogs: Map[String, String],
 val memoryMetrics: Option[MemoryMetrics],
-val blacklistedInStages: Set[Int])
+val blacklistedInStages: Set[Int],
+@JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+@JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+val peakMemoryMetrics: Option[Array[Long]])
 
 class MemoryMetrics private[spark](
 val usedOnHeapStorageMemory: Long,
 val usedOffHeapStorageMemory: Long,
 val totalOnHeapStorageMemory: Long,
 val totalOffHeapStorageMemory: Long)
 
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric 
name */
+class PeakMemoryMetricsDeserializer private[spark] extends 
JsonDeserializer[Option[Array[Long]]] {
--- End diff --

This is odd, but I can't seem to comment on your earlier comment. Regarding 
having a serializer/deserializer, I also don't have strong feelings -- it makes 
it more readable, but also takes up more space in the history log.

Regarding this comment, thanks, I hadn't realized the placement meant that 
it marked the constructor. It's meant for the class, and I'll move.
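
The readability versus log size trade-off mentioned above can be seen in a small Python sketch. It is only an illustration, not the Jackson serializer from the diff: the metric names are placeholders, and the assumption is that the named form is a JSON object keyed by metric name while the compact form is a bare array.

```python
import json

# Placeholder metric names (assumption); the real set is defined in the PR.
METRIC_NAMES = ["heapUsed", "offHeapUsed", "executionUsed"]


def to_named(peaks):
    """Array -> {name: value}, with keys sorted by metric name."""
    return dict(sorted(zip(METRIC_NAMES, peaks)))


def from_named(named):
    """{name: value} -> array in METRIC_NAMES order."""
    return [named[name] for name in METRIC_NAMES]


peaks = [4096, 512, 1024]
as_array = json.dumps(peaks)            # compact, but positions carry the meaning
as_named = json.dumps(to_named(peaks))  # self-describing, but longer per event
```

The named form repeats every metric name in each logged event, which is where the extra space in the history log comes from.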


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21546
  
**[Test build #92395 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92395/testReport)**
 for PR 21546 at commit 
[`fe3319b`](https://github.com/apache/spark/commit/fe3319bd7ab290e30f6075a81acd0b17818ad546).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BatchOrderSerializer(Serializer):`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-27 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/21651
  
I am uncertain about some of the ways we should transfer the data stored in 
OffsetSeqs to external storage (e.g. KafkaSink, which I mentioned before).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198682917
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -264,6 +282,11 @@ private[spark] trait SparkListenerInterface {
*/
   def onExecutorMetricsUpdate(executorMetricsUpdate: 
SparkListenerExecutorMetricsUpdate): Unit
 
+  /**
+   * Called when the driver reads stage executor metrics from the history 
log.
--- End diff --

Updated.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198682980
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -669,6 +686,29 @@ private[spark] class AppStatusListener(
 }
   }
 }
+event.executorUpdates.foreach { updates: Array[Long] =>
+  // check if there is a new peak value for any of the executor level 
memory metrics
+  liveExecutors.get(event.execId).foreach { exec: LiveExecutor =>
+if (exec.peakExecutorMetrics.compareAndUpdate(updates)) {
+  maybeUpdate(exec, now)
+}
+  }
+}
+  }
+
+  override def onStageExecutorMetrics(executorMetrics: 
SparkListenerStageExecutorMetrics): Unit = {
+val now = System.nanoTime()
+
+// check if there is a new peak value for any of the executor level 
memory metrics
--- End diff --

Unfortunately, yes. I've added some comments.
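
As a rough sketch of the peak tracking that `compareAndUpdate` performs in the diff above, written in Python for illustration: the class name, the zero initial values, and the element-wise comparison are assumptions, not the PR's implementation.

```python
class PeakMetrics:
    """Tracks the largest value seen so far for each metric position."""

    def __init__(self, num_metrics):
        # Assumption: peaks start at zero.
        self.peaks = [0] * num_metrics

    def compare_and_update(self, updates):
        """Raise any peak that an update exceeds; return True if one changed."""
        updated = False
        for i, value in enumerate(updates):
            if value > self.peaks[i]:
                self.peaks[i] = value
                updated = True
        return updated
```

The boolean result lets the caller write an update only when a new peak was actually recorded, which matches the `if (...) maybeUpdate(exec, now)` shape in the diff.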


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198682809
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricGetter.scala 
---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics
+
+import java.lang.management.{BufferPoolMXBean, ManagementFactory}
+import javax.management.ObjectName
+
+import org.apache.spark.memory.MemoryManager
+
+sealed trait MetricGetter {
--- End diff --

Added.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-27 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r198682844
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends 
Logging {
   def addFile(path: String, recursive: Boolean): Unit = {
 val uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null => new File(path).getCanonicalFile.toURI.toString
+  case "local" =>
+logWarning("We do not support add a local file here because file 
with local scheme is " +
+  "already existed on every node, there is no need to call addFile 
to add it again. " +
+  "(See more discussion about this in SPARK-24195.)")
--- End diff --

Can we please rephrase to "File with 'local' scheme is not supported to add 
to file server, since it is already available on every node."?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-27 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/21651
  
cc @koeninger 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198682884
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +181,28 @@ private[spark] class EventLoggingListener(
 
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit 
= {
+if (shouldLogExecutorMetricsUpdates) {
+  // clear out any previous attempts, that did not have a stage 
completed event
+  val prevAttemptId = event.stageInfo.attemptNumber() - 1
+  for (attemptId <- 0 to prevAttemptId) {
+liveStageExecutorMetrics.remove((event.stageInfo.stageId, 
attemptId))
+  }
+
+  // log the peak executor metrics for the stage, for each live 
executor,
+  // whether or not the executor is running tasks for the stage
+  val executorMap = liveStageExecutorMetrics.remove(
+(event.stageInfo.stageId, event.stageInfo.attemptNumber()))
+  executorMap.foreach {
+   executorEntry => {
+  for ((executorId, peakExecutorMetrics) <- executorEntry) {
--- End diff --

Yes, the naming is confusing. Changed to the 1st option.
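
The bookkeeping in the diff above can be sketched in Python as follows. This is a simplified illustration under assumptions: `live_metrics` is a plain dict standing in for `liveStageExecutorMetrics`, keyed by `(stage_id, attempt)`, and the function returns the entries instead of logging them.

```python
def on_stage_completed(live_metrics, stage_id, attempt):
    """live_metrics maps (stage_id, attempt) -> {executor_id: peaks}."""
    # Drop earlier attempts that never received a stage-completed event.
    for earlier in range(attempt):
        live_metrics.pop((stage_id, earlier), None)
    # Remove the finished attempt and hand back its per-executor peaks.
    finished = live_metrics.pop((stage_id, attempt), {})
    return sorted(finished.items())
```

Each returned pair is an executor id with its peak values for the stage, which corresponds to the `(executorId, peakExecutorMetrics)` pairs iterated in the diff.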


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198682779
  
--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.memory.MemoryManager
+import org.apache.spark.metrics.MetricGetter
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Creates a heartbeat thread which will call the specified 
reportHeartbeat function at
+ * intervals of intervalMs.
+ *
+ * @param memoryManager the memory manager for execution and storage 
memory.
+ * @param reportHeartbeat the heartbeat reporting function to call.
+ * @param name the thread name for the heartbeater.
+ * @param intervalMs the interval between heartbeats.
+ */
+private[spark] class Heartbeater(
+memoryManager: MemoryManager,
+reportHeartbeat: () => Unit,
+name: String,
+intervalMs: Long) extends Logging {
+  // Executor for the heartbeat task
+  private val heartbeater = 
ThreadUtils.newDaemonSingleThreadScheduledExecutor(name)
+
+  /** Schedules a task to report a heartbeat. */
+  private[spark] def start(): Unit = {
--- End diff --

Removed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-27 Thread edwinalu
Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r198682787
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1922,6 +1928,12 @@ class SparkContext(config: SparkConf) extends 
Logging {
 Utils.tryLogNonFatalError {
   _eventLogger.foreach(_.stop())
 }
+if(_heartbeater != null) {
--- End diff --

Added.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-06-27 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21631
  
@HyukjinKwon BTW, can you check this?
@MaxGekk It would probably be better to file a new JIRA for the point 
you're looking into.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21651
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21651
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


