[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91810/
Test FAILed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21441
  
**[Test build #91810 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91810/testReport)**
 for PR 21441 at commit 
[`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91809/
Test FAILed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21441
  
**[Test build #91809 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91809/testReport)**
 for PR 21441 at commit 
[`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...

2018-06-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21503
  
cc @rxin if you are interested.


---




[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...

2018-06-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21537#discussion_r195307036
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -579,6 +579,22 @@ class CodegenContext {
 s"${fullName}_$id"
   }
 
+  /**
+   * Creates an `ExprValue` representing a local java variable of required data type.
+   */
+  def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt)
+
+  /**
+   * Creates an `ExprValue` representing a local java variable of required data type.
+   */
+  def freshName(name: String, javaClass: Class[_]): VariableValue =
+    JavaCode.variable(freshName(name), javaClass)
+
+  /**
+   * Creates an `ExprValue` representing a local boolean java variable.
+   */
+  def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name))
--- End diff --

Ok.


---




[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...

2018-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21537#discussion_r195306844
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -579,6 +579,22 @@ class CodegenContext {
 s"${fullName}_$id"
   }
 
+  /**
+   * Creates an `ExprValue` representing a local java variable of required data type.
+   */
+  def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt)
+
+  /**
+   * Creates an `ExprValue` representing a local java variable of required data type.
+   */
+  def freshName(name: String, javaClass: Class[_]): VariableValue =
+    JavaCode.variable(freshName(name), javaClass)
+
+  /**
+   * Creates an `ExprValue` representing a local boolean java variable.
+   */
+  def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name))
--- End diff --

`isNullFreshName` is new; we don't need it and can just call `freshName(name, BooleanType)`.
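
The suggestion above can be illustrated with a small Python model. This is only a sketch, not Spark's code: the classes below are toy stand-ins that borrow the names from the diff, and the `fresh_name` signature and string type names are assumptions made for illustration. It shows why a typed overload makes a separate boolean-only helper redundant.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class VariableValue:
    """Toy stand-in for a typed reference to a generated local variable."""
    name: str
    java_type: str


class CodegenContext:
    """Toy stand-in that hands out unique variable names."""

    def __init__(self):
        self._ids = count()

    def fresh_name(self, name, java_type=None):
        # Append a running id so that repeated prefixes never collide.
        unique = f"{name}_{next(self._ids)}"
        # Without a type, return the bare name; with one, wrap it.
        return unique if java_type is None else VariableValue(unique, java_type)


ctx = CodegenContext()
# A boolean null flag is just another typed variable; no special helper needed.
is_null = ctx.fresh_name("isNull", "boolean")
value = ctx.fresh_name("value", "int")
```

In this model the null flag and the value share one code path, and only the type argument differs.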


---




[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...

2018-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21537#discussion_r195306721
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -579,6 +579,22 @@ class CodegenContext {
 s"${fullName}_$id"
   }
 
+  /**
+   * Creates an `ExprValue` representing a local java variable of required data type.
+   */
+  def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt)
--- End diff --

oh I missed the ctx parameter thing, let's leave it.


---




[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r195305634
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
@@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
 }
 
 /*
+OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-Parquet Vectorized                             8452 / 8504          1.9         537.3       1.0X
-Parquet Vectorized (Pushdown)                   274 /  281         57.3          17.4      30.8X
-Native ORC Vectorized                          8167 / 8185          1.9         519.3       1.0X
-Native ORC Vectorized (Pushdown)                365 /  379         43.1          23.2      23.1X
+Parquet Vectorized                             2961 / 3123          5.3         188.3       1.0X
+Parquet Vectorized (Pushdown)                  3057 / 3121          5.1         194.4       1.0X
--- End diff --

Thank you for updating, @maropu .


---




[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21560
  
**[Test build #91817 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91817/testReport)**
 for PR 21560 at commit 
[`252f5c9`](https://github.com/apache/spark/commit/252f5c9d0e4a5b6d1a456e847a53cf4f0e84dcfb).


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21547
  
**[Test build #91818 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91818/testReport)**
 for PR 21547 at commit 
[`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52).


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21547
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/123/
Test PASSed.


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21547
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4012/
Test PASSed.


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21547
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21547
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21547
  
Jenkins, retest this please.


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21547
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91808/
Test FAILed.


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21547
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21560
  
**[Test build #91816 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91816/testReport)**
 for PR 21560 at commit 
[`03cc20d`](https://github.com/apache/spark/commit/03cc20d73dd547e476fad90d47225ef9e96a8cbc).


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21547
  
**[Test build #91808 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91808/testReport)**
 for PR 21547 at commit 
[`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-06-13 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r195304544
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
@@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
 }
 
 /*
+OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-Parquet Vectorized                             8452 / 8504          1.9         537.3       1.0X
-Parquet Vectorized (Pushdown)                   274 /  281         57.3          17.4      30.8X
-Native ORC Vectorized                          8167 / 8185          1.9         519.3       1.0X
-Native ORC Vectorized (Pushdown)                365 /  379         43.1          23.2      23.1X
+Parquet Vectorized                             2961 / 3123          5.3         188.3       1.0X
+Parquet Vectorized (Pushdown)                  3057 / 3121          5.1         194.4       1.0X
--- End diff --

The result in v2.3.1: https://gist.github.com/maropu/88627246b7143ede5ab73c7183ab2128

That is not a regression; I probably ran the benchmark on the wrong branch or commit.
I re-ran the benchmark on the current master and updated the PR.

How to run: I created a new `m4.2xlarge` instance, fetched this PR, rebased it onto master, and ran the benchmark.
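
The columns of the quoted benchmark table appear to be related by simple arithmetic, which makes the before/after numbers easier to sanity-check. The following is a minimal sketch, not Spark's benchmark code: the function name `derived_columns` is hypothetical, and the row count is not stated in this thread, so `15 * 1024 * 1024` is an assumption chosen because it reproduces the two new Parquet rows.

```python
# Assumed row count; the thread does not state it.
ROWS = 15 * 1024 * 1024


def derived_columns(best_ms, baseline_best_ms, rows=ROWS):
    """Recompute Rate(M/s), Per Row(ns) and Relative from a best time in ms."""
    rate_m_per_s = rows / (best_ms / 1000.0) / 1e6  # million rows per second
    per_row_ns = best_ms * 1e6 / rows               # nanoseconds per row
    relative = baseline_best_ms / best_ms           # speed-up over the first row
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


# The two rows added by the diff; the first row is its own baseline.
parquet = derived_columns(2961, 2961)
parquet_pushdown = derived_columns(3057, 2961)
```

Because the table prints best times rounded to whole milliseconds, recomputing the older rows this way can differ from the quoted values in the last digit.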


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4011/
Test PASSed.


---




[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21560
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21560
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...

2018-06-13 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/21560
  
@HeartSaVioR @arunmahadevan @xuanyuanking @tdas @zsxwing 


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91815/
Test FAILed.


---




[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-13 Thread jose-torres
GitHub user jose-torres opened a pull request:

https://github.com/apache/spark/pull/21560

[SPARK-24386][SS] coalesce(1) aggregates in continuous processing

## What changes were proposed in this pull request?

Provide a continuous processing implementation of coalesce(1), and allow 
aggregates on top of it.

The changes in ContinuousQueuedDataReader and related classes use split.index 
(the ID of the partition within the RDD currently being compute()d) rather than 
context.partitionId() (the partition ID of the scheduled task within the Spark 
job, that is, the post-coalesce writer). In the absence of a narrow 
dependency, these two values were previously always the same, so there was no 
need to distinguish them.
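
The distinction described above can be modelled in a few lines of plain Python. This is a toy sketch under stated assumptions, not Spark's API: `Split`, `TaskContext`, and `coalesce_to_one` are hypothetical stand-ins that only mimic the roles of `split.index` and `context.partitionId()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    index: int  # position of this partition within its own (parent) RDD


@dataclass(frozen=True)
class TaskContext:
    partition_id: int  # partition of the scheduled task, i.e. the stage's output


def coalesce_to_one(parent_splits):
    """Model a narrow coalesce(1): a single task computes every parent split."""
    ctx = TaskContext(partition_id=0)
    # Each parent split keeps its own index, but all of them share one task.
    return [(split.index, ctx.partition_id) for split in parent_splits]


pairs = coalesce_to_one([Split(i) for i in range(3)])
mismatches = [p for p in pairs if p[0] != p[1]]
```

In this model, two of the three parent splits see a task partition ID that differs from their own index, so a reader keyed on the task's ID would read the wrong partition.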

## How was this patch tested?

new unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jose-torres/spark coalesce

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21560.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21560


commit 1d6b71898e2a640e3c0809695d2b83f3f84eaa38
Author: Jose Torres 
Date:   2018-05-15T18:07:54Z

continuous shuffle read RDD

commit b5d100875932bdfcb645c8f6b2cdb7b815d84c80
Author: Jose Torres 
Date:   2018-05-17T03:11:11Z

docs

commit af407694a5f13c18568da4a63848f82374a44377
Author: Jose Torres 
Date:   2018-05-17T03:19:37Z

Merge remote-tracking branch 'apache/master' into readerRddMaster

commit 46456dc75a6aec9659b18523c421999debd060eb
Author: Jose Torres 
Date:   2018-05-17T03:22:49Z

fix ctor

commit 2ea8a6f94216e8b184e5780ec3e6ffb2838de382
Author: Jose Torres 
Date:   2018-05-17T03:43:10Z

multiple partition test

commit 955ac79eb05dc389e632d1aaa6c59396835c6ed5
Author: Jose Torres 
Date:   2018-05-17T13:33:51Z

unset task context after test

commit 8cefb724512b51f2aa1fdd81fa8a2d4560e60ce3
Author: Jose Torres 
Date:   2018-05-18T00:00:05Z

conf from RDD

commit f91bfe7e3fc174202d7d5c7cde5a8fb7ce86bfd3
Author: Jose Torres 
Date:   2018-05-18T00:00:44Z

endpoint name

commit 259029298fc42a65e8ebb4d2effe49b7fafa96f1
Author: Jose Torres 
Date:   2018-05-18T00:02:08Z

testing bool

commit 859e6e4dd4dd90ffd70fc9cbd243c94090d72506
Author: Jose Torres 
Date:   2018-05-18T00:22:10Z

tests

commit b23b7bb17abe3cbc873a3144c56d08c88bc0c963
Author: Jose Torres 
Date:   2018-05-18T00:40:55Z

take instead of poll

commit 97f7e8ff865e6054d0d70914ce9bb51880b161f6
Author: Jose Torres 
Date:   2018-05-18T00:58:44Z

add interface

commit de21b1c25a333d44c0521fe151b468e51f0bdc47
Author: Jose Torres 
Date:   2018-05-18T01:02:37Z

clarify comment

commit 7dcf51a13e92a0bb2998e2a12e67d351e1c1a4fc
Author: Jose Torres 
Date:   2018-05-18T22:39:28Z

multiple

commit ad0b5aab320413891f7c21ea6115b6da8d49ccf9
Author: Jose Torres 
Date:   2018-05-25T00:06:15Z

writer with 1 reader partition

commit c9adee5423c2e8a030911008d2e6942045d484bb
Author: Jose Torres 
Date:   2018-05-25T00:15:39Z

docs and iface

commit 63d38d849107eed226449cec8d24c2241cd583c9
Author: Jose Torres 
Date:   2018-05-25T00:27:26Z

Merge remote-tracking branch 'apache/master' into writerTask

commit 331f437423262a1aa76754a8079d7c017e4ea28a
Author: Jose Torres 
Date:   2018-05-25T00:37:14Z

increment epoch

commit f3ce67529372f72370a1e6028dc71a751acf26f2
Author: Jose Torres 
Date:   2018-05-25T00:40:39Z

undo oop

commit e0108d7bc164b9e5eeb757c13c80bc1d11671188
Author: Jose Torres 
Date:   2018-05-25T00:54:01Z

make rdd loop

commit 024f92d6bd471e207e1625dc6cdca31e1067deb8
Author: Jose Torres 
Date:   2018-05-25T22:56:59Z

basic

commit 8f1939b91dbef76879d5e5f2077dea35e5343e89
Author: Jose Torres 
Date:   2018-06-11T21:48:21Z

coalesce working

commit c99d9524d4778b973df34378e98d53a152e0a42c
Author: Jose Torres 
Date:   2018-06-13T21:34:38Z

Merge remote-tracking branch 'apache/master' into coalesce

commit aaac0af0ddebffe64338a69a5a16dcfab9432a51
Author: Jose Torres 
Date:   2018-06-13T22:04:19Z

fix merge

commit 80d60db4c99e52e624dcbd19cc7c5ba519ff4e1c
Author: Jose Torres 
Date:   2018-06-13T23:09:29Z

rm spurious diffs

commit 26b74f016033f582a61694133b82df6a40295c0b
Author: Jose Torres 
Date:   2018-06-14T04:43:00Z

unsupported check

commit 03cc20d73dd547e476fad90d47225ef9e96a8cbc
Author: Jose Torres 
Date:   2018-06-14T04:54:30Z

change back timeout




---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #91815 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91815/testReport)**
 for PR 21288 at commit 
[`fa53156`](https://github.com/apache/spark/commit/fa53156599812adc94f089b8c163224fb2e4935f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #91815 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91815/testReport)**
 for PR 21288 at commit 
[`fa53156`](https://github.com/apache/spark/commit/fa53156599812adc94f089b8c163224fb2e4935f).


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-06-13 Thread lucashu1
Github user lucashu1 commented on the issue:

https://github.com/apache/spark/pull/21092
  
Sorry in advance if this is the wrong place to be asking this! 

Does this PR mean that we'll be able to create SparkContexts using PySpark's [`SparkSession.Builder`](https://spark.apache.org/docs/preview/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder) with `master` set to `k8s://<...>:<...>`, and have the resulting jobs run on spark-on-k8s, instead of on local/standalone? 

E.g.:
```
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('k8s://https://kubernetes:443').getOrCreate()
```

I'm trying to use PySpark in a Jupyter notebook that's running inside a 
Kubernetes pod, and have it use spark-on-k8s instead of resorting to using 
`local[*]` as `master`. 

Till now, I've been getting an error saying that:

> Error: Python applications are currently not supported for Kubernetes.

whenever I try to use `k8s://<...>` as `master`.

Thanks!


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/122/
Test PASSed.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/121/
Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4010/
Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21379
  
**[Test build #91814 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91814/testReport)**
 for PR 21379 at commit 
[`a3be215`](https://github.com/apache/spark/commit/a3be215755f00100be0817b2a59f1ea8a185518b).


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/120/
Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21379
  
**[Test build #91813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91813/testReport)**
 for PR 21379 at commit 
[`d76bc7f`](https://github.com/apache/spark/commit/d76bc7fc555bbfe4da25c959646a6ee5961d4d14).


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4009/
Test PASSed.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91806/
Test PASSed.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21389
  
**[Test build #91806 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91806/testReport)**
 for PR 21389 at commit 
[`04f4028`](https://github.com/apache/spark/commit/04f40281e2a457ea27d425b5b1db0e07a0150aaf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/114/
Test PASSed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91805/
Test PASSed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20929
  
**[Test build #91805 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91805/testReport)**
 for PR 20929 at commit 
[`22e0d9f`](https://github.com/apache/spark/commit/22e0d9f12e4b08a4337c61371cf4ff795a2752b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21379
  
**[Test build #91812 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91812/testReport)**
 for PR 21379 at commit 
[`a9b0306`](https://github.com/apache/spark/commit/a9b030682be358f36c0d2e64b175017458774b20).


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4008/
Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/119/
Test PASSed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

2018-06-13 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20929
  
@mengxr ok, could you check?


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91804/
Test FAILed.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #91804 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91804/testReport)**
 for PR 21221 at commit 
[`99044e6`](https://github.com/apache/spark/commit/99044e6ec0cdc1b760c57dd5b7e74349384c6a98).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #91811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91811/testReport)**
 for PR 21221 at commit 
[`99044e6`](https://github.com/apache/spark/commit/99044e6ec0cdc1b760c57dd5b7e74349384c6a98).


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21221
  
Jenkins, test this please


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195297024
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +101,53 @@ class ExecutorSummary private[spark](
 val removeReason: Option[String],
 val executorLogs: Map[String, String],
 val memoryMetrics: Option[MemoryMetrics],
-val blacklistedInStages: Set[Int])
+val blacklistedInStages: Set[Int],
+@JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+@JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+val peakMemoryMetrics: Option[Array[Long]])
 
 class MemoryMetrics private[spark](
 val usedOnHeapStorageMemory: Long,
 val usedOffHeapStorageMemory: Long,
 val totalOnHeapStorageMemory: Long,
 val totalOffHeapStorageMemory: Long)
 
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric name */
+class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] {
--- End diff --

can this be `private[spark]`?


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91802/
Test FAILed.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Build finished. Test FAILed.


---




[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #91802 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91802/testReport)**
 for PR 21221 at commit 
[`2662f6f`](https://github.com/apache/spark/commit/2662f6f9c6a7c34cea34b748f6735eb1625b73cb).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] `
  * `class PeakMemoryMetricsSerializer extends JsonSerializer[Option[Array[Long]]] `


---




[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91803/
Test PASSed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20929
  
**[Test build #91803 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91803/testReport)**
 for PR 20929 at commit 
[`58054ef`](https://github.com/apache/spark/commit/58054ef61f61a999117ec8617eed34e446ddb078).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21441
  
**[Test build #91810 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91810/testReport)**
 for PR 21441 at commit 
[`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32).


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4007/
Test PASSed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/118/
Test PASSed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21441
  
retest this please


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21441
  
**[Test build #91809 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91809/testReport)**
 for PR 21441 at commit 
[`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32).


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21441
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/117/
Test PASSed.


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195289142
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -93,6 +96,9 @@ private[spark] class EventLoggingListener(
   // Visible for tests only.
   private[scheduler] val logPath = getLogPath(logBaseDir, appId, appAttemptId, compressionCodecName)
 
+  // map of live stages, to peak executor metrics for the stage
+  private val liveStageExecutorMetrics = HashMap[(Int, Int), HashMap[String, PeakExecutorMetrics]]()
--- End diff --

map of (stageId, stageAttempt) for live stages, to peak executor metrics 
for the stage


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195289072
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1751,7 +1753,7 @@ class DAGScheduler(
 messageScheduler.shutdownNow()
 eventProcessLoop.stop()
 taskScheduler.stop()
-  }
+   }
--- End diff --

nit: old indentation was right


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195289751
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +182,31 @@ private[spark] class EventLoggingListener(
 
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
+if (shouldLogExecutorMetricsUpdates) {
+  // clear out any previous attempts, that did not have a stage completed event
+  val prevAttemptId = event.stageInfo.attemptNumber() - 1
+  for (attemptId <- 0 to prevAttemptId) {
+liveStageExecutorMetrics.remove((event.stageInfo.stageId, attemptId))
+  }
+
+  // log the peak executor metrics for the stage, for each live executor,
+  // whether or not the executor is running tasks for the stage
+  val accumUpdates = new ArrayBuffer[(Long, Int, Int, Seq[AccumulableInfo])]()
+  val executorMap = liveStageExecutorMetrics.remove(
+(event.stageInfo.stageId, event.stageInfo.attemptNumber()))
+  executorMap.foreach {
+   executorEntry => {
+  for ((executorId, peakExecutorMetrics) <- executorEntry) {
+val executorMetrics = new ExecutorMetrics(-1, peakExecutorMetrics.metrics)
--- End diff --

why is the timestamp -1 here?  if we're always logging it as -1, it doesn't 
seem very useful


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195290564
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -234,8 +272,18 @@ private[spark] class EventLoggingListener(
 }
   }
 
-  // No-op because logging every update would be overkill
-  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
+if (shouldLogExecutorMetricsUpdates) {
+  // For the active stages, record any new peak values for the memory metrics for the executor
+  event.executorUpdates.foreach { executorUpdates =>
+liveStageExecutorMetrics.values.foreach { peakExecutorMetrics =>
+  val peakMetrics = peakExecutorMetrics.getOrElseUpdate(
+event.execId, new PeakExecutorMetrics())
+  peakMetrics.compareAndUpdate(executorUpdates)
--- End diff --

couldn't you get the right timestamp here to log, as you do for updating 
the live entity?


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195291809
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -304,6 +305,11 @@ class SparkContext(config: SparkConf) extends Logging {
 _dagScheduler = ds
   }
 
+  private[spark] def heartbeater: Heartbeater = _heartbeater
+  private[spark] def heartbeater_=(hb: Heartbeater): Unit = {
--- End diff --

I don't think you're using this getter and setter at all?


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195291213
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +101,53 @@ class ExecutorSummary private[spark](
 val removeReason: Option[String],
 val executorLogs: Map[String, String],
 val memoryMetrics: Option[MemoryMetrics],
-val blacklistedInStages: Set[Int])
+val blacklistedInStages: Set[Int],
+@JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+@JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+val peakMemoryMetrics: Option[Array[Long]])
 
 class MemoryMetrics private[spark](
 val usedOnHeapStorageMemory: Long,
 val usedOffHeapStorageMemory: Long,
 val totalOnHeapStorageMemory: Long,
 val totalOffHeapStorageMemory: Long)
 
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric name */
+class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] {
+  override def deserialize(
+  jsonParser: JsonParser,
+  deserializationContext: DeserializationContext): Option[Array[Long]] = {
+val metricsMap = jsonParser.readValueAs(classOf[Option[Map[String, Object]]])
+metricsMap match {
+  case Some(metrics) =>
+Some(MetricGetter.values.map { m =>
+  metrics.getOrElse (m.name, 0L) match {
+case intVal: Int => intVal.toLong
+case longVal: Long => longVal
+  }
+}.toArray)
+  case None => None
+}
+  }
+}
+
+/** serializer for peakMemoryMetrics: convert array to map with metric name as key */
+class PeakMemoryMetricsSerializer extends JsonSerializer[Option[Array[Long]]] {
+  override def serialize(
+  metrics: Option[Array[Long]],
+  jsonGenerator: JsonGenerator,
+  serializerProvider: SerializerProvider): Unit = {
+metrics match {
+  case Some(m) =>
+val metricsMap = (0 until MetricGetter.values.length).map { idx =>
--- End diff --

```
MetricGetter.idxAndValues.map { case (idx, getter) =>
  getter.name -> m(idx)
}
```

(or maybe we can get rid of `idxAndValues` if it doesn't really help ...)
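
If `idxAndValues` does go away, a plain `zip` over the metric names would probably do the same job. A minimal, self-contained sketch follows; the `PeakMetricsMap` object and the two metric names are made up for illustration, where the PR would take the names from `MetricGetter.values`.

```scala
object PeakMetricsMap {
  // Hypothetical metric names, in array order; not the PR's actual list.
  val metricNames: IndexedSeq[String] = IndexedSeq("heapUsed", "offHeapUsed")

  /** Array ordered by metric position -> map keyed by metric name. */
  def toMap(values: Array[Long]): Map[String, Long] =
    metricNames.zip(values).toMap

  /** Map keyed by metric name -> array ordered by metric position; missing names become 0. */
  def fromMap(metrics: Map[String, Long]): Array[Long] =
    metricNames.map(name => metrics.getOrElse(name, 0L)).toArray
}
```

Keeping both directions next to each other makes it easier to see that the serializer and deserializer agree on ordering.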


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195290278
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +182,31 @@ private[spark] class EventLoggingListener(
 
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
+if (shouldLogExecutorMetricsUpdates) {
+  // clear out any previous attempts, that did not have a stage completed event
--- End diff --

one potential issue here -- even though there is a stage completed event, 
you can still have tasks running from that stage attempt (when there is a fetch 
failure, all existing tasks keep running).  Those leftover tasks will affect 
the memory usage for other tasks which run on those executors.

that said, I dunno if we can do much better here.  the alternative would be 
to track the task start & end events for each stage attempt.
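
A rough sketch of what that alternative might look like, assuming the listener is told about every task start and end. `StageAttemptTaskTracker` is a hypothetical helper, not something in the PR; it only reports a stage attempt as drained once the stage completed event has arrived and no tasks from that attempt are still running.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of the PR: counts running tasks per stage attempt.
final class StageAttemptTaskTracker {
  private type Attempt = (Int, Int) // (stageId, stageAttemptId)

  private val running = mutable.HashMap.empty[Attempt, Int]
  private val completed = mutable.HashSet.empty[Attempt]

  def onTaskStart(stageId: Int, attemptId: Int): Unit = {
    val key = (stageId, attemptId)
    running(key) = running.getOrElse(key, 0) + 1
  }

  /** Returns true once the attempt has completed and no tasks are left running. */
  def onTaskEnd(stageId: Int, attemptId: Int): Boolean = {
    val key = (stageId, attemptId)
    val left = math.max(running.getOrElse(key, 0) - 1, 0)
    if (left == 0) running.remove(key) else running(key) = left
    isDrained(key)
  }

  /** Returns true if the attempt has no leftover tasks at completion time. */
  def onStageCompleted(stageId: Int, attemptId: Int): Boolean = {
    val key = (stageId, attemptId)
    completed += key
    isDrained(key)
  }

  private def isDrained(key: Attempt): Boolean =
    completed.contains(key) && !running.contains(key)
}
```

The peak metrics for an attempt would then be logged and dropped only when one of these calls returns true, at the cost of handling two more event types.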


---




[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-06-13 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r195290854
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +101,53 @@ class ExecutorSummary private[spark](
 val removeReason: Option[String],
 val executorLogs: Map[String, String],
 val memoryMetrics: Option[MemoryMetrics],
-val blacklistedInStages: Set[Int])
+val blacklistedInStages: Set[Int],
+@JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+@JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+val peakMemoryMetrics: Option[Array[Long]])
 
 class MemoryMetrics private[spark](
 val usedOnHeapStorageMemory: Long,
 val usedOffHeapStorageMemory: Long,
 val totalOnHeapStorageMemory: Long,
 val totalOffHeapStorageMemory: Long)
 
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric name */
+class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] {
+  override def deserialize(
+  jsonParser: JsonParser,
+  deserializationContext: DeserializationContext): Option[Array[Long]] = {
+val metricsMap = jsonParser.readValueAs(classOf[Option[Map[String, Object]]])
--- End diff --

I think you might be able to do
```
jsonParser.readValueAs(classOf[Option[Map[String, java.lang.Long]]])
```
and then everything will get read as a long which simplifies the code below 
... but I'm not 100% sure
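
One reason for the doubt: `classOf` only captures the erased class, so both spellings hand the parser the same `Class` object and the element type never reaches it. A fallback that does not depend on the parser is to normalize through `java.lang.Number`. The sketch below uses plain Scala only; `MetricValues` and `toLong` are hypothetical names.

```scala
object MetricValues {
  // classOf sees only the erased class, so both expressions name the same Class object.
  val sameErasedClass: Boolean =
    classOf[Option[Map[String, Object]]] == classOf[Option[Map[String, java.lang.Long]]]

  /** Converts a parsed numeric value to Long, whichever boxed type the parser produced. */
  def toLong(value: Any): Long = value match {
    case n: java.lang.Number => n.longValue()
    case other => throw new IllegalArgumentException(s"not a number: $other")
  }
}
```

A single `Number` case would replace the separate `Int` and `Long` cases in the diff above.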


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/115/
Test PASSed.


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21559
  
**[Test build #91799 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91799/testReport)**
 for PR 21559 at commit 
[`4ab9bda`](https://github.com/apache/spark/commit/4ab9bdaea895f6d0c76ee9ddd44c131f499eaec5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4005/
Test PASSed.


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-13 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21535
  
@hvanhovell Added tests for interpreted encoders.


---




[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21559
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91799/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21535
  
**[Test build #91807 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91807/testReport)**
 for PR 21535 at commit 
[`250074b`](https://github.com/apache/spark/commit/250074b0377c3fbcf63ebf355b6d61c4f4f9e446).


---




[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...

2018-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21559
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...

2018-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21547
  
**[Test build #91808 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91808/testReport)**
 for PR 21547 at commit 
[`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52).


---



