[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91810/ Test FAILed.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91810/testReport)** for PR 21441 at commit [`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Merged build finished. Test PASSed.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test FAILed.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91809/ Test FAILed.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91809/testReport)** for PR 21441 at commit [`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21503: [SPARK-24478][SQL] Move projection and filter push down ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21503 cc @rxin if you are interested.
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r195307036

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -579,6 +579,22 @@ class CodegenContext {
         s"${fullName}_$id"
       }

    +  /**
    +   * Creates an `ExprValue` representing a local java variable of required data type.
    +   */
    +  def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt)
    +
    +  /**
    +   * Creates an `ExprValue` representing a local java variable of required data type.
    +   */
    +  def freshName(name: String, javaClass: Class[_]): VariableValue =
    +    JavaCode.variable(freshName(name), javaClass)
    +
    +  /**
    +   * Creates an `ExprValue` representing a local boolean java variable.
    +   */
    +  def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name))
    --- End diff --

    Ok.
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r195306844

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -579,6 +579,22 @@ class CodegenContext {
         s"${fullName}_$id"
       }

    +  /**
    +   * Creates an `ExprValue` representing a local java variable of required data type.
    +   */
    +  def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt)
    +
    +  /**
    +   * Creates an `ExprValue` representing a local java variable of required data type.
    +   */
    +  def freshName(name: String, javaClass: Class[_]): VariableValue =
    +    JavaCode.variable(freshName(name), javaClass)
    +
    +  /**
    +   * Creates an `ExprValue` representing a local boolean java variable.
    +   */
    +  def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name))
    --- End diff --

    `isNullFreshName` is new, we don't need it and can just call `freshName(name, BooleanType)`
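Editor's illustration of the review point above, that a dedicated boolean helper is redundant once a typed fresh-name helper exists. This is a minimal Python sketch, not Spark's Scala code: `ToyCodegenContext`, `fresh_variable`, and the string type tags are hypothetical names invented for the example, and the counter scheme only mirrors the `s"${fullName}_$id"` line visible in the diff context.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VariableValue:
    """Toy stand-in for a typed local variable in generated code."""
    name: str
    java_type: str


class ToyCodegenContext:
    """Hypothetical, simplified model of a codegen context (not Spark's class)."""

    def __init__(self) -> None:
        # Next numeric suffix to hand out for each requested base name.
        self._ids: Dict[str, int] = {}

    def fresh_name(self, name: str) -> str:
        # Append a per-name counter so repeated requests never collide.
        next_id = self._ids.get(name, 0)
        self._ids[name] = next_id + 1
        return f"{name}_{next_id}"

    def fresh_variable(self, name: str, java_type: str) -> VariableValue:
        # One typed helper covers every case, including boolean null flags.
        return VariableValue(self.fresh_name(name), java_type)


ctx = ToyCodegenContext()
is_null = ctx.fresh_variable("isNull", "boolean")
value = ctx.fresh_variable("value", "int")
second_is_null = ctx.fresh_variable("isNull", "boolean")
```

In this toy model the null flag is requested through the same typed helper as any other variable, which is the shape of the suggestion to call `freshName(name, BooleanType)` rather than add `isNullFreshName`.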
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r195306721

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -579,6 +579,22 @@ class CodegenContext {
         s"${fullName}_$id"
       }

    +  /**
    +   * Creates an `ExprValue` representing a local java variable of required data type.
    +   */
    +  def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt)
    --- End diff --

    oh I missed the ctx parameter thing, let's leave it.
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195305634

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
       }

       /*
    +OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
     Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
     Select 0 string row (value IS NULL):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
    -Parquet Vectorized                        8452 / 8504         1.9        537.3      1.0X
    -Parquet Vectorized (Pushdown)              274 /  281        57.3         17.4     30.8X
    -Native ORC Vectorized                     8167 / 8185         1.9        519.3      1.0X
    -Native ORC Vectorized (Pushdown)           365 /  379        43.1         23.2     23.1X
    +Parquet Vectorized                        2961 / 3123         5.3        188.3      1.0X
    +Parquet Vectorized (Pushdown)             3057 / 3121         5.1        194.4      1.0X
    --- End diff --

    Thank you for updating, @maropu .
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21560 **[Test build #91817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91817/testReport)** for PR 21560 at commit [`252f5c9`](https://github.com/apache/spark/commit/252f5c9d0e4a5b6d1a456e847a53cf4f0e84dcfb).
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21547 **[Test build #91818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91818/testReport)** for PR 21547 at commit [`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52).
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/123/ Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4012/ Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Merged build finished. Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21547 Jenkins, retest this please.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91808/ Test FAILed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Merged build finished. Test FAILed.
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21560 **[Test build #91816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91816/testReport)** for PR 21560 at commit [`03cc20d`](https://github.com/apache/spark/commit/03cc20d73dd547e476fad90d47225ef9e96a8cbc).
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21547 **[Test build #91808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91808/testReport)** for PR 21547 at commit [`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195304544

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
       }

       /*
    +OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
     Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
     Select 0 string row (value IS NULL):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
    -Parquet Vectorized                        8452 / 8504         1.9        537.3      1.0X
    -Parquet Vectorized (Pushdown)              274 /  281        57.3         17.4     30.8X
    -Native ORC Vectorized                     8167 / 8185         1.9        519.3      1.0X
    -Native ORC Vectorized (Pushdown)           365 /  379        43.1         23.2     23.1X
    +Parquet Vectorized                        2961 / 3123         5.3        188.3      1.0X
    +Parquet Vectorized (Pushdown)             3057 / 3121         5.1        194.4      1.0X
    --- End diff --

    The result in v2.3.1: https://gist.github.com/maropu/88627246b7143ede5ab73c7183ab2128

    That is not a regression; I probably ran the bench on the wrong branch or commit. I re-ran the bench on the current master and updated the PR.

    How to run: I created a new `m4.2xlarge` instance, fetched this PR, rebased onto master, and ran the bench.
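Editor's note on reading the `Relative` column in the benchmark diff above. The sketch below assumes that `Relative` is the first row's best time divided by each row's best time, formatted to one decimal place; that assumption reproduces the two Parquet pushdown rows quoted in the diff, but it is an inference from the numbers, not something stated in the thread. The function names are hypothetical.

```python
def relative_speedup(baseline_best_ms: float, case_best_ms: float) -> float:
    """Ratio of the baseline row's best time to this row's best time."""
    if case_best_ms <= 0:
        raise ValueError("case_best_ms must be positive")
    return baseline_best_ms / case_best_ms


def format_relative(ratio: float) -> str:
    # One decimal place followed by "X", matching the table's style.
    return f"{ratio:.1f}X"


# Best times (ms) taken from the removed and added Parquet rows in the diff.
old_pushdown = format_relative(relative_speedup(8452, 274))
new_pushdown = format_relative(relative_speedup(2961, 3057))
```

Under this reading, the removed rows show pushdown at roughly 30 times the baseline, while the added rows show essentially no difference between the two cases.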
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test PASSed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4011/ Test PASSed.
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21560 Can one of the admins verify this patch?
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21560 @HeartSaVioR @arunmahadevan @xuanyuanking @tdas @zsxwing
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91815/ Test FAILed.
[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...
GitHub user jose-torres opened a pull request:

    https://github.com/apache/spark/pull/21560

    [SPARK-24386][SS] coalesce(1) aggregates in continuous processing

    ## What changes were proposed in this pull request?

    Provide a continuous processing implementation of coalesce(1), as well as allowing aggregates on top of it.

    The changes in ContinuousQueuedDataReader and such are to use split.index (the ID of the partition within the RDD currently being compute()d) rather than context.partitionId() (the partition ID of the scheduled task within the Spark job - that is, the post coalesce writer). In the absence of a narrow dependency, these values were previously always the same, so there was no need to distinguish.

    ## How was this patch tested?

    new unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark coalesce

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21560.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21560

commit 1d6b71898e2a640e3c0809695d2b83f3f84eaa38
Author: Jose Torres
Date: 2018-05-15T18:07:54Z
    continuous shuffle read RDD

commit b5d100875932bdfcb645c8f6b2cdb7b815d84c80
Author: Jose Torres
Date: 2018-05-17T03:11:11Z
    docs

commit af407694a5f13c18568da4a63848f82374a44377
Author: Jose Torres
Date: 2018-05-17T03:19:37Z
    Merge remote-tracking branch 'apache/master' into readerRddMaster

commit 46456dc75a6aec9659b18523c421999debd060eb
Author: Jose Torres
Date: 2018-05-17T03:22:49Z
    fix ctor

commit 2ea8a6f94216e8b184e5780ec3e6ffb2838de382
Author: Jose Torres
Date: 2018-05-17T03:43:10Z
    multiple partition test

commit 955ac79eb05dc389e632d1aaa6c59396835c6ed5
Author: Jose Torres
Date: 2018-05-17T13:33:51Z
    unset task context after test

commit 8cefb724512b51f2aa1fdd81fa8a2d4560e60ce3
Author: Jose Torres
Date: 2018-05-18T00:00:05Z
    conf from RDD

commit f91bfe7e3fc174202d7d5c7cde5a8fb7ce86bfd3
Author: Jose Torres
Date: 2018-05-18T00:00:44Z
    endpoint name

commit 259029298fc42a65e8ebb4d2effe49b7fafa96f1
Author: Jose Torres
Date: 2018-05-18T00:02:08Z
    testing bool

commit 859e6e4dd4dd90ffd70fc9cbd243c94090d72506
Author: Jose Torres
Date: 2018-05-18T00:22:10Z
    tests

commit b23b7bb17abe3cbc873a3144c56d08c88bc0c963
Author: Jose Torres
Date: 2018-05-18T00:40:55Z
    take instead of poll

commit 97f7e8ff865e6054d0d70914ce9bb51880b161f6
Author: Jose Torres
Date: 2018-05-18T00:58:44Z
    add interface

commit de21b1c25a333d44c0521fe151b468e51f0bdc47
Author: Jose Torres
Date: 2018-05-18T01:02:37Z
    clarify comment

commit 7dcf51a13e92a0bb2998e2a12e67d351e1c1a4fc
Author: Jose Torres
Date: 2018-05-18T22:39:28Z
    multiple

commit ad0b5aab320413891f7c21ea6115b6da8d49ccf9
Author: Jose Torres
Date: 2018-05-25T00:06:15Z
    writer with 1 reader partition

commit c9adee5423c2e8a030911008d2e6942045d484bb
Author: Jose Torres
Date: 2018-05-25T00:15:39Z
    docs and iface

commit 63d38d849107eed226449cec8d24c2241cd583c9
Author: Jose Torres
Date: 2018-05-25T00:27:26Z
    Merge remote-tracking branch 'apache/master' into writerTask

commit 331f437423262a1aa76754a8079d7c017e4ea28a
Author: Jose Torres
Date: 2018-05-25T00:37:14Z
    increment epoch

commit f3ce67529372f72370a1e6028dc71a751acf26f2
Author: Jose Torres
Date: 2018-05-25T00:40:39Z
    undo oop

commit e0108d7bc164b9e5eeb757c13c80bc1d11671188
Author: Jose Torres
Date: 2018-05-25T00:54:01Z
    make rdd loop

commit 024f92d6bd471e207e1625dc6cdca31e1067deb8
Author: Jose Torres
Date: 2018-05-25T22:56:59Z
    basic

commit 8f1939b91dbef76879d5e5f2077dea35e5343e89
Author: Jose Torres
Date: 2018-06-11T21:48:21Z
    coalesce working

commit c99d9524d4778b973df34378e98d53a152e0a42c
Author: Jose Torres
Date: 2018-06-13T21:34:38Z
    Merge remote-tracking branch 'apache/master' into coalesce

commit aaac0af0ddebffe64338a69a5a16dcfab9432a51
Author: Jose Torres
Date: 2018-06-13T22:04:19Z
    fix merge

commit 80d60db4c99e52e624dcbd19cc7c5ba519ff4e1c
Author: Jose Torres
Date: 2018-06-13T23:09:29Z
    rm spurious diffs

commit 26b74f016033f582a61694133b82df6a40295c0b
Author: Jose Torres
Date: 2018-06-14T04:43:00Z
    unsupported check

commit 03cc20d73dd547e476fad90d47225ef9e96a8cbc
Author: Jose Torres
Date: 2018-06-14T04:54:30Z
    change back timeout
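Editor's illustration of the `split.index` versus `context.partitionId()` distinction described in the pull request above. This is a toy Python model, not Spark code: `Split` and `run_coalesced_task` are hypothetical names, and the only behaviour modelled is that a single post-coalesce task computes several parent partitions in turn.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Split:
    """Toy stand-in for an RDD partition; `index` is its ID within its own RDD."""
    index: int


def run_coalesced_task(task_partition_id: int,
                       parent_splits: List[Split]) -> List[Tuple[int, int]]:
    """Simulate one post-coalesce task computing every parent partition.

    Returns one (task_partition_id, split.index) pair per simulated compute() call.
    """
    seen = []
    for split in parent_splits:
        # The task-level ID stays fixed for the whole task, while
        # split.index identifies the parent partition actually being read.
        seen.append((task_partition_id, split.index))
    return seen


# Three parent partitions coalesced into a single task with partition ID 0.
parents = [Split(i) for i in range(3)]
calls = run_coalesced_task(task_partition_id=0, parent_splits=parents)

task_ids = {task_id for task_id, _ in calls}
split_ids = {index for _, index in calls}
```

In this model every call sees the same task-level ID, so a reader keyed on it could not tell the three parent partitions apart, whereas the split index remains distinct. Without the coalesce, each task would compute exactly one split and the two values would coincide, which matches the description's remark that there was previously no need to distinguish them.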
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21288 **[Test build #91815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91815/testReport)** for PR 21288 at commit [`fa53156`](https://github.com/apache/spark/commit/fa53156599812adc94f089b8c163224fb2e4935f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test FAILed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21288 **[Test build #91815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91815/testReport)** for PR 21288 at commit [`fa53156`](https://github.com/apache/spark/commit/fa53156599812adc94f089b8c163224fb2e4935f).
[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...
Github user lucashu1 commented on the issue: https://github.com/apache/spark/pull/21092

Sorry in advance if this is the wrong place to be asking this!

Does this PR mean that we'll be able to create SparkContexts using PySpark's [`SparkSession.Builder`](https://spark.apache.org/docs/preview/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder) with `master` set to `k8s://<...>:<...>`, and have the resulting jobs run on spark-on-k8s, instead of on local/standalone?

E.g.:
```
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('k8s://https://kubernetes:443').getOrCreate()
```

I'm trying to use PySpark in a Jupyter notebook that's running inside a Kubernetes pod, and have it use spark-on-k8s instead of resorting to using `local[*]` as `master`. Till now, I've been getting an error saying that:

> Error: Python applications are currently not supported for Kubernetes.

whenever I try to use `k8s://<...>` as `master`.

Thanks!
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/122/ Test PASSed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/121/ Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4010/ Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21379 **[Test build #91814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91814/testReport)** for PR 21379 at commit [`a3be215`](https://github.com/apache/spark/commit/a3be215755f00100be0817b2a59f1ea8a185518b).
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/120/ Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21379 **[Test build #91813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91813/testReport)** for PR 21379 at commit [`d76bc7f`](https://github.com/apache/spark/commit/d76bc7fc555bbfe4da25c959646a6ee5961d4d14).
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4009/ Test PASSed.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91806/ Test PASSed.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Merged build finished. Test PASSed.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21389 **[Test build #91806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91806/testReport)** for PR 21389 at commit [`04f4028`](https://github.com/apache/spark/commit/04f40281e2a457ea27d425b5b1db0e07a0150aaf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/114/ Test PASSed.
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Merged build finished. Test PASSed.
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91805/ Test PASSed.
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20929 **[Test build #91805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91805/testReport)** for PR 20929 at commit [`22e0d9f`](https://github.com/apache/spark/commit/22e0d9f12e4b08a4337c61371cf4ff795a2752b2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21379 **[Test build #91812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91812/testReport)** for PR 21379 at commit [`a9b0306`](https://github.com/apache/spark/commit/a9b030682be358f36c0d2e64b175017458774b20). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4008/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/119/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20929 @mengxr ok, could you check? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91804/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #91804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91804/testReport)** for PR 21221 at commit [`99044e6`](https://github.com/apache/spark/commit/99044e6ec0cdc1b760c57dd5b7e74349384c6a98). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #91811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91811/testReport)** for PR 21221 at commit [`99044e6`](https://github.com/apache/spark/commit/99044e6ec0cdc1b760c57dd5b7e74349384c6a98). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21221 Jenkins, test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195297024

--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +101,53 @@ class ExecutorSummary private[spark](
     val removeReason: Option[String],
     val executorLogs: Map[String, String],
     val memoryMetrics: Option[MemoryMetrics],
-    val blacklistedInStages: Set[Int])
+    val blacklistedInStages: Set[Int],
+    @JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+    @JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+    val peakMemoryMetrics: Option[Array[Long]])
 
 class MemoryMetrics private[spark](
     val usedOnHeapStorageMemory: Long,
     val usedOffHeapStorageMemory: Long,
     val totalOnHeapStorageMemory: Long,
     val totalOffHeapStorageMemory: Long)
 
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric name */
+class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] {
--- End diff --

can this be `private[spark]`?
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91802/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #91802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91802/testReport)** for PR 21221 at commit [`2662f6f`](https://github.com/apache/spark/commit/2662f6f9c6a7c34cea34b748f6735eb1625b73cb). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] ` * `class PeakMemoryMetricsSerializer extends JsonSerializer[Option[Array[Long]]] ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91803/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20929 **[Test build #91803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91803/testReport)** for PR 20929 at commit [`58054ef`](https://github.com/apache/spark/commit/58054ef61f61a999117ec8617eed34e446ddb078). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91810/testReport)** for PR 21441 at commit [`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4007/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/118/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21441 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91809/testReport)** for PR 21441 at commit [`f16c7f7`](https://github.com/apache/spark/commit/f16c7f72bd2f7b5d0824d33255bb46d5c9c54c32). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/117/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195289142

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -93,6 +96,9 @@ private[spark] class EventLoggingListener(
   // Visible for tests only.
   private[scheduler] val logPath = getLogPath(logBaseDir, appId, appAttemptId, compressionCodecName)
 
+  // map of live stages, to peak executor metrics for the stage
+  private val liveStageExecutorMetrics = HashMap[(Int, Int), HashMap[String, PeakExecutorMetrics]]()
--- End diff --

map of (stageId, stageAttempt) for live stages, to peak executor metrics for the stage
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r195289072 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1751,7 +1753,7 @@ class DAGScheduler( messageScheduler.shutdownNow() eventProcessLoop.stop() taskScheduler.stop() - } + } --- End diff -- nit: old indentation was right --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195289751

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +182,31 @@ private[spark] class EventLoggingListener(
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
+    if (shouldLogExecutorMetricsUpdates) {
+      // clear out any previous attempts, that did not have a stage completed event
+      val prevAttemptId = event.stageInfo.attemptNumber() - 1
+      for (attemptId <- 0 to prevAttemptId) {
+        liveStageExecutorMetrics.remove((event.stageInfo.stageId, attemptId))
+      }
+
+      // log the peak executor metrics for the stage, for each live executor,
+      // whether or not the executor is running tasks for the stage
+      val accumUpdates = new ArrayBuffer[(Long, Int, Int, Seq[AccumulableInfo])]()
+      val executorMap = liveStageExecutorMetrics.remove(
+        (event.stageInfo.stageId, event.stageInfo.attemptNumber()))
+      executorMap.foreach {
+        executorEntry => {
+          for ((executorId, peakExecutorMetrics) <- executorEntry) {
+            val executorMetrics = new ExecutorMetrics(-1, peakExecutorMetrics.metrics)
--- End diff --

why is the timestamp -1 here? if we're always logging it as -1, it doesn't seem very useful
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195290564

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -234,8 +272,18 @@ private[spark] class EventLoggingListener(
     }
   }
 
-  // No-op because logging every update would be overkill
-  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
+    if (shouldLogExecutorMetricsUpdates) {
+      // For the active stages, record any new peak values for the memory metrics for the executor
+      event.executorUpdates.foreach { executorUpdates =>
+        liveStageExecutorMetrics.values.foreach { peakExecutorMetrics =>
+          val peakMetrics = peakExecutorMetrics.getOrElseUpdate(
+            event.execId, new PeakExecutorMetrics())
+          peakMetrics.compareAndUpdate(executorUpdates)
--- End diff --

couldn't you get the right timestamp here to log, as you do for updating the live entity?
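For readers following the two timestamp comments above, here is a minimal, self-contained sketch of one way the peak tracking could carry a real timestamp instead of -1. It is not the PR's code: `PeakMetrics`, `PeakTrackingSketch`, `onStageSubmitted`, and the `ts` parameter are hypothetical stand-ins, and the real `PeakExecutorMetrics` class is not shown in this thread. The map is keyed by (stageId, stageAttemptId), as suggested in the earlier comment on `liveStageExecutorMetrics`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the PR's PeakExecutorMetrics (assumption, not the real class).
final class PeakMetrics(numMetrics: Int) {
  val values: Array[Long] = Array.fill(numMetrics)(0L)
  // Timestamp of the most recent update that raised any peak; -1 until one does.
  var timestamp: Long = -1L

  /** Records any new peaks and returns true if at least one value increased. */
  def compareAndUpdate(ts: Long, update: Array[Long]): Boolean = {
    var updated = false
    var i = 0
    while (i < values.length && i < update.length) {
      if (update(i) > values(i)) {
        values(i) = update(i)
        updated = true
      }
      i += 1
    }
    if (updated) timestamp = ts
    updated
  }
}

object PeakTrackingSketch {
  // (stageId, stageAttemptId) -> executorId -> peak metrics for that stage attempt
  val live = mutable.HashMap[(Int, Int), mutable.HashMap[String, PeakMetrics]]()

  def onStageSubmitted(stageId: Int, attemptId: Int): Unit =
    live.getOrElseUpdate((stageId, attemptId), mutable.HashMap.empty[String, PeakMetrics])

  // Every live stage attempt sees the update, whether or not the executor runs its tasks.
  def onMetricsUpdate(execId: String, ts: Long, update: Array[Long]): Unit =
    live.values.foreach { perExecutor =>
      perExecutor
        .getOrElseUpdate(execId, new PeakMetrics(update.length))
        .compareAndUpdate(ts, update)
    }

  // Drops earlier attempts that never completed, then returns the peaks for this attempt.
  def onStageCompleted(
      stageId: Int,
      attemptId: Int): Option[mutable.HashMap[String, PeakMetrics]] = {
    (0 until attemptId).foreach(a => live.remove((stageId, a)))
    live.remove((stageId, attemptId))
  }
}
```

In this sketch the timestamp logged at stage completion would be the time the last peak was raised, which is only one possible reading of "the right timestamp"; the heartbeat receive time would be another.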
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195291809

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -304,6 +305,11 @@ class SparkContext(config: SparkConf) extends Logging {
     _dagScheduler = ds
   }
 
+  private[spark] def heartbeater: Heartbeater = _heartbeater
+  private[spark] def heartbeater_=(hb: Heartbeater): Unit = {
--- End diff --

I don't think you're using this getter and setter at all?
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195291213

--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +101,53 @@ class ExecutorSummary private[spark](
     val removeReason: Option[String],
     val executorLogs: Map[String, String],
     val memoryMetrics: Option[MemoryMetrics],
-    val blacklistedInStages: Set[Int])
+    val blacklistedInStages: Set[Int],
+    @JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+    @JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+    val peakMemoryMetrics: Option[Array[Long]])
 
 class MemoryMetrics private[spark](
     val usedOnHeapStorageMemory: Long,
     val usedOffHeapStorageMemory: Long,
     val totalOnHeapStorageMemory: Long,
     val totalOffHeapStorageMemory: Long)
 
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric name */
+class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] {
+  override def deserialize(
+      jsonParser: JsonParser,
+      deserializationContext: DeserializationContext): Option[Array[Long]] = {
+    val metricsMap = jsonParser.readValueAs(classOf[Option[Map[String, Object]]])
+    metricsMap match {
+      case Some(metrics) =>
+        Some(MetricGetter.values.map { m =>
+          metrics.getOrElse (m.name, 0L) match {
+            case intVal: Int => intVal.toLong
+            case longVal: Long => longVal
+          }
+        }.toArray)
+      case None => None
+    }
+  }
+}
+
+/** serializer for peakMemoryMetrics: convert array to map with metric name as key */
+class PeakMemoryMetricsSerializer extends JsonSerializer[Option[Array[Long]]] {
+  override def serialize(
+      metrics: Option[Array[Long]],
+      jsonGenerator: JsonGenerator,
+      serializerProvider: SerializerProvider): Unit = {
+    metrics match {
+      case Some(m) =>
+        val metricsMap = (0 until MetricGetter.values.length).map { idx =>
--- End diff --

```
MetricGetter.idxAndValues.map { case (idx, getter) => getter.name -> m(idx) }
```
(or maybe we can get rid of `idxAndValues` if it doesn't really help ...)
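The array-to-map conversion discussed in the comment above can be sketched without `idxAndValues` at all by zipping the metric names with their indices. This is only an illustration under stated assumptions: `MetricNamesSketch` and the three metric names are hypothetical placeholders, since the PR's `MetricGetter.values` is not shown in full in this thread.

```scala
// Hypothetical placeholder for the ordered metric names behind MetricGetter.values.
object MetricNamesSketch {
  val names: IndexedSeq[String] = IndexedSeq("metricA", "metricB", "metricC")

  /** Serializer direction: positional array to a map keyed by metric name. */
  def toMap(metrics: Array[Long]): Map[String, Long] =
    names.zipWithIndex.map { case (name, idx) => name -> metrics(idx) }.toMap

  /** Deserializer direction: map back to an array in name order, missing metrics read as 0. */
  def toArray(byName: Map[String, Long]): Array[Long] =
    names.map(name => byName.getOrElse(name, 0L)).toArray
}
```

Keeping both directions driven by the same ordered `names` sequence means the array layout and the JSON keys cannot drift apart, which seems to be the point of the reviewer's suggestion.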
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195290278

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +182,31 @@ private[spark] class EventLoggingListener(
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
+    if (shouldLogExecutorMetricsUpdates) {
+      // clear out any previous attempts, that did not have a stage completed event
--- End diff --

one potential issue here -- even though there is a stage completed event, you can still have tasks running from that stage attempt (when there is a fetch failure, all existing tasks keep running). Those leftover tasks will affect the memory usage for other tasks which run on those executors.

that said, I dunno if we can do much better here. the alternative would be to track the task start & end events for each stage attempt.
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195290854

--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +101,53 @@ class ExecutorSummary private[spark](
     val removeReason: Option[String],
     val executorLogs: Map[String, String],
     val memoryMetrics: Option[MemoryMetrics],
-    val blacklistedInStages: Set[Int])
+    val blacklistedInStages: Set[Int],
+    @JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+    @JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+    val peakMemoryMetrics: Option[Array[Long]])
 
 class MemoryMetrics private[spark](
     val usedOnHeapStorageMemory: Long,
     val usedOffHeapStorageMemory: Long,
     val totalOnHeapStorageMemory: Long,
     val totalOffHeapStorageMemory: Long)
 
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric name */
+class PeakMemoryMetricsDeserializer extends JsonDeserializer[Option[Array[Long]]] {
+  override def deserialize(
+      jsonParser: JsonParser,
+      deserializationContext: DeserializationContext): Option[Array[Long]] = {
+    val metricsMap = jsonParser.readValueAs(classOf[Option[Map[String, Object]]])
--- End diff --

I think you might be able to do
```
jsonParser.readValueAs(classOf[Option[Map[String, java.lang.Long]]])
```
and then everything will get read as a long which simplifies the code below ... but I'm not 100% sure
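One caveat on the suggestion above, offered tentatively: in Scala, `classOf` erases type arguments, so `classOf[Option[Map[String, java.lang.Long]]]` is the same `Class` object as `classOf[Option[_]]`, and the element type may not reach the parser through that call alone. A fallback that does not depend on how values are decoded is to normalize any boxed number to `Long`. The sketch below is plain Scala with no Jackson dependency; `MetricValueSketch`, `toLong`, and `normalize` are hypothetical names, not part of the PR.

```scala
object MetricValueSketch {
  // classOf drops type arguments, so both expressions denote the same runtime class.
  val erased: Boolean =
    classOf[Option[Map[String, java.lang.Long]]] == classOf[Option[String]]

  /** Converts an untyped metric value to Long; non-numeric values fall back to 0. */
  def toLong(value: Any): Long = value match {
    case n: java.lang.Number => n.longValue() // covers boxed Int and Long alike
    case _ => 0L
  }

  /** Builds the metrics array in the given name order, treating missing names as 0. */
  def normalize(raw: Map[String, Any], names: Seq[String]): Array[Long] =
    names.map(name => raw.get(name).map(toLong).getOrElse(0L)).toArray
}
```

Matching on `java.lang.Number` replaces the separate `Int` and `Long` cases in the quoted deserializer, and it also avoids a `MatchError` if some other numeric type shows up.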
[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/115/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21535 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21559 **[Test build #91799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91799/testReport)** for PR 21559 at commit [`4ab9bda`](https://github.com/apache/spark/commit/4ab9bdaea895f6d0c76ee9ddd44c131f499eaec5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4005/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21535 @hvanhovell Added tests for interpreted encoders. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21559 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91799/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21535 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21535 **[Test build #91807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91807/testReport)** for PR 21535 at commit [`250074b`](https://github.com/apache/spark/commit/250074b0377c3fbcf63ebf355b6d61c4f4f9e446). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21559 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21547 **[Test build #91808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91808/testReport)** for PR 21547 at commit [`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org