[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r214505595 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1349,6 +1357,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] { case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) => LocalRelation(output, data.take(limit), isStreaming) + +case Filter(condition, LocalRelation(output, data, isStreaming)) --- End diff -- super nit: the comment at https://github.com/apache/spark/pull/22205/files#diff-a636a87d8843eeccca90140be91d4fafR1348 was not changed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
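The new optimizer case above collapses a `Filter` over a `LocalRelation` by evaluating the predicate eagerly over the in-memory rows, just as the existing `Limit` case takes a prefix of the data. A small illustrative analogue (plain Java, not Spark code; names are hypothetical) of that collapse:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class LocalRelationFilter {
    // rows stands in for LocalRelation's data; cond for the Filter condition.
    // Instead of keeping a Filter node in the plan, evaluate the predicate now.
    static List<Integer> collapseFilter(List<Integer> rows, Predicate<Integer> cond) {
        return rows.stream().filter(cond).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(collapseFilter(List.of(1, 2, 3, 4), r -> r > 2)); // [3, 4]
    }
}
```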
[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses an UDF from a non de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22307 Merged build finished. Test PASSed.
[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses an UDF from a non de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22307 **[Test build #95573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95573/testReport)** for PR 22307 at commit [`60cc1c9`](https://github.com/apache/spark/commit/60cc1c9c66dade490dc0501622f8ac6b554b7ff4).
[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses an UDF from a non de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2756/ Test PASSed.
[GitHub] spark pull request #22307: [SPARK-25301][SQL] When a view uses an UDF from a...
GitHub user vinodkc opened a pull request: https://github.com/apache/spark/pull/22307 [SPARK-25301][SQL] When a view uses an UDF from a non default database, Spark analyser throws AnalysisException ## What changes were proposed in this pull request? When a Hive view uses a UDF from a non default database, the Spark analyser throws an AnalysisException. Steps to simulate this issue - In Hive 1) CREATE DATABASE d100; 2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 'com.uzx.udf.Masking' 3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is created in d100 4) create view d100.v100 as select d100.udf100(name) from default.emp; // Note: table default.emp has two columns 'name', 'address' 5) select * from d100.v100; // query on view d100.v100 gives the correct result In Spark - 1) spark.sql("select * from d100.v100").show throws ``` org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default' ``` This is because, while parsing the SQL statement of the view ``` 'select `d100.udf100`(`emp`.`name`) from `default`.`emp`' ```, the Spark parser fails to split the database name and UDF name, and hence the Spark function registry tries to load the UDF 'd100.udf100' from the 'default' database. To solve this issue, before creating a 'FunctionIdentifier', try to get the actual database name and then create the FunctionIdentifier using that database name and function name. ## How was this patch tested? 
Added 1 unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/vinodkc/spark br_fix_view_with_udf_issue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22307.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22307 commit 60cc1c9c66dade490dc0501622f8ac6b554b7ff4 Author: Vinod KC Date: 2018-08-31T16:57:00Z fix issue with non default udf in hive view
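The fix described above amounts to splitting a possibly database-qualified function name into its (database, function) parts before building a `FunctionIdentifier`, so that `d100.udf100` is looked up in `d100` rather than in the current database. A hypothetical sketch of that resolution step (not the actual Spark patch; names are illustrative):

```java
public class FunctionNameResolver {
    // Returns {database, function}. An unqualified name falls back to the
    // current database, which is the behavior that broke for "d100.udf100".
    static String[] resolve(String name, String currentDb) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return new String[] { currentDb, name };
        }
        return new String[] { name.substring(0, dot), name.substring(dot + 1) };
    }

    public static void main(String[] args) {
        String[] id = resolve("d100.udf100", "default");
        System.out.println(id[0] + " / " + id[1]); // d100 / udf100
    }
}
```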
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214505265 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala --- @@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0) } + /** Dropwizard metrics gauge measuring the executor's process CPU time. + * This code will try to get JVM Process CPU time or return -1 otherwise. + * The CPU time value is returned in nanoseconds. + * It will use proprietary extensions as com.sun.management.OperatingSystemMXBean or + * com.ibm.lang.management.OperatingSystemMXBean if available + */ + val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer + val name = new ObjectName("java.lang", "type", "OperatingSystem") + metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new Gauge[Long] { --- End diff -- nit: `name("executorCPUTime" )` -> `name("executorCPUTime")`
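The gauge in the diff above reads process CPU time through a vendor extension of the platform `OperatingSystemMXBean`, returning -1 when none is available. A minimal standalone sketch of that strategy (this is an illustration, not the PR's code; the IBM JVM exposes an equivalent `com.ibm.lang.management` bean not shown here):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class ProcessCpuTime {
    // Process CPU time in nanoseconds via the HotSpot/OpenJDK extension,
    // or -1 when the vendor extension is not available.
    static long processCpuTimeNanos() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println(processCpuTimeNanos());
    }
}
```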
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22302 **[Test build #95572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95572/testReport)** for PR 22302 at commit [`c3c5af2`](https://github.com/apache/spark/commit/c3c5af2b9f27e4effcc8f44f1b98f24a6d1fafa7).
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22302 ok to test.
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22302 @fjh100456 Could you clean up the PR description so it is easier for others to understand?
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22302 @gatorsmile @ueshin Could you trigger this? I checked the related JIRA and code, and I think these tests should pass in master when `usingCTAS` is true.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Merged build finished. Test FAILed.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95567/ Test FAILed.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95567/testReport)** for PR 22192 at commit [`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Merged build finished. Test FAILed.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95565/ Test FAILed.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95565/testReport)** for PR 22192 at commit [`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20637
[GitHub] spark issue #22306: [SPARK-25300][CORE]Unified the configuration parameter `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2755/ Test PASSed.
[GitHub] spark issue #22306: [SPARK-25300][CORE]Unified the configuration parameter `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22306 Merged build finished. Test PASSed.
[GitHub] spark issue #22306: [SPARK-25300][CORE]Unified the configuration parameter `...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22306 **[Test build #95571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95571/testReport)** for PR 22306 at commit [`82525d7`](https://github.com/apache/spark/commit/82525d753b80aef856217b9a161966a7ad499eca).
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20637 Thanks! merging to master.
[GitHub] spark pull request #22306: [SPARK-25300][CORE]Unified the configuration para...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/22306 [SPARK-25300][CORE]Unified the configuration parameter `spark.shuffle.service.enabled` ## What changes were proposed in this pull request? The configuration parameter "spark.shuffle.service.enabled" is defined in `package.scala` and used in many places, so we can replace the raw string with `SHUFFLE_SERVICE_ENABLED`. This patch also unifies the configuration parameter "spark.shuffle.service.port" in the same way. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/10110346/spark unifiedserviceenable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22306.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22306 commit 82525d753b80aef856217b9a161966a7ad499eca Author: liuxian Date: 2018-09-01T03:09:08Z fix
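The point of the refactoring above is to define each configuration key once and reference it symbolically instead of repeating the raw string at every use site. A tiny illustration of the idea (class and constant names here are hypothetical, not Spark's actual `ConfigEntry` machinery):

```java
public class ShuffleConfig {
    // Single source of truth for the key strings; a typo in a constant name
    // fails to compile, whereas a typo in a repeated string literal does not.
    public static final String SHUFFLE_SERVICE_ENABLED = "spark.shuffle.service.enabled";
    public static final String SHUFFLE_SERVICE_PORT = "spark.shuffle.service.port";

    public static void main(String[] args) {
        System.out.println(SHUFFLE_SERVICE_ENABLED);
        System.out.println(SHUFFLE_SERVICE_PORT);
    }
}
```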
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214502781 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala --- @@ -48,6 +48,8 @@ class RowBasedHashMapGenerator( val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema) val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema) +val numVarLenFields = groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt)) --- End diff -- @cloud-fan We would like to discuss this: how should it be modified?
[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95566/ Test PASSed.
[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22204 Merged build finished. Test PASSed.
[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22204 **[Test build #95566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95566/testReport)** for PR 22204 at commit [`929bc2d`](https://github.com/apache/spark/commit/929bc2d9eeaeafebca54deb07c6a428a8b661380). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/22298 @mccheah as such, that should be a followup PR and this should be good to merge as long as @holdenk gives a LGTM
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95563/ Test FAILed.
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214500639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { --- End diff -- many thanks! ya, plz ping me to review ;)
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Merged build finished. Test FAILed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22227 **[Test build #95563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95563/testReport)** for PR 22227 at commit [`d80b1a1`](https://github.com/apache/spark/commit/d80b1a15ed8941bad78df2c5f7168a4196d27be4). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499281 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { --- End diff -- @maropu You are right! I will try to optimize this in the other PR I am going to open; please check if you like it.
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499216 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { df5.select(map_from_arrays($"k", $"v")).collect --- End diff -- @maropu I will double check and get back, but I think it was a SparkException.
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499155 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { - ${ev.isNull} = true; + ${setIsNullCode} } else if (${ctx.genEqual(right.dataType, value, getValue)}) { - ${ev.isNull} = false; + ${unsetIsNullCode} --- End diff -- @maropu will change
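The generated loop in the diffs above implements SQL three-valued logic for `array_contains`: true on a match, null when there is no match but the array contains a null element, false otherwise; the `setIsNullCode`/`unsetIsNullCode` statements are emitted only when the result is nullable. A plain-Java rendering of those semantics (an illustration of the behavior, not the generated code itself):

```java
public class ArrayContainsSemantics {
    // Java null stands in for SQL NULL in the result.
    static Boolean arrayContains(Integer[] arr, int value) {
        boolean sawNull = false;
        for (Integer e : arr) {
            if (e == null) {
                sawNull = true;      // corresponds to the setIsNullCode branch
            } else if (e == value) {
                return true;         // corresponds to the unsetIsNullCode branch
            }
        }
        return sawNull ? null : false;
    }

    public static void main(String[] args) {
        System.out.println(arrayContains(new Integer[] {1, null, 3}, 3)); // true
        System.out.println(arrayContains(new Integer[] {1, null, 3}, 9)); // null
        System.out.println(arrayContains(new Integer[] {1, 2, 3}, 9));    // false
    }
}
```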
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499145 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1730,9 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { test("SPARK-9083: sort with non-deterministic expressions") { import org.apache.spark.util.random.XORShiftRandom --- End diff -- @maropu will do.
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22270 @cloud-fan Thank you Wenchen. Do we want to fix the two codegen compile errors in 2.4? One is in ArrayContains and the other is in ArraySort.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Merged build finished. Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2754/ Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2754/
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22270 Thanks for looking into this! I think we have to clean up our test framework later (after 2.4). We should identify the test cases that are actually testing the expressions and run them with and without the local relation optimization enabled, to test both the codegen and interpreted code paths. Since the current test suites are a little messy, this will be a lot of work to reorganize. I'm looking forward to seeing us accomplish it in Spark 3.0!
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2754/
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/22298 https://issues.apache.org/jira/browse/SPARK-25291 looks like a real issue with the way the tests are written. So we don't necessarily want to ignore it for this patch, but we're still thinking about it.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214497013 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -229,36 +229,74 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress /** - * Splits str around pat (pattern is a regular expression). + * Splits str around matches of the given regex. */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" + +" and returns an array of at most `limit`", + arguments = """ +Arguments: + * str - a string expression to split. + * regex - a string representing a regular expression. The regex string should be a +Java regular expression. + * limit - an integer expression which controls the number of times the regex is applied. + +limit > 0: The resulting array's length will not be more than `limit`, + and the resulting array's last entry will contain all input + beyond the last matched regex. +limit <= 0: `regex` will be applied as many times as possible, and +the resulting array can be of any size. 
+ """, examples = """ Examples: > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]'); ["one","two","three",""] + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1); + ["one","two","three",""] + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2); + ["one","twoBthreeC"] """) -case class StringSplit(str: Expression, pattern: Expression) - extends BinaryExpression with ImplicitCastInputTypes { +case class StringSplit(str: Expression, regex: Expression, limit: Expression) + extends TernaryExpression with ImplicitCastInputTypes { - override def left: Expression = str - override def right: Expression = pattern override def dataType: DataType = ArrayType(StringType) - override def inputTypes: Seq[DataType] = Seq(StringType, StringType) + override def inputTypes: Seq[DataType] = Seq(StringType, StringType, IntegerType) + override def children: Seq[Expression] = str :: regex :: limit :: Nil + + def this(exp: Expression, regex: Expression) = this(exp, regex, Literal(-1)); - override def nullSafeEval(string: Any, regex: Any): Any = { -val strings = string.asInstanceOf[UTF8String].split(regex.asInstanceOf[UTF8String], -1) + override def nullSafeEval(string: Any, regex: Any, limit: Any): Any = { +val strings = string.asInstanceOf[UTF8String].split( + regex.asInstanceOf[UTF8String], maybeFallbackLimitValue(limit.asInstanceOf[Int])) new GenericArrayData(strings.asInstanceOf[Array[Any]]) } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val arrayClass = classOf[GenericArrayData].getName -nullSafeCodeGen(ctx, ev, (str, pattern) => +nullSafeCodeGen(ctx, ev, (str, regex, limit) => { // Array in java is covariant, so we don't need to cast UTF8String[] to Object[]. 
- s"""${ev.value} = new $arrayClass($str.split($pattern, -1));""") + s"""${ev.value} = new $arrayClass($str.split( + $regex,${handleCodeGenLimitFallback(limit)}));""".stripMargin +}) } override def prettyName: String = "split" + + /** + * Java String's split method supports "ignore empty string" behavior when the limit is 0. + * To avoid this, we fall back to -1 when the limit is 0. Otherwise, this is a noop. + */ + def maybeFallbackLimitValue(limit: Int): Int = { --- End diff -- +1, and please add `limit = 0` case in `UTF8StringSuite`.
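The `maybeFallbackLimitValue` fallback discussed above exists because `java.lang.String.split(regex, limit)` treats a limit of 0 specially, dropping trailing empty strings; substituting -1 preserves them. A quick demonstration of the three limit regimes:

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        String s = "oneAtwoBthreeC";
        // limit < 0: apply the regex as many times as possible, keep trailing empties
        System.out.println(Arrays.toString(s.split("[ABC]", -1))); // [one, two, three, ]
        // limit > 0: at most `limit` entries; the last holds the unsplit remainder
        System.out.println(Arrays.toString(s.split("[ABC]", 2)));  // [one, twoBthreeC]
        // limit = 0: like -1, but trailing empty strings are removed
        System.out.println(Arrays.toString(s.split("[ABC]", 0)));  // [one, two, three]
    }
}
```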
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 **[Test build #95570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95570/testReport)** for PR 22298 at commit [`3ad1324`](https://github.com/apache/spark/commit/3ad13242f0f5c5d34cda8ca5d0e0ad0b1bdbcead). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95570/ Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Merged build finished. Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/22298 @holdenk and @mccheah any other comments before merge?
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 **[Test build #95570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95570/testReport)** for PR 22298 at commit [`3ad1324`](https://github.com/apache/spark/commit/3ad13242f0f5c5d34cda8ca5d0e0ad0b1bdbcead).
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214495391 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala --- @@ -51,6 +51,7 @@ private[spark] class BasicDriverFeatureStep( .get(DRIVER_MEMORY_OVERHEAD) .getOrElse(math.max((conf.get(MEMORY_OVERHEAD_FACTOR) * driverMemoryMiB).toInt, MEMORY_OVERHEAD_MIN_MIB)) + // TODO: Have memory limit checks on driverMemory --- End diff -- Valid point. These are not necessary.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on the issue: https://github.com/apache/spark/pull/22192 These tests are failing locally for me, both with my code and without it on a clean pull of master.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Merged build finished. Test FAILed.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95559/ Test FAILed.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95559/testReport)** for PR 22192 at commit [`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20637 Merged build finished. Test PASSed.
[GitHub] spark issue #21743: [SPARK-24767][Launcher] Propagate MDC to spark-submit th...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21743 also, I don't recall anywhere in Spark that depends on or sets MDC...
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95560/ Test PASSed.
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20637 **[Test build #95560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95560/testReport)** for PR 20637 at commit [`88c74c6`](https://github.com/apache/spark/commit/88c74c61d5ab64eb860b91261d013702983ac49c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22266: [SPARK-25270] lint-python: Add flake8 to find syn...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22266#discussion_r214494032 --- Diff: dev/lint-python --- @@ -82,6 +82,25 @@ else rm "$PYCODESTYLE_REPORT_PATH" fi +# stop the build if there are Python syntax errors or undefined names +flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics --- End diff -- Yea, sounds good.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2753/ Test PASSed.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22215 Merged build finished. Test PASSed.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95569/ Test PASSed.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22215 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2753/
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22215 Merged build finished. Test PASSed.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22215 **[Test build #95569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95569/testReport)** for PR 22215 at commit [`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95568/ Test PASSed.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22215 **[Test build #95568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95568/testReport)** for PR 22215 at commit [`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22215 Merged build finished. Test PASSed.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22215 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2753/
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22298 Looks fine to me, but I'm not familiar enough with the K8S code to have much of an opinion.
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214492297 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/PythonTestsSuite.scala --- @@ -72,12 +72,33 @@ private[spark] trait PythonTestsSuite { k8sSuite: KubernetesSuite => isJVM = false, pyFiles = Some(PYSPARK_CONTAINER_TESTS)) } + + test("Run PySpark with memory customization", k8sTestTag) { +sparkAppConf + .set("spark.kubernetes.container.image", s"${getTestImageRepo}/spark-py:${getTestImageTag}") --- End diff -- Some of this stuff can be factored out I think, but we just haven't done so yet. I wouldn't block a merge of this on such a refactor, but this entire test class could probably use some cleanup with respect to how the code is structured.
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214492145 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala --- @@ -48,6 +48,8 @@ class RowBasedHashMapGenerator( val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema) val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema) +val numVarLenFields = groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt)) --- End diff -- please keep the comment `// TODO: consider large decimal and interval type` below
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214492080 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/JavaDriverFeatureStep.scala --- @@ -38,7 +39,8 @@ private[spark] class JavaDriverFeatureStep( .build() SparkPod(pod.pod, withDriverArgs) } - override def getAdditionalPodSystemProperties(): Map[String, String] = Map.empty + override def getAdditionalPodSystemProperties(): Map[String, String] = --- End diff -- Again, not sure what our direction is going to be with respect to mixed pipelines throughout the cluster manager sections - if we should be supporting it in a first-class way, then perhaps we file a JIRA and we can discuss how the submission client should be refactored to support that.
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22215 **[Test build #95569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95569/testReport)** for PR 22215 at commit [`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e).
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214491914 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala --- @@ -48,6 +48,8 @@ class RowBasedHashMapGenerator( val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema) val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema) +val numVarLenFields = groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt)) --- End diff -- super nit: `.count(!UnsafeRow.isFixedLength(_))`?
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214491727 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala --- @@ -51,6 +51,7 @@ private[spark] class BasicDriverFeatureStep( .get(DRIVER_MEMORY_OVERHEAD) .getOrElse(math.max((conf.get(MEMORY_OVERHEAD_FACTOR) * driverMemoryMiB).toInt, MEMORY_OVERHEAD_MIN_MIB)) + // TODO: Have memory limit checks on driverMemory --- End diff -- Hm can you elaborate here? We already set the driver memory limit in this step based on the overhead.
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214491757 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- Yes. I checked that, @srowen. `org.apache.orc` only uses the Kryo constructor, `writeObject`, and `readObject` from the `kryo-shaded` library. There is no change for them. **WRITE** ``` (new Kryo()).writeObject(out, sarg); ``` **READ** ``` ... = (new Kryo()).readObject(new Input(sargBytes), SearchArgumentImpl.class); ```
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22215 **[Test build #95568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95568/testReport)** for PR 22215 at commit [`6f6442f`](https://github.com/apache/spark/commit/6f6442f392717fe87002e9bc1b27c91ff387080e).
[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22215#discussion_r214491272 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala --- @@ -151,13 +152,15 @@ private[spark] class ExecutorPodsLifecycleManager( private def exitReasonMessage(podState: FinalPodState, execId: Long, exitCode: Int) = { val pod = podState.pod +val reason = Option(pod.getStatus.getReason) +val message = Option(pod.getStatus.getMessage) s""" |The executor with id $execId exited with exit code $exitCode. - |The API gave the following brief reason: ${pod.getStatus.getReason} - |The API gave the following message: ${pod.getStatus.getMessage} + |The API gave the following brief reason: ${reason.getOrElse("")} --- End diff -- Maybe default to `N/A`? Might be confusing to be left blank.
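The null-safe defaulting under discussion can be sketched in isolation. `orNA` is a hypothetical helper name, and "N/A" is the reviewer's suggested placeholder rather than necessarily the merged wording:

```scala
// Sketch of the defaulting pattern from the review: treat both null and
// empty strings as "no information" and substitute a visible placeholder.
def orNA(v: String): String = Option(v).filter(_.nonEmpty).getOrElse("N/A")

// Kubernetes may leave status fields null or empty; both map to "N/A".
assert(orNA(null) == "N/A")
assert(orNA("") == "N/A")
// A real reason string passes through unchanged.
assert(orNA("Evicted") == "Evicted")
```

`Option(v)` (rather than `Some(v)`) is what makes this null-safe: `Option(null)` is `None`, so the chain never dereferences a null.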
[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22215#discussion_r214490999 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala --- @@ -60,4 +64,81 @@ private[spark] object KubernetesUtils { } def parseMasterUrl(url: String): String = url.substring("k8s://".length) + + def formatPairsBundle(pairs: Seq[(String, String)], indent: Int = 1) : String = { +// Use more loggable format if value is null or empty +val indentStr = "\t" * indent --- End diff -- Can we prefer space-based indentation? Curious as to whether others have an opinion about this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22215#discussion_r214491177 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala --- @@ -60,4 +64,81 @@ private[spark] object KubernetesUtils { } def parseMasterUrl(url: String): String = url.substring("k8s://".length) + + def formatPairsBundle(pairs: Seq[(String, String)], indent: Int = 1) : String = { +// Use more loggable format if value is null or empty +val indentStr = "\t" * indent +pairs.map { + case (k, v) => s"\n$indentStr $k: ${Option(v).filter(_.nonEmpty).getOrElse("N/A")}" +}.mkString("") + } + + /** + * Given a pod, output a human readable representation of its state + * + * @param pod Pod + * @return Human readable pod state + */ + def formatPodState(pod: Pod): String = { +val details = Seq[(String, String)]( + // pod metadata + ("pod name", pod.getMetadata.getName), + ("namespace", pod.getMetadata.getNamespace), + ("labels", pod.getMetadata.getLabels.asScala.mkString(", ")), + ("pod uid", pod.getMetadata.getUid), + ("creation time", formatTime(pod.getMetadata.getCreationTimestamp)), + + // spec details + ("service account name", pod.getSpec.getServiceAccountName), + ("volumes", pod.getSpec.getVolumes.asScala.map(_.getName).mkString(", ")), + ("node name", pod.getSpec.getNodeName), + + // status + ("start time", formatTime(pod.getStatus.getStartTime)), + ("phase", pod.getStatus.getPhase), + ("container status", containersDescription(pod, 2)) +) + +formatPairsBundle(details) + } + + def containersDescription(p: Pod, indent: Int = 1): String = { +p.getStatus.getContainerStatuses.asScala.map { status => + Seq( +("container name", status.getName), +("container image", status.getImage)) ++ +containerStatusDescription(status) +}.map(p => formatPairsBundle(p, indent)).mkString("\n\n") + } + + def containerStatusDescription(containerStatus: ContainerStatus) +: Seq[(String, String)] = { +val state = containerStatus.getState +Option(state.getRunning) --- End diff -- This is a really cool use of partial functions - I wonder if there are other places where we should be matching this way (of course it doesn't have to be done here).
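The dispatch style the reviewer is praising can be shown without the fabric8 client classes. The ADT and `describe` below are a toy stand-in for `ContainerState`/`containerStatusDescription`, not the PR's actual types:

```scala
// Toy model of per-state description: each container state maps to a
// small list of (label, value) pairs, as in the PR's status logging.
sealed trait ContainerState
case class Running(startedAt: String) extends ContainerState
case class Terminated(exitCode: Int) extends ContainerState
case object Waiting extends ContainerState

def describe(state: ContainerState): Seq[(String, String)] = state match {
  case Running(t)    => Seq("container state" -> "running", "container started at" -> t)
  case Terminated(c) => Seq("container state" -> "terminated", "exit code" -> c.toString)
  case Waiting       => Seq("container state" -> "waiting")
}

// A terminated container surfaces its exit code in the description.
assert(describe(Terminated(137)).contains("exit code" -> "137"))
assert(describe(Waiting) == Seq("container state" -> "waiting"))
```

The sealed trait makes the match exhaustive at compile time, which is the main advantage over chained null checks on the Java status object.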
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/22215 ok to test
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214491144 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -229,36 +229,74 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress /** - * Splits str around pat (pattern is a regular expression). + * Splits str around matches of the given regex. */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" + +" and returns an array of at most `limit`", + arguments = """ +Arguments: + * str - a string expression to split. + * regex - a string representing a regular expression. The regex string should be a +Java regular expression. + * limit - an integer expression which controls the number of times the regex is applied. + +limit > 0: The resulting array's length will not be more than `limit`, + and the resulting array's last entry will contain all input + beyond the last matched regex. +limit <= 0: `regex` will be applied as many times as possible, and +the resulting array can be of any size. + """, examples = """ Examples: > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]'); ["one","two","three",""] + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1); + ["one","two","three",""] + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2); + ["one","twoBthreeC"] """) -case class StringSplit(str: Expression, pattern: Expression) - extends BinaryExpression with ImplicitCastInputTypes { +case class StringSplit(str: Expression, regex: Expression, limit: Expression) + extends TernaryExpression with ImplicitCastInputTypes { - override def left: Expression = str - override def right: Expression = pattern override def dataType: DataType = ArrayType(StringType) - override def inputTypes: Seq[DataType] = Seq(StringType, StringType) + override def inputTypes: Seq[DataType] = Seq(StringType, StringType, IntegerType) + override def children: Seq[Expression] = str :: regex :: limit :: Nil + + def this(exp: Expression, regex: Expression) = this(exp, regex, Literal(-1)); - override def nullSafeEval(string: Any, regex: Any): Any = { -val strings = string.asInstanceOf[UTF8String].split(regex.asInstanceOf[UTF8String], -1) + override def nullSafeEval(string: Any, regex: Any, limit: Any): Any = { +val strings = string.asInstanceOf[UTF8String].split( + regex.asInstanceOf[UTF8String], maybeFallbackLimitValue(limit.asInstanceOf[Int])) new GenericArrayData(strings.asInstanceOf[Array[Any]]) } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val arrayClass = classOf[GenericArrayData].getName -nullSafeCodeGen(ctx, ev, (str, pattern) => +nullSafeCodeGen(ctx, ev, (str, regex, limit) => { // Array in java is covariant, so we don't need to cast UTF8String[] to Object[]. - s"""${ev.value} = new $arrayClass($str.split($pattern, -1));""") + s"""${ev.value} = new $arrayClass($str.split( + $regex,${handleCodeGenLimitFallback(limit)}));""".stripMargin +}) } override def prettyName: String = "split" + + /** + * Java String's split method supports "ignore empty string" behavior when the limit is 0. + * To avoid this, we fall back to -1 when the limit is 0. Otherwise, this is a noop. + */ + def maybeFallbackLimitValue(limit: Int): Int = { --- End diff -- To make all the split behaviour consistent in Spark, how about trying to fix the code in `UTF8String.split` directly? (We need to check that the change does not break the existing behaviour, too) cc: @ueshin @HyukjinKwon
[GitHub] spark pull request #22257: [SPARK-25264][K8S] Fix comma-delineated arguments...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22257
[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22048#discussion_r214490659 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging { } } } + + /** + * Regular expression matching full width characters + */ + private val fullWidthRegex = ("""[""" + +// scalastyle:off nonascii
"""\u1100-\u115F""" + +"""\u2E80-\uA4CF""" + +"""\uAC00-\uD7A3""" + +"""\uF900-\uFAFF""" + +"""\uFE10-\uFE19""" + +"""\uFE30-\uFE6F""" + +"""\uFF00-\uFF60""" + +"""\uFFE0-\uFFE6""" + --- End diff -- A general question. - How did you get this regex list? Any reference? It sounds like this should be a general problem. - What is the performance impact? Can you answer these and post them in the PR description?
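The regex in the diff exists to estimate display width for `show()`. A minimal standalone sketch, assuming the character ranges from the diff and the usual heuristic that full-width characters occupy two terminal columns (`displayWidth` is a hypothetical helper, not the PR's method name); plain escaped `\\uXXXX` strings are used here so Java's regex engine, not the Scala lexer, interprets the escapes:

```scala
// Character class of full-width ranges, as listed in the diff.
val fullWidth = ("[" +
  "\\u1100-\\u115F" +
  "\\u2E80-\\uA4CF" +
  "\\uAC00-\\uD7A3" +
  "\\uF900-\\uFAFF" +
  "\\uFE10-\\uFE19" +
  "\\uFE30-\\uFE6F" +
  "\\uFF00-\\uFF60" +
  "\\uFFE0-\\uFFE6" +
  "]").r

// Estimated display width: every char is 1 column, full-width chars add 1 more.
def displayWidth(s: String): Int = s.length + fullWidth.findAllIn(s).length

assert(displayWidth("abc") == 3)
assert(displayWidth("中文") == 4) // each CJK character counts as two columns
```

This is exactly the kind of heuristic the reviewer is probing: it scans every cell once per `show()`, so the per-character regex match is the performance cost to measure.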
[GitHub] spark issue #22257: [SPARK-25264][K8S] Fix comma-delineated arguments passed...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/22257 Looks good, thanks. Merging.
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214490061 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala --- @@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0) } + /** Dropwizard metrics gauge measuring the executor's process CPU time. + * This code will try to get JVM Process CPU time or return -1 otherwise. + * The CPU time value is returned in nanoseconds. + * It will use proprietary extensions as com.sun.management.OperatingSystemMXBean or + * com.ibm.lang.management.OperatingSystemMXBean if available + */ + val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer + val name = new ObjectName("java.lang", "type", "OperatingSystem") + metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new Gauge[Long] { +override def getValue: Long = { + try { +val attribute = mBean.getAttribute(name, "ProcessCpuTime") +if (attribute != null) { + attribute.asInstanceOf[Long] +} else { + -1L --- End diff -- Any reason to return -1 instead of 0?
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214489886 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala --- @@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0) } + /** Dropwizard metrics gauge measuring the executor's process CPU time. + * This code will try to get JVM Process CPU time or return -1 otherwise. + * The CPU time value is returned in nanoseconds. + * It will use proprietary extensions as com.sun.management.OperatingSystemMXBean or + * com.ibm.lang.management.OperatingSystemMXBean if available + */ + val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer + val name = new ObjectName("java.lang", "type", "OperatingSystem") + metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new Gauge[Long] { --- End diff -- a little confused with the existing `cpuTime`. How about `jvmCpuTime`?
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214489390 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala --- @@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0) } + /** Dropwizard metrics gauge measuring the executor's process CPU time. + * This code will try to get JVM Process CPU time or return -1 otherwise. + * The CPU time value is returned in nanoseconds. + * It will use proprietary extensions as com.sun.management.OperatingSystemMXBean or + * com.ibm.lang.management.OperatingSystemMXBean if available + */ + val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer + val name = new ObjectName("java.lang", "type", "OperatingSystem") + metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new Gauge[Long] { +override def getValue: Long = { + try { +val attribute = mBean.getAttribute(name, "ProcessCpuTime") +if (attribute != null) { + attribute.asInstanceOf[Long] +} else { + -1L +} + } catch { +case _ : Exception => -1L --- End diff -- `case NonFatal(_) => -1`?
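The gauge logic from the diff runs standalone, minus the Dropwizard registration. This sketch follows the diff's approach (JMX attribute lookup with a -1 sentinel) and adopts the reviewer's `NonFatal` suggestion; -1 distinguishes "metric unsupported on this JVM" from a legitimate reading of 0 nanoseconds:

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName
import scala.util.control.NonFatal

val mBean = ManagementFactory.getPlatformMBeanServer
val osName = new ObjectName("java.lang", "type", "OperatingSystem")

// Read ProcessCpuTime (nanoseconds) via the OperatingSystem MXBean.
// The attribute is a vendor extension (com.sun / com.ibm MXBeans),
// so a missing attribute or any non-fatal failure yields -1.
def processCpuTimeNanos(): Long =
  try {
    Option(mBean.getAttribute(osName, "ProcessCpuTime"))
      .map(_.asInstanceOf[Long])
      .getOrElse(-1L)
  } catch {
    case NonFatal(_) => -1L
  }

// Never below the -1 sentinel; on HotSpot/OpenJDK it is a real counter.
assert(processCpuTimeNanos() >= -1L)
```

Going through the MBean server by name, rather than casting `getOperatingSystemMXBean` to the vendor class, avoids a hard compile-time dependency on either proprietary interface.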
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95567/testReport)** for PR 22192 at commit [`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b).
[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22204 **[Test build #95566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95566/testReport)** for PR 22204 at commit [`929bc2d`](https://github.com/apache/spark/commit/929bc2d9eeaeafebca54deb07c6a428a8b661380).
[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22204 Merged build finished. Test PASSed.
[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2752/ Test PASSed.
[GitHub] spark issue #22304: [SPARK-25297][Streaming][Test] Fix blocking unit tests f...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22304 The one concern is whether this behavior change impacts anything else. Now some exceptions will cause an actual exception in a caller to methods like Await, rather than just logging and continuing. That might even be a good thing, and is crucial here. But that's what I have in mind. That said, tests all pass.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22192 retest this please
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r214486314

--- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java ---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.api.java.JavaSparkContext;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+// Tests loading plugins into executors
+public class ExecutorPluginSuite {
+  // Static value modified by testing plugin to ensure plugin loaded correctly.
+  public static int numSuccessfulPlugins = 0;
+  // Static value modified by testing plugin to verify plugins shut down properly.
+  public static int numSuccessfulTerminations = 0;
+  private JavaSparkContext sc;
+
+  private String EXECUTOR_PLUGIN_CONF_NAME = "spark.executor.plugins";
+  private String testPluginName = TestExecutorPlugin.class.getName();

--- End diff --

final static
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r214486250

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -604,4 +604,14 @@ package object config {
     .intConf
     .checkValue(v => v > 0, "The max failures should be a positive value.")
     .createWithDefault(40)
+
+  private[spark] val EXECUTOR_PLUGINS =
+    ConfigBuilder("spark.executor.plugins")
+      .doc("Comma-separated list of class names for \"plugins\" implementing " +
+        "org.apache.spark.ExecutorPlugin. Plugins have the same privileges as any task " +
+        "in a spark executor. They can also interfere with task execution and fail in " +

--- End diff --

Spark
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r214486385

--- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java ---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.api.java.JavaSparkContext;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+// Tests loading plugins into executors
+public class ExecutorPluginSuite {
+  // Static value modified by testing plugin to ensure plugin loaded correctly.
+  public static int numSuccessfulPlugins = 0;
+  // Static value modified by testing plugin to verify plugins shut down properly.
+  public static int numSuccessfulTerminations = 0;
+  private JavaSparkContext sc;

--- End diff --

Move after the variables below (which should be "final static").
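The two review nits above boil down to the same point: fields that never vary per instance, such as the conf key and the test plugin's class name, belong after the mutable state and should be declared `static final`. A minimal illustration (the class and constant values here are stand-ins, not the actual suite):

```java
// Sketch of the field layout the reviewer asks for: compile-time constants are
// declared static final, so each is shared across instances and cannot be
// reassigned; mutable per-instance state follows them.
public class ExecutorPluginSuiteStyle {
  // Constants first, static final as requested in the review.
  static final String EXECUTOR_PLUGIN_CONF_NAME = "spark.executor.plugins";
  static final String TEST_PLUGIN_NAME = "org.apache.spark.TestExecutorPlugin"; // placeholder value

  // Mutable state (the real suite holds a JavaSparkContext here).
  private Object sc;

  public static void main(String[] args) {
    System.out.println(EXECUTOR_PLUGIN_CONF_NAME);
  }
}
```

Beyond style, `static final` fields of `String` are genuine compile-time constants in Java, so the compiler can inline them and the JVM allocates them once rather than per test instance.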
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95565/testReport)** for PR 22192 at commit [`fa19ea8`](https://github.com/apache/spark/commit/fa19ea880f4fc7eafb736e372479d4df2bffb74b).
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r214486075

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -136,6 +136,15 @@ private[spark] class Executor(
   // for fetching remote cached RDD blocks, so need to make sure it uses the right classloader too.
   env.serializerManager.setDefaultClassLoader(replClassLoader)

+  private val pluginList = conf.get(EXECUTOR_PLUGINS)
+  if (pluginList != Nil) logDebug(s"Loading the following plugins: ${pluginList.mkString(", ")}")
+  // Load executor plugins
+  Thread.currentThread().setContextClassLoader(replClassLoader)
+  private val executorPlugins =
+    Utils.loadExtensions(classOf[ExecutorPlugin], pluginList, conf)
+  executorPlugins.foreach(_.init())
+  if (pluginList != Nil) logDebug("Finished loading plugins")

--- End diff --

Same.
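The loading pattern in the diff above, instantiate each class named in `spark.executor.plugins` via the context class loader and then call `init()` on each plugin, can be sketched in plain Java. The `ExecutorPlugin` interface and `loadExtensions` helper below are simplified stand-ins for Spark's actual API (`org.apache.spark.ExecutorPlugin` and `Utils.loadExtensions`), not the real implementations:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for org.apache.spark.ExecutorPlugin: lifecycle callbacks
// invoked on the executor JVM when it starts up and shuts down.
interface ExecutorPlugin {
  default void init() {}
  default void shutdown() {}
}

// A trivial plugin, analogous in spirit to the TestExecutorPlugin in the suite above.
class CountingPlugin implements ExecutorPlugin {
  static int initCount = 0;
  @Override public void init() { initCount++; }
}

public class PluginLoaderSketch {
  // Hypothetical stand-in for Utils.loadExtensions: reflectively instantiate each
  // named class through the current thread's context class loader, mirroring the
  // setContextClassLoader(replClassLoader) call in the diff.
  static List<ExecutorPlugin> loadExtensions(List<String> classNames) {
    List<ExecutorPlugin> plugins = new ArrayList<>();
    for (String name : classNames) {
      try {
        Class<?> klass = Class.forName(name, true,
            Thread.currentThread().getContextClassLoader());
        plugins.add((ExecutorPlugin) klass.getDeclaredConstructor().newInstance());
      } catch (ReflectiveOperationException e) {
        throw new IllegalArgumentException("Cannot load plugin " + name, e);
      }
    }
    return plugins;
  }

  public static void main(String[] args) {
    List<ExecutorPlugin> plugins =
        loadExtensions(List.of(CountingPlugin.class.getName()));
    plugins.forEach(ExecutorPlugin::init);
    System.out.println(CountingPlugin.initCount);
  }
}
```

Setting the context class loader before instantiation matters because user-supplied plugin classes may only be visible to the REPL/user class loader, not to the loader that loaded the executor itself.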