[spark] branch branch-2.4 updated: [SPARK-30489][BUILD] Make build delete pyspark.zip file properly
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 3b029d9 [SPARK-30489][BUILD] Make build delete pyspark.zip file properly 3b029d9 is described below commit 3b029d911d56f3071e348a7a2c6e1b285143e9fc Author: Jeff Evans AuthorDate: Fri Jan 10 16:59:51 2020 -0800 [SPARK-30489][BUILD] Make build delete pyspark.zip file properly ### What changes were proposed in this pull request? A small fix to the Maven build file under the `assembly` module by switching the "dir" attribute to "file". ### Why are the changes needed? To make the `<delete>` task properly delete an existing zip file. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Ran a build with the change and confirmed that a corrupted zip file was replaced with the correct one. Closes #27171 from jeff303/SPARK-30489. Authored-by: Jeff Evans Signed-off-by: Dongjoon Hyun (cherry picked from commit 582509b7ae76bc298c31a68bcfd7011c1b9e23a7) Signed-off-by: Dongjoon Hyun --- assembly/pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 432a388..a7d0f0e 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -117,7 +117,7 @@ - + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f372d1c -> 582509b)
dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f372d1c [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+ add 582509b [SPARK-30489][BUILD] Make build delete pyspark.zip file properly No new revisions were added by this update. Summary of changes: assembly/pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (b5bc3e1 -> f372d1c)
cutlerb pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b5bc3e1 [SPARK-30312][SQL] Preserve path permission and acl when truncate table add f372d1c [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+ No new revisions were added by this update. Summary of changes: docs/pyspark-migration-guide.md| 2 ++ python/pyspark/sql/tests/test_types.py | 13 python/pyspark/sql/types.py| 56 +++--- python/run-tests.py| 3 +- 4 files changed, 62 insertions(+), 12 deletions(-)
[spark] branch master updated (7fb17f59 -> b5bc3e1)
dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7fb17f59 [SPARK-29779][CORE] Compact old event log files and cleanup add b5bc3e1 [SPARK-30312][SQL] Preserve path permission and acl when truncate table No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/internal/SQLConf.scala| 11 +++ .../spark/sql/execution/command/tables.scala | 47 + .../spark/sql/execution/command/DDLSuite.scala | 79 +- 3 files changed, 136 insertions(+), 1 deletion(-)
[spark] branch branch-2.4 updated (6ac3659 -> 0a5757e)
dongjoon pushed a change to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git. from 6ac3659 [SPARK-30410][SQL][2.4] Calculating size of table with large number of partitions causes flooding logs add 0a5757e [SPARK-30447][SQL][2.4] Constant propagation nullability issue No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/expressions.scala | 41 -- .../optimizer/ConstantPropagationSuite.scala | 25 - .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 9 + 3 files changed, 63 insertions(+), 12 deletions(-)
[spark] branch master updated (2bd8731 -> 7fb17f59)
This is an automated email from the ASF dual-hosted git repository. vanzin pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2bd8731 [SPARK-30468][SQL] Use multiple lines to display data columns for show create table command add 7fb17f59 [SPARK-29779][CORE] Compact old event log files and cleanup No new revisions were added by this update. Summary of changes: apache.spark.deploy.history.EventFilterBuilder | 1 + .../deploy/history/BasicEventFilterBuilder.scala | 176 +++ .../apache/spark/deploy/history/EventFilter.scala | 109 +++ .../deploy/history/EventLogFileCompactor.scala | 224 ++ .../spark/deploy/history/EventLogFileReaders.scala | 28 +- .../spark/deploy/history/EventLogFileWriters.scala | 28 +- .../org/apache/spark/internal/config/package.scala | 18 ++ .../history/BasicEventFilterBuilderSuite.scala | 228 ++ .../deploy/history/BasicEventFilterSuite.scala | 208 + .../history/EventLogFileCompactorSuite.scala | 326 + .../deploy/history/EventLogFileReadersSuite.scala | 6 +- .../deploy/history/EventLogFileWritersSuite.scala | 4 +- .../spark/deploy/history/EventLogTestHelper.scala | 55 +++- .../spark/status/AppStatusListenerSuite.scala | 38 +-- .../spark/status/ListenerEventsTestHelper.scala| 154 ++ 15 files changed, 1545 insertions(+), 58 deletions(-) create mode 100644 core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/BasicEventFilterBuilder.scala create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/EventFilter.scala create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterBuilderSuite.scala create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterSuite.scala create mode 100644 
core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala create mode 100644 core/src/test/scala/org/apache/spark/status/ListenerEventsTestHelper.scala
[spark] branch master updated (b942832 -> 2bd8731)
srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b942832 [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates add 2bd8731 [SPARK-30468][SQL] Use multiple lines to display data columns for show create table command No new revisions were added by this update. Summary of changes: .../spark/sql/execution/command/tables.scala | 22 +++ .../sql-tests/results/show-create-table.sql.out| 67 -- .../apache/spark/sql/ShowCreateTableSuite.scala| 15 +++-- .../spark/sql/hive/HiveShowCreateTableSuite.scala | 6 +- 4 files changed, 72 insertions(+), 38 deletions(-)
[spark] branch master updated (d6532c7 -> b942832)
yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d6532c7 [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource add b942832 [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates No new revisions were added by this update. Summary of changes: .../sql/catalyst/optimizer/RewriteDistinctAggregates.scala | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource
tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d6532c7 [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource d6532c7 is described below commit d6532c7079f22f32e90e1c69c25bdfab51c7c53e Author: Thomas Graves AuthorDate: Fri Jan 10 08:32:28 2020 -0600 [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource ### What changes were proposed in this pull request? This PR is to make sure cores is the limiting resource when using accelerator aware scheduling and fix a few issues with SparkContext.checkResourcesPerTask. For the first version of accelerator aware scheduling (SPARK-27495), the SPIP had a condition that we can support dynamic allocation because we were going to have a strict requirement that we don't waste any resources. This means that the number of slots each executor has could be calculated from the number of cores and task cpus just as is done today. Somewhere along the line of development we relaxed that and only warn when we are wasting resources. This breaks the dynamic allocation logic if the limiting resource is no longer the cores, because it uses the cores and task cpus to calculate the number of executors it needs. This means we will request fewer executors than we really need to run everything. We have to enforce that cores is always the limiting resource, so we should throw if it's not. The only issue with us enforcing this is on cluster managers (standalone and mesos coarse grained) where we don't know the executor cores up front by default. That is, the spark.executor.cores config defaults to 1, but when the executor is started it gets all the cores of the Worker by default. 
So we have to add logic specifically to handle that; we can't enforce this requirement there and can only warn when dynamic allocation is enabled for those cluster managers. ### Why are the changes needed? There is a bug in dynamic allocation if cores is not the limiting resource, and the warnings are not correct. ### Does this PR introduce any user-facing change? no ### How was this patch tested? Unit test added and manually tested the conditions on local mode, local cluster mode, standalone mode, and yarn. Closes #27138 from tgravescs/SPARK-30446. Authored-by: Thomas Graves Signed-off-by: Thomas Graves --- .../main/scala/org/apache/spark/SparkContext.scala | 39 +- .../scala/org/apache/spark/SparkContextSuite.scala | 22 ++-- .../CoarseGrainedSchedulerBackendSuite.scala | 2 +- 3 files changed, 51 insertions(+), 12 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 94a0ce7..3262631 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2779,9 +2779,13 @@ object SparkContext extends Logging { } else { executorCores.get } + // some cluster managers don't set the EXECUTOR_CORES config by default (standalone + // and mesos coarse grained), so we can't rely on that config for those. + val shouldCheckExecCores = executorCores.isDefined || sc.conf.contains(EXECUTOR_CORES) || +(master.equalsIgnoreCase("yarn") || master.startsWith("k8s")) // Number of cores per executor must meet at least one task requirement. 
- if (execCores < taskCores) { + if (shouldCheckExecCores && execCores < taskCores) { throw new SparkException(s"The number of cores per executor (=$execCores) has to be >= " + s"the task config: ${CPUS_PER_TASK.key} = $taskCores when run on $master.") } @@ -2789,11 +2793,14 @@ object SparkContext extends Logging { // Calculate the max slots each executor can provide based on resources available on each // executor and resources required by each task. val taskResourceRequirements = parseResourceRequirements(sc.conf, SPARK_TASK_PREFIX) - val executorResourcesAndAmounts = -parseAllResourceRequests(sc.conf, SPARK_EXECUTOR_PREFIX) + val executorResourcesAndAmounts = parseAllResourceRequests(sc.conf, SPARK_EXECUTOR_PREFIX) .map(request => (request.id.resourceName, request.amount)).toMap - var numSlots = execCores / taskCores - var limitingResourceName = "CPU" + + var (numSlots, limitingResourceName) = if (shouldCheckExecCores) { +(execCores / taskCores, "CPU") + } else { +(-1, "") + } taskResourceRequirements.foreach { taskReq => // Make sure the executor resources were specified through config. @@ -2818,12 +2825,28 @@ object SparkContext extends Log
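The limiting-resource reasoning in the SPARK-30448 commit message can be illustrated with a small standalone Scala sketch. This is a simplified model under stated assumptions, not the code from SparkContext: `LimitingResourceSketch`, `limitingResource`, and the example amounts (8 cores, 2 GPUs) are hypothetical names and values chosen only for illustration.

```scala
// Simplified, hypothetical model of the slot calculation discussed in SPARK-30448.
// None of these names are Spark APIs; they only illustrate the reasoning.
object LimitingResourceSketch {

  /**
   * Returns the resource that allows the fewest concurrent tasks per executor,
   * together with that slot count. "CPU" wins ties because another resource
   * replaces it only when it yields strictly fewer slots.
   */
  def limitingResource(
      execCores: Int,
      taskCpus: Int,
      execResources: Map[String, Long],
      taskResources: Map[String, Long]): (String, Long) = {
    require(taskCpus > 0, "taskCpus must be positive")
    val cpuSlots = (execCores / taskCpus).toLong
    taskResources.foldLeft(("CPU", cpuSlots)) {
      case ((limitName, limitSlots), (name, perTask)) =>
        require(perTask > 0, s"task amount for $name must be positive")
        // A resource the executor does not have provides zero slots.
        val slots = execResources.getOrElse(name, 0L) / perTask
        if (slots < limitSlots) (name, slots) else (limitName, limitSlots)
    }
  }

  def main(args: Array[String]): Unit = {
    // 8 cores at 1 CPU per task gives 8 CPU slots, but 2 GPUs at 1 GPU per task
    // gives only 2 slots, so sizing executors by cores alone overestimates capacity.
    val (name, slots) = limitingResource(8, 1, Map("gpu" -> 2L), Map("gpu" -> 1L))
    println(s"limiting resource: $name, slots per executor: $slots")
  }
}
```

With these example inputs the GPU is the limiting resource, which is the situation the commit message says should be rejected (or only warned about on standalone and mesos coarse grained), because an executor count derived from cores and task cpus would be too low.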
[spark] branch master updated (418f7dc -> d0983af)
gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 418f7dc [SPARK-30447][SQL] Constant propagation nullability issue add d0983af Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test" No new revisions were added by this update. Summary of changes: python/pyspark/tests/test_worker.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated (d0983af -> 2a629e5)
gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d0983af Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test" add 2a629e5 [SPARK-30234][SQL] ADD FILE cannot add directories from sql CLI No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 2 ++ .../scala/org/apache/spark/sql/internal/SQLConf.scala | 8 .../spark/sql/execution/command/resources.scala | 3 ++- .../apache/spark/sql/execution/command/DDLSuite.scala | 19 +++ 4 files changed, 31 insertions(+), 1 deletion(-)
[spark] branch master updated (bcf07cb -> 418f7dc)
yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bcf07cb [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax add 418f7dc [SPARK-30447][SQL] Constant propagation nullability issue No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/expressions.scala | 41 -- .../optimizer/ConstantPropagationSuite.scala | 25 - .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 9 + 3 files changed, 63 insertions(+), 12 deletions(-)
[spark] branch master updated (afd70a0 -> bcf07cb)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from afd70a0 [SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test add bcf07cb [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax No new revisions were added by this update. Summary of changes: docs/sql-keywords.md | 1 + .../apache/spark/sql/catalyst/parser/SqlBase.g4| 5 +++ .../sql/connector/catalog/SupportsNamespaces.java | 11 -- .../spark/sql/catalyst/parser/AstBuilder.scala | 22 .../sql/catalyst/plans/logical/v2Commands.scala| 10 ++ .../spark/sql/catalyst/parser/DDLParserSuite.scala | 13 +++ .../apache/spark/sql/execution/command/ddl.scala | 2 +- .../datasources/v2/CreateNamespaceExec.scala | 6 +++- .../datasources/v2/DataSourceV2Strategy.scala | 7 +++- .../datasources/v2/DescribeNamespaceExec.scala | 22 .../datasources/v2/V2SessionCatalog.scala | 3 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 33 ++--- .../datasources/v2/V2SessionCatalogSuite.scala | 25 - .../spark/sql/hive/execution/HiveDDLSuite.scala| 42 ++ 14 files changed, 152 insertions(+), 50 deletions(-)