[spark] branch branch-2.4 updated: [SPARK-30489][BUILD] Make build delete pyspark.zip file properly

2020-01-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 3b029d9  [SPARK-30489][BUILD] Make build delete pyspark.zip file properly
3b029d9 is described below

commit 3b029d911d56f3071e348a7a2c6e1b285143e9fc
Author: Jeff Evans 
AuthorDate: Fri Jan 10 16:59:51 2020 -0800

[SPARK-30489][BUILD] Make build delete pyspark.zip file properly

### What changes were proposed in this pull request?

A small fix to the Maven build file under the `assembly` module, switching the "dir" attribute to "file".

### Why are the changes needed?

To make the `<delete>` task properly delete an existing zip file.
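
Per the PR, using the `dir` attribute on a path that is actually a regular file left the stale `pyspark.zip` in place, while the `file` attribute makes the delete take effect. A rough Scala analogue of the two behaviours (a hypothetical sketch for illustration, not code from the build):

```scala
import java.io.File

// Hypothetical sketch of the difference the attribute makes; not code from this PR.
object DeleteSemantics extends App {
  // "delete as directory": only acts when the target is a directory.
  def deleteAsDir(path: File): Boolean = path.isDirectory && path.delete()

  // "delete as file": acts when the target is a regular file.
  def deleteAsFile(path: File): Boolean = path.isFile && path.delete()

  val pysparkZip = new File("python/lib/pyspark.zip") // a regular file, not a directory
  println(deleteAsDir(pysparkZip))  // false: a stale or corrupted zip survives the build
  println(deleteAsFile(pysparkZip)) // true when the file exists: it is removed and rebuilt
}
```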

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran a build with the change and confirmed that a corrupted zip file was 
replaced with the correct one.

Closes #27171 from jeff303/SPARK-30489.

Authored-by: Jeff Evans 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 582509b7ae76bc298c31a68bcfd7011c1b9e23a7)
Signed-off-by: Dongjoon Hyun 
---
 assembly/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 432a388..a7d0f0e 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -117,7 +117,7 @@
   
   
 
-  <delete dir="${basedir}/../python/lib/pyspark.zip"/>
+  <delete file="${basedir}/../python/lib/pyspark.zip"/>
   
 
   





[spark] branch master updated (f372d1c -> 582509b)

2020-01-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f372d1c  [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+
 add 582509b  [SPARK-30489][BUILD] Make build delete pyspark.zip file properly

No new revisions were added by this update.

Summary of changes:
 assembly/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





[spark] branch master updated (b5bc3e1 -> f372d1c)

2020-01-10 Thread cutlerb
This is an automated email from the ASF dual-hosted git repository.

cutlerb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b5bc3e1  [SPARK-30312][SQL] Preserve path permission and acl when truncate table
 add f372d1c  [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+

No new revisions were added by this update.

Summary of changes:
 docs/pyspark-migration-guide.md|  2 ++
 python/pyspark/sql/tests/test_types.py | 13 
 python/pyspark/sql/types.py| 56 +++---
 python/run-tests.py|  3 +-
 4 files changed, 62 insertions(+), 12 deletions(-)





[spark] branch master updated (7fb17f59 -> b5bc3e1)

2020-01-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7fb17f59 [SPARK-29779][CORE] Compact old event log files and cleanup
 add b5bc3e1  [SPARK-30312][SQL] Preserve path permission and acl when truncate table

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala| 11 +++
 .../spark/sql/execution/command/tables.scala   | 47 +
 .../spark/sql/execution/command/DDLSuite.scala | 79 +-
 3 files changed, 136 insertions(+), 1 deletion(-)





[spark] branch branch-2.4 updated (6ac3659 -> 0a5757e)

2020-01-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6ac3659  [SPARK-30410][SQL][2.4] Calculating size of table with large number of partitions causes flooding logs
 add 0a5757e  [SPARK-30447][SQL][2.4] Constant propagation nullability issue

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 41 --
 .../optimizer/ConstantPropagationSuite.scala   | 25 -
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  9 +
 3 files changed, 63 insertions(+), 12 deletions(-)





[spark] branch master updated (2bd8731 -> 7fb17f59)

2020-01-10 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2bd8731  [SPARK-30468][SQL] Use multiple lines to display data columns for show create table command
 add 7fb17f59 [SPARK-29779][CORE] Compact old event log files and cleanup

No new revisions were added by this update.

Summary of changes:
 .../org.apache.spark.deploy.history.EventFilterBuilder |   1 +
 .../deploy/history/BasicEventFilterBuilder.scala   | 176 +++
 .../apache/spark/deploy/history/EventFilter.scala  | 109 +++
 .../deploy/history/EventLogFileCompactor.scala | 224 ++
 .../spark/deploy/history/EventLogFileReaders.scala |  28 +-
 .../spark/deploy/history/EventLogFileWriters.scala |  28 +-
 .../org/apache/spark/internal/config/package.scala |  18 ++
 .../history/BasicEventFilterBuilderSuite.scala | 228 ++
 .../deploy/history/BasicEventFilterSuite.scala | 208 +
 .../history/EventLogFileCompactorSuite.scala   | 326 +
 .../deploy/history/EventLogFileReadersSuite.scala  |   6 +-
 .../deploy/history/EventLogFileWritersSuite.scala  |   4 +-
 .../spark/deploy/history/EventLogTestHelper.scala  |  55 +++-
 .../spark/status/AppStatusListenerSuite.scala  |  38 +--
 .../spark/status/ListenerEventsTestHelper.scala| 154 ++
 15 files changed, 1545 insertions(+), 58 deletions(-)
 create mode 100644 core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder
 create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/BasicEventFilterBuilder.scala
 create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/EventFilter.scala
 create mode 100644 core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
 create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterBuilderSuite.scala
 create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterSuite.scala
 create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
 create mode 100644 core/src/test/scala/org/apache/spark/status/ListenerEventsTestHelper.scala





[spark] branch master updated (b942832 -> 2bd8731)

2020-01-10 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b942832  [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
 add 2bd8731  [SPARK-30468][SQL] Use multiple lines to display data columns for show create table command

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/command/tables.scala   | 22 +++
 .../sql-tests/results/show-create-table.sql.out| 67 --
 .../apache/spark/sql/ShowCreateTableSuite.scala| 15 +++--
 .../spark/sql/hive/HiveShowCreateTableSuite.scala  |  6 +-
 4 files changed, 72 insertions(+), 38 deletions(-)





[spark] branch master updated (d6532c7 -> b942832)

2020-01-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d6532c7  [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource
 add b942832  [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/RewriteDistinctAggregates.scala   | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)





[spark] branch master updated: [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource

2020-01-10 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d6532c7  [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource
d6532c7 is described below

commit d6532c7079f22f32e90e1c69c25bdfab51c7c53e
Author: Thomas Graves 
AuthorDate: Fri Jan 10 08:32:28 2020 -0600

[SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource

### What changes were proposed in this pull request?

This PR makes sure that cores are the limiting resource when accelerator-aware scheduling is used, and fixes a few issues with `SparkContext.checkResourcesPerTask`.

For the first version of accelerator-aware scheduling (SPARK-27495), the SPIP had a condition that we could support dynamic allocation because we were going to have a strict requirement that we don't waste any resources. This meant that the number of slots each executor has could be calculated from the number of cores and task CPUs, just as is done today.

Somewhere along the line of development we relaxed that and now only warn when we are wasting resources. This breaks the dynamic allocation logic if the limiting resource is no longer the cores, because dynamic allocation uses the cores and task CPUs to calculate the number of executors it needs. This means we will request fewer executors than we really need to run everything. We have to enforce that cores are always the limiting resource, so we should throw if they are not.
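
To make the arithmetic concrete, here is a small sketch with made-up numbers (this is not the patch's code; the config names in the comments are the standard Spark resource settings) showing how a GPU-bound executor misleads a cores-based slot count:

```scala
// Hypothetical per-executor configuration; none of these values come from this patch.
val execCores = 8   // spark.executor.cores
val taskCpus  = 1   // spark.task.cpus
val execGpus  = 2   // spark.executor.resource.gpu.amount
val taskGpus  = 1   // spark.task.resource.gpu.amount

val slotsByCores = execCores / taskCpus  // 8 slots: what dynamic allocation assumes
val slotsByGpus  = execGpus / taskGpus   // 2 slots: what the scheduler can actually run

// With 80 pending tasks, dynamic allocation requests 80 / slotsByCores = 10 executors,
// but 80 / slotsByGpus = 40 executors are really needed, so the job runs under-provisioned.
// Enforcing that cores are the limiting resource (slotsByCores <= slotsByGpus) avoids this.
```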

The only issue with enforcing this is on cluster managers (standalone and Mesos coarse-grained) where we don't know the executor cores up front by default: the spark.executor.cores config defaults to 1, but when the executor is started it gets all the cores of the Worker by default. So we have to add logic specifically to handle that case, and since we can't enforce the requirement there, we just warn when dynamic allocation is enabled for those.

### Why are the changes needed?

Dynamic allocation is broken when cores are not the limiting resource, and the existing warnings are not correct.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

A unit test was added, and the conditions were manually tested in local mode, local-cluster mode, standalone mode, and on YARN.

Closes #27138 from tgravescs/SPARK-30446.

Authored-by: Thomas Graves 
Signed-off-by: Thomas Graves 
---
 .../main/scala/org/apache/spark/SparkContext.scala | 39 +-
 .../scala/org/apache/spark/SparkContextSuite.scala | 22 ++--
 .../CoarseGrainedSchedulerBackendSuite.scala   |  2 +-
 3 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 94a0ce7..3262631 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -2779,9 +2779,13 @@ object SparkContext extends Logging {
       } else {
         executorCores.get
       }
+      // some cluster managers don't set the EXECUTOR_CORES config by default (standalone
+      // and mesos coarse grained), so we can't rely on that config for those.
+      val shouldCheckExecCores = executorCores.isDefined || sc.conf.contains(EXECUTOR_CORES) ||
+        (master.equalsIgnoreCase("yarn") || master.startsWith("k8s"))
 
       // Number of cores per executor must meet at least one task requirement.
-      if (execCores < taskCores) {
+      if (shouldCheckExecCores && execCores < taskCores) {
         throw new SparkException(s"The number of cores per executor (=$execCores) has to be >= " +
           s"the task config: ${CPUS_PER_TASK.key} = $taskCores when run on $master.")
       }
@@ -2789,11 +2793,14 @@
       // Calculate the max slots each executor can provide based on resources available on each
       // executor and resources required by each task.
       val taskResourceRequirements = parseResourceRequirements(sc.conf, SPARK_TASK_PREFIX)
-      val executorResourcesAndAmounts =
-        parseAllResourceRequests(sc.conf, SPARK_EXECUTOR_PREFIX)
+      val executorResourcesAndAmounts = parseAllResourceRequests(sc.conf, SPARK_EXECUTOR_PREFIX)
         .map(request => (request.id.resourceName, request.amount)).toMap
-      var numSlots = execCores / taskCores
-      var limitingResourceName = "CPU"
+
+      var (numSlots, limitingResourceName) = if (shouldCheckExecCores) {
+        (execCores / taskCores, "CPU")
+      } else {
+        (-1, "")
+      }
 
       taskResourceRequirements.foreach { taskReq =>
         // Make sure the executor resources were specified through config.
@@ -2818,12 +2825,28 @@ object SparkContext extends Log

[spark] branch master updated (418f7dc -> d0983af)

2020-01-10 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 418f7dc  [SPARK-30447][SQL] Constant propagation nullability issue
 add d0983af  Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test"

No new revisions were added by this update.

Summary of changes:
 python/pyspark/tests/test_worker.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)





[spark] branch master updated (d0983af -> 2a629e5)

2020-01-10 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d0983af  Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test"
 add 2a629e5  [SPARK-30234][SQL] ADD FILE cannot add directories from sql CLI

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md   |  2 ++
 .../scala/org/apache/spark/sql/internal/SQLConf.scala |  8 
 .../spark/sql/execution/command/resources.scala   |  3 ++-
 .../apache/spark/sql/execution/command/DDLSuite.scala | 19 +++
 4 files changed, 31 insertions(+), 1 deletion(-)





[spark] branch master updated (bcf07cb -> 418f7dc)

2020-01-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bcf07cb  [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
 add 418f7dc  [SPARK-30447][SQL] Constant propagation nullability issue

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 41 --
 .../optimizer/ConstantPropagationSuite.scala   | 25 -
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  9 +
 3 files changed, 63 insertions(+), 12 deletions(-)





[spark] branch master updated (afd70a0 -> bcf07cb)

2020-01-10 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from afd70a0  [SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test
 add bcf07cb  [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

No new revisions were added by this update.

Summary of changes:
 docs/sql-keywords.md   |  1 +
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|  5 +++
 .../sql/connector/catalog/SupportsNamespaces.java  | 11 --
 .../spark/sql/catalyst/parser/AstBuilder.scala | 22 
 .../sql/catalyst/plans/logical/v2Commands.scala| 10 ++
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 13 +++
 .../apache/spark/sql/execution/command/ddl.scala   |  2 +-
 .../datasources/v2/CreateNamespaceExec.scala   |  6 +++-
 .../datasources/v2/DataSourceV2Strategy.scala  |  7 +++-
 .../datasources/v2/DescribeNamespaceExec.scala | 22 
 .../datasources/v2/V2SessionCatalog.scala  |  3 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 33 ++---
 .../datasources/v2/V2SessionCatalogSuite.scala | 25 -
 .../spark/sql/hive/execution/HiveDDLSuite.scala| 42 ++
 14 files changed, 152 insertions(+), 50 deletions(-)

