[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...

2018-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21818


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...

2018-07-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21818#discussion_r204969835
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1335,7 +1335,9 @@ object SQLConf {
 "overwriting. In dynamic mode, Spark doesn't delete partitions ahead, and only overwrite " +
 "those partitions that have data written into it at runtime. By default we use static " +
 "mode to keep the same behavior of Spark prior to 2.3. Note that this config doesn't " +
-"affect Hive serde tables, as they are always overwritten with dynamic mode.")
+"affect Hive serde tables, as they are always overwritten with dynamic mode. This can " +
+"also be set as an output option for a data source using key partitionOverwriteMode, " +
--- End diff --

Also need to document the precedence between the per-write option and the SQLConf setting.
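A minimal sketch of the precedence being asked about, assuming the per-write option wins over the session conf when both are set (names here are hypothetical, not the PR's actual code):

```scala
// Toy model of option-over-conf precedence; the real resolution lives in
// InsertIntoHadoopFsRelationCommand, and these names are illustrative only.
object PartitionOverwriteMode extends Enumeration {
  val STATIC, DYNAMIC = Value
}

def resolveMode(
    writeOptions: Map[String, String],
    sessionConf: PartitionOverwriteMode.Value): PartitionOverwriteMode.Value =
  // A per-write option, when present, takes precedence over the session conf.
  writeOptions.get("partitionOverwriteMode")
    .map(m => PartitionOverwriteMode.withName(m.toUpperCase))
    .getOrElse(sessionConf)
```

Documenting this rule in the config description would answer the review comment directly.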


---




[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...

2018-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21818#discussion_r203934987
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -486,6 +486,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
   }
 }
   }
+  test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
+withTempPath { path =>
+  Seq((1, 1, 1)).toDF("i", "part1", "part2")
--- End diff --

can we simplify the test? ideally we only need one partition column, and 
write some initial data to the table. then do an overwrite with 
partitionOverwriteMode=dynamic, and another overwrite with 
partitionOverwriteMode=static.
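The behavioral difference the simplified test would assert can be sketched without Spark: model a table partitioned by one column as a map from partition value to rows, where static overwrite drops all existing partitions and dynamic overwrite only replaces partitions that receive new data (a toy model of the semantics, not Spark's implementation):

```scala
// Toy model: partition value -> rows in that partition.

// Static mode deletes every existing partition before writing.
def overwriteStatic(
    table: Map[Int, Seq[Int]], newData: Map[Int, Seq[Int]]): Map[Int, Seq[Int]] =
  newData

// Dynamic mode only replaces partitions present in the incoming data.
def overwriteDynamic(
    table: Map[Int, Seq[Int]], newData: Map[Int, Seq[Int]]): Map[Int, Seq[Int]] =
  table ++ newData
```

The suggested test would write initial data, overwrite one partition in dynamic mode (expecting the untouched partition to survive), then overwrite in static mode (expecting only the new data to remain).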


---




[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...

2018-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21818#discussion_r203934743
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -91,8 +92,12 @@ case class InsertIntoHadoopFsRelationCommand(
 
 val pathExists = fs.exists(qualifiedOutputPath)
 
-val enableDynamicOverwrite =
-  sparkSession.sessionState.conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
+val parameters = CaseInsensitiveMap(options)
+
+val enableDynamicOverwrite = parameters.get("partitionOverwriteMode")
+  .map(mode => PartitionOverwriteMode.withName(mode.toUpperCase))
+  .getOrElse(sparkSession.sessionState.conf.partitionOverwriteMode) ==
+  PartitionOverwriteMode.DYNAMIC
--- End diff --

nit: this is too long
```
val partitionOverwriteMode = parameters.get("partitionOverwriteMode")
  .map(mode => PartitionOverwriteMode.withName(mode.toUpperCase))
  .getOrElse(sparkSession.sessionState.conf.partitionOverwriteMode)
val enableDynamicOverwrite = partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
```
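The CaseInsensitiveMap in the diff matters because write options are user-supplied and may arrive with any casing. A toy equivalent of the case-insensitive lookup it provides (a simplification for illustration, not Spark's CaseInsensitiveMap):

```scala
// Toy case-insensitive option lookup; Spark's CaseInsensitiveMap instead
// normalizes keys once at construction rather than scanning on each get.
def getIgnoreCase(options: Map[String, String], key: String): Option[String] =
  options.collectFirst { case (k, v) if k.equalsIgnoreCase(key) => v }
```

This is why both `partitionOverwriteMode` and `PartitionOverwriteMode` work as option keys in the new code path.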


---




[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...

2018-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21818#discussion_r203934766
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -486,6 +486,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
   }
 }
   }
+  test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
--- End diff --

nit: add a blank line before this new test case


---




[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...

2018-07-19 Thread koertkuipers
GitHub user koertkuipers opened a pull request:

https://github.com/apache/spark/pull/21818

[SPARK-24860][SQL] Support setting of partitionOverWriteMode in output options for writing DataFrame

## What changes were proposed in this pull request?

Besides the Spark conf spark.sql.sources.partitionOverwriteMode, also allow setting partitionOverwriteMode per write as a data source output option.
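A usage sketch of the two levels; this is a configuration fragment, where `spark`, `df`, the partition column names, and the output path are placeholders assumed to exist:

```scala
// Session-wide default, as before this PR:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// Per-write override added by this PR (the option key is matched
// case-insensitively):
df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("part1", "part2")
  .parquet("/path/to/table")
```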

## How was this patch tested?

Added unit test in InsertSuite

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tresata-opensource/spark feat-partition-overwrite-mode-per-write

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21818


commit 7dd2eabfd4f5ca18354df85f5bf5285e3e23359d
Author: Koert Kuipers 
Date:   2018-07-19T17:42:15Z

support setting of partitionOverWriteMode in output options for writing DataFrame




---
