[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21818 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r204969835

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```
@@ -1335,7 +1335,9 @@ object SQLConf {
       "overwriting. In dynamic mode, Spark doesn't delete partitions ahead, and only overwrite " +
       "those partitions that have data written into it at runtime. By default we use static " +
       "mode to keep the same behavior of Spark prior to 2.3. Note that this config doesn't " +
-      "affect Hive serde tables, as they are always overwritten with dynamic mode.")
+      "affect Hive serde tables, as they are always overwritten with dynamic mode. This can " +
+      "also be set as an output option for a data source using key partitionOverwriteMode, " +
```

End diff

Also need to explain the precedence between the option and sqlconf.
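A brief sketch of the precedence the reviewer is asking to document. Assuming the behavior as merged, a `partitionOverwriteMode` set on the writer takes precedence over the session conf; the path and column names here are illustrative:

```scala
// Session-wide default: static overwrite.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "static")

df.write
  .mode("overwrite")
  // The per-write option, when present, wins over the session conf,
  // so this insert uses dynamic partition overwrite.
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("part")
  .parquet("/tmp/example_table")
```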
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r203934987

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala

```
@@ -486,6 +486,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       }
     }
   }
+  test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
+    withTempPath { path =>
+      Seq((1, 1, 1)).toDF("i", "part1", "part2")
```

End diff

can we simplify the test? ideally we only need one partition column, and write some initial data to the table. then do an overwrite with partitionOverwriteMode=dynamic, and another overwrite with partitionOverwriteMode=static.
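A hypothetical version of the simplified test the reviewer is describing: one partition column, initial data, then one dynamic and one static overwrite. This is a sketch against the `InsertSuite` test harness (`withTempPath`, `checkAnswer`); the exact data values are illustrative:

```scala
test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
  withTempPath { path =>
    // Write initial data into two partitions.
    Seq((1, 1), (2, 2)).toDF("i", "part")
      .write.partitionBy("part").parquet(path.getAbsolutePath)
    checkAnswer(spark.read.parquet(path.getAbsolutePath), Row(1, 1) :: Row(2, 2) :: Nil)

    // Dynamic overwrite: only partition part=1 is rewritten; part=2 survives.
    Seq((5, 1)).toDF("i", "part")
      .write.mode("overwrite").option("partitionOverwriteMode", "dynamic")
      .partitionBy("part").parquet(path.getAbsolutePath)
    checkAnswer(spark.read.parquet(path.getAbsolutePath), Row(5, 1) :: Row(2, 2) :: Nil)

    // Static overwrite: the whole table is replaced.
    Seq((3, 2)).toDF("i", "part")
      .write.mode("overwrite").option("partitionOverwriteMode", "static")
      .partitionBy("part").parquet(path.getAbsolutePath)
    checkAnswer(spark.read.parquet(path.getAbsolutePath), Row(3, 2) :: Nil)
  }
}
```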
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r203934743

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala

```
@@ -91,8 +92,12 @@ case class InsertIntoHadoopFsRelationCommand(
     val pathExists = fs.exists(qualifiedOutputPath)
-    val enableDynamicOverwrite =
-      sparkSession.sessionState.conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
+    val parameters = CaseInsensitiveMap(options)
+
+    val enableDynamicOverwrite = parameters.get("partitionOverwriteMode")
+      .map(mode => PartitionOverwriteMode.withName(mode.toUpperCase))
+      .getOrElse(sparkSession.sessionState.conf.partitionOverwriteMode) ==
+      PartitionOverwriteMode.DYNAMIC
```

End diff

nit: this is too long

```
val partitionOverwriteMode = parameters.get("partitionOverwriteMode")
val enableDynamicOverwrite = partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
```
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r203934766

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala

```
@@ -486,6 +486,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       }
     }
   }
+  test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
```

End diff

nit: add a blank line before this new test case
GitHub user koertkuipers opened a pull request: https://github.com/apache/spark/pull/21818

[SPARK-24860][SQL] Support setting of partitionOverWriteMode in output options for writing DataFrame

## What changes were proposed in this pull request?

Besides the session-wide setting spark.sql.sources.partitionOverwriteMode, also allow setting partitionOverWriteMode per write as an output option.

## How was this patch tested?

Added a unit test in InsertSuite.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tresata-opensource/spark feat-partition-overwrite-mode-per-write

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21818.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21818

commit 7dd2eabfd4f5ca18354df85f5bf5285e3e23359d
Author: Koert Kuipers
Date: 2018-07-19T17:42:15Z

support setting of partitionOverWriteMode in output options for writing DataFrame
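A minimal usage sketch of the feature this PR proposes: overwriting a single partition of an existing partitioned dataset by passing the mode as a write option, without touching the session conf. The path and DataFrame contents are illustrative:

```scala
// Initial write: partitions part=1 and part=2.
Seq((1, 1), (2, 2)).toDF("i", "part")
  .write.partitionBy("part").parquet("/tmp/demo_table")

// Overwrite only the partitions present in the new data (here: part=1),
// leaving part=2 intact, by setting the mode on this write alone.
Seq((10, 1)).toDF("i", "part")
  .write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("part")
  .parquet("/tmp/demo_table")
```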