[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166865177

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
           checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
         }
       }
    +
    +  // TODO: This test is copied from HiveDDLSuite, unify it later.
    +  test("SPARK-23348: append data to data source table with saveAsTable") {
    --- End diff --

    Maybe we can add this when unifying the test cases?

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166865065

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
    @@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
             """.stripMargin)
          }
    -    castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
    +    insert.copy(query = newQuery, partition = normalizedPartSpec)
    --- End diff --

    It's also ok to always copy it, and the code is neater.
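For illustration, the two styles under discussion can be sketched with a plain Scala case class. `InsertIntoTable` below is a simplified, hypothetical stand-in for Spark's real logical plan node, not its actual API:

```scala
// Toy stand-in for the logical plan node being rewritten by the rule.
case class InsertIntoTable(query: String, partition: Map[String, String])

object CopyStyles {
  // Style 1 (the nit): reuse the existing node when nothing changed,
  // avoiding a needless allocation.
  def copyIfChanged(insert: InsertIntoTable, newQuery: String): InsertIntoTable =
    if (newQuery == insert.query) insert else insert.copy(query = newQuery)

  // Style 2 (the reply): always copy; one less branch, neater code.
  def alwaysCopy(insert: InsertIntoTable, newQuery: String): InsertIntoTable =
    insert.copy(query = newQuery)
}
```

Because case classes have structural equality, both styles produce equal results; always copying only costs one extra allocation per rule invocation, which is why the neater form is acceptable here.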
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20527
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166853858

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
           checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
         }
       }
    +
    +  // TODO: This test is copied from HiveDDLSuite, unify it later.
    +  test("SPARK-23348: append data to data source table with saveAsTable") {
    --- End diff --

    Do we also want to cover the following case:
    ```
    2) Target tables have column metadata
    ```
    ?
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166852743

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
    @@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
             """.stripMargin)
          }
    -    castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
    +    insert.copy(query = newQuery, partition = normalizedPartSpec)
    --- End diff --

    nit: we don't need to copy the `newQuery` if it is the same as `query`.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166770787

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
           checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
         }
       }
    +
    +  // TODO: This test is copied from HiveDDLSuite, unify it later.
    +  test("SPARK-23348: append data to data source table with saveAsTable") {
    +    withTable("t", "t1") {
    +      Seq(1 -> "a").toDF("i", "j").write.saveAsTable("t")
    +      checkAnswer(spark.table("t"), Row(1, "a"))
    +
    +      sql("INSERT INTO t SELECT 2, 'b'")
    +      checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Nil)
    +
    +      Seq(3 -> "c").toDF("i", "j").write.mode("append").saveAsTable("t")
    +      checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c") :: Nil)
    +
    +      Seq("c" -> 3).toDF("i", "j").write.mode("append").saveAsTable("t")
    +      checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c")
    +        :: Row(null, "3") :: Nil)
    --- End diff --

    Thank you for pinging me, @cloud-fan. +1 for the patch, LGTM.
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20527

    [SPARK-23348][SQL] append data using saveAsTable should adjust the data types

## What changes were proposed in this pull request?

When inserting or appending data into an existing table, Spark should adjust the data types of the input query according to the table schema, or fail fast if they are uncastable.

There are several ways to insert/append data: the SQL API, `DataFrameWriter.insertInto`, and `DataFrameWriter.saveAsTable`. The first two create an `InsertIntoTable` plan, while the last one creates a `CreateTable` plan. However, we only adjust input query data types for `InsertIntoTable`, so users may hit weird errors when appending data via `saveAsTable`. See the JIRA for the error case.

This PR fixes the bug by adjusting data types for `CreateTable` too.

## How was this patch tested?

A new test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark saveAsTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20527.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20527

----
commit ad19125ab54439979eb4aa2bebf2cb8c9c85551e
Author: Wenchen Fan
Date:   2018-02-07T08:34:40Z

    append data using saveAsTable should adjust the data types
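The cast-adjustment idea can be illustrated without Spark. The sketch below is a toy, Spark-free model of the "adjust input types to the table schema" step: the type names mimic Spark's but are simplified stand-ins, and Spark's real rule rewrites logical plans, not rows:

```scala
// Toy data types, named after Spark's but not Spark's real classes.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType

object AppendCast {
  // Cast a value to the target type. Like Spark's non-ANSI cast, an
  // unparsable string-to-int cast yields null rather than an error;
  // a genuinely uncastable pair fails fast.
  def cast(v: Any, to: DataType): Any = (v, to) match {
    case (i: Int, IntegerType)    => i
    case (s: String, StringType)  => s
    case (i: Int, StringType)     => i.toString
    case (s: String, IntegerType) =>
      try s.toInt catch { case _: NumberFormatException => null }
    case _ => sys.error(s"cannot cast $v to $to") // fail fast when uncastable
  }

  // Adjust one input row to the target table schema, column by column.
  def adjustRow(row: Seq[Any], schema: Seq[DataType]): Seq[Any] =
    row.zip(schema).map { case (v, t) => cast(v, t) }
}
```

With these toy rules, `adjustRow(Seq("c", 3), Seq(IntegerType, StringType))` yields `Seq(null, "3")`, mirroring the `Row(null, "3")` expectation in the PR's new test: the string `"c"` cannot be parsed as an integer, and the integer `3` is cast to the string `"3"`.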