[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20527#discussion_r166865177
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
   checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
 }
   }
+
+  // TODO: This test is copied from HiveDDLSuite, unify it later.
+  test("SPARK-23348: append data to data source table with saveAsTable") {
--- End diff --

maybe we can add this when unifying the test cases?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20527#discussion_r166865065
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
""".stripMargin)
   }
 
-  castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
+  insert.copy(query = newQuery, partition = normalizedPartSpec)
--- End diff --

It's also OK to always copy it, and the code is neater.


---




[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20527


---




[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20527#discussion_r166853858
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
   checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
 }
   }
+
+  // TODO: This test is copied from HiveDDLSuite, unify it later.
+  test("SPARK-23348: append data to data source table with saveAsTable") {
--- End diff --

Do we also want to cover the following case: "2) Target tables have column metadata"?


---




[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20527#discussion_r166852743
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
""".stripMargin)
   }
 
-  castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
+  insert.copy(query = newQuery, partition = normalizedPartSpec)
--- End diff --

nit: don't need to copy the `newQuery` if it is the same as `query`.


---




[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20527#discussion_r166770787
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
   checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
 }
   }
+
+  // TODO: This test is copied from HiveDDLSuite, unify it later.
+  test("SPARK-23348: append data to data source table with saveAsTable") {
+withTable("t", "t1") {
+  Seq(1 -> "a").toDF("i", "j").write.saveAsTable("t")
+  checkAnswer(spark.table("t"), Row(1, "a"))
+
+  sql("INSERT INTO t SELECT 2, 'b'")
+  checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Nil)
+
+  Seq(3 -> "c").toDF("i", "j").write.mode("append").saveAsTable("t")
+  checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c") :: Nil)
+
+  Seq("c" -> 3).toDF("i", "j").write.mode("append").saveAsTable("t")
+  checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c") :: Row(null, "3") :: Nil)
--- End diff --

Thank you for pinging me, @cloud-fan. +1 for the patch, LGTM.


---




[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-07 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/20527

[SPARK-23348][SQL] append data using saveAsTable should adjust the data types

## What changes were proposed in this pull request?

When inserting/appending data into an existing table, Spark should adjust the data types of the input query according to the table schema, or fail fast if the types are uncastable.

There are several ways to insert/append data: the SQL API, `DataFrameWriter.insertInto`, and `DataFrameWriter.saveAsTable`. The first two create an `InsertIntoTable` plan, while the last creates a `CreateTable` plan. However, we only adjust the input query's data types for `InsertIntoTable`, so users may hit confusing errors when appending data with `saveAsTable`. See the JIRA for the error case.

This PR fixes this bug by adjusting data types for `CreateTable` too.
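The "adjust to the table schema, or fail fast" idea can be sketched in plain Scala. This is a hypothetical toy model (the `DataType`, `castTo`, and `adjustRow` names are made up for illustration), not Spark's actual analyzer rule; note that real Spark casts an uncastable string-to-int value to null at runtime, as the new test's `Row(null, "3")` shows.

```scala
import scala.util.Try

// Toy model of "adjust input types to the table schema, or fail fast".
// Hypothetical names for illustration only -- not Spark's API.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType

object TypeAdjust {
  // Try to cast a raw value to the target column type; None means uncastable.
  def castTo(value: Any, target: DataType): Option[Any] = (value, target) match {
    case (i: Int, IntType)       => Some(i)
    case (s: String, StringType) => Some(s)
    case (i: Int, StringType)    => Some(i.toString)        // int widens to string
    case (s: String, IntType)    => Try(s.toInt).toOption   // "3" -> 3, "a" -> None
    case _                       => None
  }

  // Adjust a whole row against the table schema, failing fast on a bad column.
  def adjustRow(row: Seq[Any], schema: Seq[DataType]): Seq[Any] = {
    require(row.length == schema.length, "column count mismatch")
    row.zip(schema).map { case (v, t) =>
      castTo(v, t).getOrElse(
        throw new IllegalArgumentException(s"cannot cast $v to $t"))
    }
  }
}
```

In this sketch the same adjustment runs regardless of which write path produced the row, which is the spirit of the fix: apply the cast logic to the `CreateTable` (append) path just as it already applies to `InsertIntoTable`.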

## How was this patch tested?

A new test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark saveAsTable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20527


commit ad19125ab54439979eb4aa2bebf2cb8c9c85551e
Author: Wenchen Fan 
Date:   2018-02-07T08:34:40Z

append data using saveAsTable should adjust the data types




---
