[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...

2016-11-23 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/14612





[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...

2016-11-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14612#discussion_r87958491
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -89,6 +89,22 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
       }
       c

+    case c @ CreateTable(tableDesc, mode, Some(query))
+        if mode == SaveMode.Append && isHiveSerdeTable(tableDesc.identifier) =>
--- End diff --

Let me try another way to fix it. Will submit a new PR.





[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...

2016-11-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14612#discussion_r87958997
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -89,6 +89,22 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
       }
       c

+    case c @ CreateTable(tableDesc, mode, Some(query))
+        if mode == SaveMode.Append && isHiveSerdeTable(tableDesc.identifier) =>
--- End diff --

uh... Actually, I found a bug in our write path.





[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...

2016-11-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14612#discussion_r87787158
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -89,6 +89,22 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
       }
       c

+    case c @ CreateTable(tableDesc, mode, Some(query))
+        if mode == SaveMode.Append && isHiveSerdeTable(tableDesc.identifier) =>
--- End diff --

Can we consolidate Hive and data source tables here?





[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...

2016-11-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14612#discussion_r87330702
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -88,6 +88,21 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
       }
       c

+    // CTAS from the DataFrameWriter saveAsTable API.
+    // Handling the case when the source table is a Hive serde table.
+    case c @ CreateTable(tableDesc, mode, Some(query))
+        if mode == SaveMode.Append && isHiveSerdeTable(tableDesc.identifier) =>
+      val catalog = sparkSession.sessionState.catalog
+      catalog.lookupRelation(tableDesc.identifier) match {
--- End diff --

Is it possible to extract the similar logic from `CreateDataSourceTableAsSelectCommand` and reuse it here?





[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...

2016-08-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/14612

[SPARK-16803] [SQL] SaveAsTable does not work when source DataFrame is 
built on a Hive Table

### What changes were proposed in this pull request?
In Spark 2.0, `SaveAsTable` does not work when the source DataFrame is built on a Hive table, although it works in Spark 1.6.

**Spark 1.6**
```Scala
scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
res2: org.apache.spark.sql.DataFrame = []

scala> val df = sql("select key, value as value from sample.sample")
df: org.apache.spark.sql.DataFrame = [key: int, value: string]

scala> df.write.mode("append").saveAsTable("sample.sample")

scala> sql("select * from sample.sample").show()
+---+-----+
|key|value|
+---+-----+
|  1|  abc|
|  1|  abc|
+---+-----+
```
**Spark 2.0**
```Scala
scala> df.write.mode("append").saveAsTable("sample.sample")
org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample
 is not supported.;
```

This PR provides support for this case with by-name column resolution. In Spark 1.6, resolution was by position; that behavior is wrong, so the column order needs to be adjusted to match the destination table's schema.
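
As an illustration of by-name alignment, here is a minimal sketch; the helper name `alignByName` and the column list are hypothetical and not part of this PR:

```Scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper (not part of this PR): reorder a DataFrame's columns by
// name so they line up with the destination table's schema before appending.
// Selecting a column the DataFrame does not contain raises an AnalysisException,
// similar to the "Unable to resolve ..." error exercised in the new test cases.
def alignByName(df: DataFrame, destinationColumns: Seq[String]): DataFrame =
  df.select(destinationColumns.map(col): _*)

// Usage sketch: align df to the destination table's (key, value) layout
// before df.write.mode("append").saveAsTable("sample.sample").
// val aligned = alignByName(df, Seq("key", "value"))
```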

### How was this patch tested?
Test cases are added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark saveAsTableFix2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14612.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14612


commit 1967fcbeb6c7934e825cb9c043e6aa04bb057fc1
Author: gatorsmile 
Date:   2016-08-11T23:04:43Z

fix.

commit 1d3d392f2c6bbd358b55f3bde12806e2979dfa01
Author: gatorsmile 
Date:   2016-08-11T23:08:57Z

improve the comment







[GitHub] spark pull request #14612: [SPARK-16803] [SQL] SaveAsTable does not work whe...

2016-08-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14612#discussion_r74520972
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1473,6 +1473,94 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("saveAsTable(CTAS) when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+
+      val df = sql(s"select key, value from $tableName")
+      df.write.mode(SaveMode.Append).saveAsTable(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(1, "abc") :: Row(1, "abc") :: Nil)
+
+      // out of order
+      val df1 = sql(s"select value, key from $tableName")
+      df1.write.mode(SaveMode.Append).saveAsTable(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(1, "abc") :: Row(1, "abc") :: Row(1, "abc") :: Row(1, "abc") :: Nil)
+
+      // super set
+      val df2 = sql(s"SELECT value, 'ccc' AS uselessColumn, key FROM $tableName LIMIT 1")
+      df2.write.mode(SaveMode.Append).saveAsTable(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(1, "abc") :: Row(1, "abc") :: Row(1, "abc") :: Row(1, "abc") :: Row(1, "abc") :: Nil)
+
+      // the schema of dataFrame is a subset of destination table schema
+      val df3 = sql(s"SELECT value, key AS non_existent FROM $tableName")
+      val e = intercept[AnalysisException] {
+        df3.write.mode(SaveMode.Append).saveAsTable(tableName)
+      }.getMessage
+      assert(e.contains("Unable to resolve key given [value, non_existent]"))
+    }
+  }
+
+  test("saveAsTable(CTAS) and insertInto when the source DataFrame is built on Data Source") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      val schema = StructType(
+        StructField("key", IntegerType, nullable = false) ::
+          StructField("value", IntegerType, nullable = true) :: Nil)
+      val row = Row(3, 4)
+      val df = spark.createDataFrame(sparkContext.parallelize(row :: Nil), schema)
+
+      df.write.format("json").mode(SaveMode.Overwrite).saveAsTable(tableName)
+      df.write.format("json").mode(SaveMode.Append).saveAsTable(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(3, 4) :: Row(3, 4) :: Nil
+      )
+
+      (1 to 2).map { i => (i, i) }.toDF("key", "value").write.insertInto(tableName)
+      checkAnswer(
+        sql(s"SELECT key, value FROM $tableName"),
+        Row(1, 1) :: Row(2, 2) :: Row(3, 4) :: Row(3, 4) :: Nil
+      )
+    }
+  }
+
+  test("insertInto when the source DataFrame is built on a Hive table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
+      val df = sql(s"SELECT key, value AS value FROM $tableName")
+
+      df.write.insertInto(tableName)
+      checkAnswer(
+        sql(s"SELECT * FROM $tableName"),
+        Row(1, "abc") :: Row(1, "abc") :: Nil
+      )
+    }
+  }
+
+  test("saveAsTable - format Hive") {
--- End diff --

@cloud-fan Should we disable the format `hive` in DataFrameWriter? If this 
is supported, we might need a separate PR to cover all the modes. Thanks! 
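
For context, a minimal sketch of the call being discussed; the session setup, table name, and data are illustrative, and whether `DataFrameWriter` should accept `format("hive")` at all (and in which modes) is exactly the open question here:

```Scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object FormatHiveSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative session; enableHiveSupport() is needed to write Hive tables.
    val spark = SparkSession.builder()
      .appName("format-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // The call under discussion: asking DataFrameWriter for the "hive" format
    // with saveAsTable. Which SaveModes (if any) should be allowed is what the
    // comment above raises.
    val df = Seq((1, "abc")).toDF("key", "value")
    df.write.format("hive").mode(SaveMode.Append).saveAsTable("tab1")
  }
}
```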

