[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...

2016-09-21 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/15190#discussion_r79979332
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
       outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
         .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
-      // Note: Keep this unspecified because we use the presence of the serde to decide
--- End diff --

@viirya @cloud-fan Actually, I am not sure the above comment is in sync with the code. When this comment was written, the CTAS case was represented by CreateTableAsSelectLogicalPlan, and we checked whether a serde was present to decide whether or not to convert it to a data source table, as follows:

```scala
// Excerpt of the old CTAS handling. `table`, `desc`, `child`, and
// `allowExisting` are bound by the enclosing code, which is not shown here.
if (sessionState.convertCTAS && table.storage.serde.isEmpty) {
  // Do the conversion when spark.sql.hive.convertCTAS is true and the query
  // does not specify any storage format (file format and storage handler).
  if (table.identifier.database.isDefined) {
    throw new AnalysisException(
      "Cannot specify database name in a CTAS statement " +
        "when spark.sql.hive.convertCTAS is set to true.")
  }

  val mode = if (allowExisting) SaveMode.Ignore else SaveMode.ErrorIfExists
  CreateTableUsingAsSelect(
    TableIdentifier(desc.identifier.table),
    conf.defaultDataSourceName,
    temporary = false,
    Array.empty[String],
    bucketSpec = None,
    mode,
    options = Map.empty[String, String],
    child
  )
} else {
  val desc = if (table.storage.serde.isEmpty) {
    // add default serde
    table.withNewStorage(
      serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
  } else {
    table
  }
  // ... (rest of the Hive write path not shown)
}
```
I think this code has since changed and moved to SparkSqlParser?
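To make the old rule concrete, here is a minimal Scala model of the decision in the excerpt above, written as a Scala script (e.g. for the REPL). It is not Spark code: `StorageDesc`, `shouldConvertToDataSource`, and `withDefaultSerde` are hypothetical names, and the table's storage is reduced to its `serde` field. It shows why the note asked to keep the serde unspecified: under a serde-based check, filling in a default serde at parse time would make `serde.isEmpty` false, so a plain CTAS would no longer be converted.

```scala
// Hypothetical, simplified model of the old serde-based CTAS decision.
// Only the serde matters for the check, so storage is reduced to it.
final case class StorageDesc(serde: Option[String])

// Mirrors `sessionState.convertCTAS && table.storage.serde.isEmpty`.
def shouldConvertToDataSource(convertCTAS: Boolean, storage: StorageDesc): Boolean =
  convertCTAS && storage.serde.isEmpty

// Mirrors the `else` branch: fill in LazySimpleSerDe only when no serde was given.
def withDefaultSerde(storage: StorageDesc): StorageDesc =
  if (storage.serde.isEmpty)
    storage.copy(serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
  else storage

// A plain `CREATE TABLE t AS SELECT ...` with no storage clause.
val plainCtas = StorageDesc(serde = None)

// Serde left unspecified: the old rule converts to a data source table.
println(shouldConvertToDataSource(convertCTAS = true, plainCtas)) // prints true
// Serde filled in up front: the same statement is no longer converted.
println(shouldConvertToDataSource(convertCTAS = true, withDefaultSerde(plainCtas))) // prints false
```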




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



2016-09-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15190#discussion_r79978495
  

The current checks are based on [ctx.createFileFormat and ctx.rowFormat](https://github.com/dilipbiswal/spark/blob/f2b93de629f378ca99f8d3086ade8dc05b41a912/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L1051-L1052), not on the serde. Thus, I think this PR looks OK. :)
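A minimal Scala sketch (script form) of a check keyed off the statement's storage clauses rather than the serde. `CtasStatement`, `shouldConvertToDataSource`, and `withDefaultSerde` are hypothetical names; in the real parser `ctx.createFileFormat` and `ctx.rowFormat` are ANTLR contexts that are null when the clause is absent, and `Option` stands in for that here. The default serde string is only an illustrative value. Under this model, filling in a default serde does not change the conversion decision, which is why the old note no longer applies.

```scala
// Hypothetical, simplified model of a CTAS statement after parsing.
// Option[String] stands in for "clause present / absent" in the parse tree.
final case class CtasStatement(
    createFileFormat: Option[String], // e.g. a STORED AS clause
    rowFormat: Option[String],        // e.g. a ROW FORMAT clause
    serde: Option[String])            // serde recorded in the table's storage

// Convert only when no storage clause was written in the statement;
// the serde field is deliberately not consulted.
def shouldConvertToDataSource(convertCTAS: Boolean, stmt: CtasStatement): Boolean =
  convertCTAS && stmt.createFileFormat.isEmpty && stmt.rowFormat.isEmpty

// Fill in a default serde when none was given (the default is a parameter).
def withDefaultSerde(stmt: CtasStatement, defaultSerde: String): CtasStatement =
  stmt.copy(serde = stmt.serde.orElse(Some(defaultSerde)))

val plain = CtasStatement(createFileFormat = None, rowFormat = None, serde = None)
val plainWithSerde =
  withDefaultSerde(plain, "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")
val storedAs = CtasStatement(Some("STORED AS PARQUET"), rowFormat = None, serde = None)

println(shouldConvertToDataSource(convertCTAS = true, plain))          // prints true
println(shouldConvertToDataSource(convertCTAS = true, plainWithSerde)) // prints true
println(shouldConvertToDataSource(convertCTAS = true, storedAs))       // prints false
```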




2016-09-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15190#discussion_r79978157
  

The comment is no longer valid; the logic it describes was removed by PR https://github.com/apache/spark/pull/13386.


2016-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15190#discussion_r79977535
  

cc @yhuai to confirm


2016-09-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15190#discussion_r79976580
  

I think this is kept unspecified because the table is intended to be written through the Hive write path. If we specify the serde here, it will be converted to a data source table. Is that OK? cc @cloud-fan

