[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16693



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97709404
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -44,40 +44,6 @@ case class CreateHiveTableAsSelectCommand(
   override def innerChildren: Seq[LogicalPlan] = Seq(query)
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-    lazy val metastoreRelation: MetastoreRelation = {
-      import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-      import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
-      import org.apache.hadoop.io.Text
-      import org.apache.hadoop.mapred.TextInputFormat
-
-      val withFormat =
-        tableDesc.withNewStorage(
-          inputFormat =
-            tableDesc.storage.inputFormat.orElse(Some(classOf[TextInputFormat].getName)),
-          outputFormat =
-            tableDesc.storage.outputFormat
-              .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)),
-          serde = tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
-          compressed = tableDesc.storage.compressed)
--- End diff --

Actually, after the code refactoring, this is always ensured by the rule 
`DetermineHiveSerde`.
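
For reference, a self-contained sketch of the defaulting that the removed block 
above performed (and that, per this comment, `DetermineHiveSerde` now takes care 
of); the helper name `defaultHiveStorage` is only illustrative, not code from 
the rule itself:

    import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.sql.catalyst.catalog.CatalogTable

    // Fill in Hive's plain-text defaults for any storage property the user did not set:
    // TextInputFormat / HiveIgnoreKeyTextOutputFormat / LazySimpleSerDe.
    def defaultHiveStorage(tableDesc: CatalogTable): CatalogTable = {
      tableDesc.withNewStorage(
        inputFormat = tableDesc.storage.inputFormat
          .orElse(Some(classOf[TextInputFormat].getName)),
        outputFormat = tableDesc.storage.outputFormat
          .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)),
        serde = tableDesc.storage.serde
          .orElse(Some(classOf[LazySimpleSerDe].getName)),
        compressed = tableDesc.storage.compressed)
    }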



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97709347
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -44,40 +44,6 @@ case class CreateHiveTableAsSelectCommand(
   override def innerChildren: Seq[LogicalPlan] = Seq(query)
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-    lazy val metastoreRelation: MetastoreRelation = {
-      import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-      import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
-      import org.apache.hadoop.io.Text
-      import org.apache.hadoop.mapred.TextInputFormat
-
-      val withFormat =
-        tableDesc.withNewStorage(
-          inputFormat =
-            tableDesc.storage.inputFormat.orElse(Some(classOf[TextInputFormat].getName)),
-          outputFormat =
-            tableDesc.storage.outputFormat
-              .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)),
-          serde = tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
-          compressed = tableDesc.storage.compressed)
-
-      val withSchema = if (withFormat.schema.isEmpty) {
-        tableDesc.copy(schema = query.schema)
-      } else {
-        withFormat
--- End diff --

To the other reviewers: this is not needed, because the schema is always 
empty when we need to create a table. See [the assert 
here](https://github.com/cloud-fan/spark/blob/db00cf9061b2ad4263671f5ca9252642a091ee45/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala#L70).
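
For reference, a minimal sketch of the guard being referred to (a fragment of 
the command's run() body, using the names from the diff above; the exact code 
at the linked line may differ slightly):

    // When we reach the create-table branch, the table description must not
    // carry a schema yet; the schema is always taken from the query instead.
    assert(tableDesc.schema.isEmpty)
    sparkSession.sessionState.catalog.createTable(
      tableDesc.copy(schema = query.schema), ignoreIfExists = false)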
 



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97708445
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -89,12 +55,30 @@ case class CreateHiveTableAsSelectCommand(
         // Since the table already exists and the save mode is Ignore, we will just return.
         return Seq.empty
       }
-      sparkSession.sessionState.executePlan(InsertIntoTable(
-        metastoreRelation, Map(), query, overwrite = false, ifNotExists = false)).toRdd
--- End diff --

Uh... previously we tried to create the table even if it already existed. 
A good change!
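
To spell it out, a condensed sketch of the resulting control flow (an 
illustrative fragment with names as in the diffs above, not the literal patch): 
the table is created only in the table-not-exists branch, and the data is then 
written with a plain insert.

    if (sparkSession.sessionState.catalog.tableExists(tableDesc.identifier)) {
      if (ignoreIfExists) {
        // CTAS ... IF NOT EXISTS on an existing table: silently return.
        return Seq.empty
      }
      throw new AnalysisException(s"${tableDesc.identifier} already exists.")
    } else {
      // Only when the table does not exist do we create it,
      // taking the schema from the query.
      sparkSession.sessionState.catalog.createTable(
        tableDesc.copy(schema = query.schema), ignoreIfExists = false)
    }
    // Afterwards the query result is inserted into the (now existing) table.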



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97705600
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
@@ -116,6 +117,11 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
    * @since 2.0.0
    */
   def load(): DataFrame = {
+    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
+        "write files of Hive data source directly.")
--- End diff --

This is for reading streaming data from the Hive data source, right? The 
message says "write", so I think we need to fix the error message.
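
If the intent is as described, a sketch of the corrected check in 
`DataStreamReader.load()` (only the message wording changes, swapping it with 
the one used in the writer diff in the next comment):

    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
      // load() reads, so the message should say "read", not "write".
      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
        "read files of Hive data source directly.")
    }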



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97705570
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -221,6 +222,11 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
    * @since 2.0.0
    */
   def start(): StreamingQuery = {
+    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
+        "read files of Hive data source directly.")
--- End diff --

This is not reading but writing the results, right? The message should say 
"write", not "read".
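
And correspondingly for `DataStreamWriter.start()`, a sketch with the message 
swapped back (again my reading of the comment, not the final diff):

    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
      // start() writes, so the message should say "write", not "read".
      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
        "write files of Hive data source directly.")
    }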



[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...

2017-01-24 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16693

[SPARK-19152][SQL][followup] simplify CreateHiveTableAsSelectCommand

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/16552, 
`CreateHiveTableAsSelectCommand` became very similar to 
`CreateDataSourceTableAsSelectCommand`, and we can further simplify it by 
creating the table only in the table-not-exists branch.

This PR also adds Hive provider checks to the DataStream reader/writer, 
which were missed in #16552.
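
For context (not part of the patch), the command simplified here backs Hive 
CTAS statements, and the new provider checks guard the streaming file APIs; a 
hypothetical usage sketch with made-up table names, assuming a Hive-enabled 
SparkSession `spark`:

    // Hive CTAS: handled by CreateHiveTableAsSelectCommand.
    spark.sql("CREATE TABLE target_tbl STORED AS TEXTFILE AS SELECT id, name FROM source_tbl")

    // Rejected after this PR: the Hive provider cannot be used as a streaming source/sink.
    // spark.readStream.format("hive").load()              // throws AnalysisException
    // df.writeStream.format("hive").start("/some/path")   // throws AnalysisException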

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16693


commit db00cf9061b2ad4263671f5ca9252642a091ee45
Author: Wenchen Fan 
Date:   2017-01-24T13:35:03Z

simplify CreateHiveTableAsSelectCommand



