[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16693

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97709404

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -44,40 +44,6 @@ case class CreateHiveTableAsSelectCommand(
   override def innerChildren: Seq[LogicalPlan] = Seq(query)
   override def run(sparkSession: SparkSession): Seq[Row] = {
-    lazy val metastoreRelation: MetastoreRelation = {
-      import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-      import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
-      import org.apache.hadoop.io.Text
-      import org.apache.hadoop.mapred.TextInputFormat
-
-      val withFormat =
-        tableDesc.withNewStorage(
-          inputFormat =
-            tableDesc.storage.inputFormat.orElse(Some(classOf[TextInputFormat].getName)),
-          outputFormat =
-            tableDesc.storage.outputFormat
-              .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)),
-          serde = tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
-          compressed = tableDesc.storage.compressed)
--- End diff --

Actually, after the code refactoring, this is always ensured in the rule `DetermineHiveSerde`.
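The removed block filled in Hive text-format defaults for any storage field the user left unset; the comment says the `DetermineHiveSerde` rule now guarantees this before the command runs. Below is a minimal sketch of that `orElse` defaulting. It assumes a hypothetical `StorageFormat` case class as a simplified stand-in for Spark's `CatalogStorageFormat`, and it uses plain strings for the class names so it runs without Hadoop or Hive on the classpath. It is not the rule's actual code.

```scala
// Hypothetical stand-in for Spark's CatalogStorageFormat: only the fields
// touched by the removed block are modeled here.
final case class StorageFormat(
    inputFormat: Option[String],
    outputFormat: Option[String],
    serde: Option[String],
    compressed: Boolean)

object HiveTextDefaults {
  // Fully qualified names taken from the imports in the removed code.
  val InputFormat = "org.apache.hadoop.mapred.TextInputFormat"
  val OutputFormat = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
  val Serde = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

  // Keep every user-specified value and fall back to the Hive text-format
  // defaults only for fields that are still unset; `compressed` is untouched.
  def fill(storage: StorageFormat): StorageFormat =
    storage.copy(
      inputFormat = storage.inputFormat.orElse(Some(InputFormat)),
      outputFormat = storage.outputFormat.orElse(Some(OutputFormat)),
      serde = storage.serde.orElse(Some(Serde)))
}
```

Because `fill` is idempotent, running it once during analysis lets later code, such as the command, assume the fields are already populated.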
[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97709347

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -44,40 +44,6 @@ case class CreateHiveTableAsSelectCommand(
   override def innerChildren: Seq[LogicalPlan] = Seq(query)
   override def run(sparkSession: SparkSession): Seq[Row] = {
-    lazy val metastoreRelation: MetastoreRelation = {
-      import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-      import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
-      import org.apache.hadoop.io.Text
-      import org.apache.hadoop.mapred.TextInputFormat
-
-      val withFormat =
-        tableDesc.withNewStorage(
-          inputFormat =
-            tableDesc.storage.inputFormat.orElse(Some(classOf[TextInputFormat].getName)),
-          outputFormat =
-            tableDesc.storage.outputFormat
-              .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)),
-          serde = tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
-          compressed = tableDesc.storage.compressed)
-
-      val withSchema = if (withFormat.schema.isEmpty) {
-        tableDesc.copy(schema = query.schema)
-      } else {
-        withFormat
--- End diff --

To the other reviewers: this is not needed, because the schema is always empty when we need to create a table. See [the assert here.](https://github.com/cloud-fan/spark/blob/db00cf9061b2ad4263671f5ca9252642a091ee45/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala#L70)
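The removed `withSchema` branch only mattered if the table description could already carry a schema. The comment says the linked assert guarantees an empty schema whenever the table is created. Under that assumption, the conditional always takes its first branch and reduces to a plain copy. The sketch below uses a hypothetical `TableDesc` type with column names only to make the equivalence concrete. It is not Spark's `CatalogTable`.

```scala
// Hypothetical, simplified table description: the schema is a list of column names.
final case class TableDesc(name: String, schema: Seq[String])

object SchemaFromQuery {
  // Shape of the removed code: take the query schema only if none was declared.
  def oldWithSchema(desc: TableDesc, querySchema: Seq[String]): TableDesc =
    if (desc.schema.isEmpty) desc.copy(schema = querySchema) else desc

  // With the precondition asserted up front, the branch collapses to a copy.
  def newWithSchema(desc: TableDesc, querySchema: Seq[String]): TableDesc = {
    assert(desc.schema.isEmpty, "CTAS table description must not declare a schema")
    desc.copy(schema = querySchema)
  }
}
```

Whenever the precondition holds, both functions return the same result. The asserted version also fails loudly if the precondition is ever violated, instead of silently keeping a declared schema.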
[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97708445

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -89,12 +55,30 @@ case class CreateHiveTableAsSelectCommand(
         // Since the table already exists and the save mode is Ignore, we will just return.
         return Seq.empty
       }
-      sparkSession.sessionState.executePlan(InsertIntoTable(
-        metastoreRelation, Map(), query, overwrite = false, ifNotExists = false)).toRdd
--- End diff --

Uh... previously, we tried to create the table even if it already existed. A good change!
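The change the reviewer welcomes is that the table is created only when it does not exist yet. When it does exist, the save mode decides whether to fail, do nothing, or insert. Below is a toy sketch of that control flow. It uses a hypothetical in-memory `ToyCatalog` and save modes whose names mirror Spark's `SaveMode` values. `Overwrite` is left out, and none of this is Spark's API.

```scala
import scala.collection.mutable

// Hypothetical save modes; the names mirror Spark's SaveMode values.
sealed trait Mode
case object ErrorIfExists extends Mode
case object Ignore extends Mode
case object Append extends Mode

// Hypothetical in-memory catalog: table name -> rows written so far.
final class ToyCatalog {
  val tables: mutable.Map[String, Vector[Int]] = mutable.Map.empty

  def tableExists(name: String): Boolean = tables.contains(name)

  def createTable(name: String): Unit = {
    require(!tableExists(name), s"$name already exists")
    tables.update(name, Vector.empty)
  }

  def insert(name: String, rows: Seq[Int]): Unit =
    tables.update(name, tables(name) ++ rows)
}

object ToyCtas {
  // The table is only ever created in the "does not exist" branch;
  // the "exists" branch chooses between failing, a no-op, and an insert.
  def run(catalog: ToyCatalog, name: String, mode: Mode, rows: Seq[Int]): Unit =
    if (catalog.tableExists(name)) {
      mode match {
        case ErrorIfExists => throw new IllegalStateException(s"$name already exists.")
        case Ignore        => () // Table exists and mode is Ignore: just return.
        case Append        => catalog.insert(name, rows)
      }
    } else {
      catalog.createTable(name)
      catalog.insert(name, rows)
    }
}
```

Keeping creation in a single branch means `createTable` can insist that the table is absent, rather than having to tolerate a table that already exists.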
[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97705600

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
@@ -116,6 +117,11 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
    * @since 2.0.0
    */
   def load(): DataFrame = {
+    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
+        "write files of Hive data source directly.")
--- End diff --

This is to read the streaming data from Hive tables, right? I think we need to fix the error message.
[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16693#discussion_r97705570

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -221,6 +222,11 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
    * @since 2.0.0
    */
   def start(): StreamingQuery = {
+    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
+        "read files of Hive data source directly.")
--- End diff --

This is not reading; it writes the results to Hive tables, right?
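As the two comments above point out, the messages in the new checks are swapped: `DataStreamReader.load()` says "write" and `DataStreamWriter.start()` says "read". Below is a sketch of the checks with the wording matched to each side. It assumes `DDLUtils.HIVE_PROVIDER` is the string `"hive"` and uses a hypothetical `UnsupportedSourceException` in place of Spark's `AnalysisException`.

```scala
import java.util.Locale

// Hypothetical placeholder for Spark's AnalysisException.
final class UnsupportedSourceException(msg: String) extends Exception(msg)

object HiveProviderCheck {
  // Assumed value of DDLUtils.HIVE_PROVIDER.
  val HiveProvider = "hive"

  private def isHive(source: String): Boolean =
    source.toLowerCase(Locale.ROOT) == HiveProvider

  // Reader side (DataStreamReader.load): the message talks about reading.
  def checkRead(source: String): Unit =
    if (isHive(source)) {
      throw new UnsupportedSourceException("Hive data source can only be used with tables, " +
        "you can not read files of Hive data source directly.")
    }

  // Writer side (DataStreamWriter.start): the message talks about writing.
  def checkWrite(source: String): Unit =
    if (isHive(source)) {
      throw new UnsupportedSourceException("Hive data source can only be used with tables, " +
        "you can not write files of Hive data source directly.")
    }
}
```

Lower-casing with `Locale.ROOT` is a choice made in this sketch to avoid locale-dependent case mapping. The diff itself calls `toLowerCase` with no argument.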
[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16693

[SPARK-19152][SQL][followup] simplify CreateHiveTableAsSelectCommand

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/16552, `CreateHiveTableAsSelectCommand` has become very similar to `CreateDataSourceTableAsSelectCommand`, so we can further simplify it by creating the table only in the branch where the table does not exist.

This PR also adds the Hive provider check to the DataStream reader/writer, which was missed in #16552.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16693.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #16693

commit db00cf9061b2ad4263671f5ca9252642a091ee45
Author: Wenchen Fan
Date: 2017-01-24T13:35:03Z

simplify CreateHiveTableAsSelectCommand