Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/2570#discussion_r18280936
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
---
@@ -30,25 +32,38 @@ import org.apache.spark.sql.hive.MetastoreRelation
* Create table and insert the query result into it.
* @param database the database name of the new relation
* @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable`
- * by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties, e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ * raise exception
+ * @param extra the extra information for this Operator, it should be the
+ * ASTNode object for extracting the CreateTableDesc.
+ * @param query the query whose result will be insert into the new relation
*/
@Experimental
case class CreateTableAsSelect(
database: String,
tableName: String,
- query: SparkPlan,
- insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
+ allowExisting: Boolean,
+ extra: AnyRef,
+ query: LogicalPlan)
extends LeafNode with Command {
def output = Seq.empty
+ private[this] def sc = sqlContext.asInstanceOf[HiveContext]
+
// A lazy computing of the metastoreRelation
private[this] lazy val metastoreRelation: MetastoreRelation = {
- // Create the table
- val sc = sqlContext.asInstanceOf[HiveContext]
- sc.catalog.createTable(database, tableName, query.output, false)
+ // Get the CreateTableDesc from Hive SemanticAnalyzer
+ val sa = new SemanticAnalyzer(sc.hiveconf)
--- End diff --
On one hand, I think we should avoid interacting with Hive's query compiler
where possible. On the other hand, since we already ask Hive to process create
table statements, it would be good to also have Hive process the create table
part of CTAS queries. I guess a cleaner approach (requiring more work) would be
to split a CTAS query into a create table part and a query part. We would ask
Hive to process the create table part (Hive would see it as a plain create
table statement), and we would take care of the query part ourselves. That way,
we would not need to duplicate the code of `DDLTask.createTable()`.
For now, I think that using `SemanticAnalyzer` is fine.
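
To make the split concrete, here is a rough sketch of the idea, assuming the statement is split on its SQL text. This is only an illustration: `CtasSplit`, `CtasParts`, and `split` are hypothetical names, not Spark or Hive APIs, and a real implementation would more likely split the parsed `ASTNode` than match the raw string.

```scala
// Hypothetical sketch: none of these names exist in Spark or Hive.
// It only illustrates separating a CTAS statement into the part Hive
// would handle (create table) and the part Spark SQL would handle (query).
object CtasSplit {
  // The two halves of a CTAS statement.
  final case class CtasParts(createTable: String, query: String)

  // (?i) ignores case and (?s) lets '.' span newlines. The first group is
  // reluctant, so the split happens at the first "AS" that is directly
  // followed by "SELECT"; an earlier "STORED AS RCFILE" is not mistaken
  // for the boundary. A trailing semicolon is dropped.
  private val Ctas =
    """(?is)^\s*(CREATE\s+TABLE\s+.+?)\s+AS\s+(SELECT\s.+?)\s*;?\s*$""".r

  // Returns None when the statement is not a CTAS query.
  def split(sql: String): Option[CtasParts] = sql match {
    case Ctas(ddl, query) => Some(CtasParts(ddl, query))
    case _                => None
  }
}
```

With something like this, the `createTable` string could be handed to Hive as an ordinary create table statement, while the `query` string goes through our own planner. Text matching like this is fragile (for example, it ignores comments and string literals), which is one reason the AST-based split would need more work.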