Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2570#discussion_r18280936
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala ---
    @@ -30,25 +32,38 @@ import org.apache.spark.sql.hive.MetastoreRelation
      * Create table and insert the query result into it.
      * @param database the database name of the new relation
      * @param tableName the table name of the new relation
    - * @param insertIntoRelation function of creating the `InsertIntoHiveTable`
    - *        by specifying the `MetaStoreRelation`, the data will be inserted into that table.
    - * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
    + * @param allowExisting allow continue working if it's already exists, otherwise
    + *                      raise exception
    + * @param extra the extra information for this Operator, it should be the
    + *              ASTNode object for extracting the CreateTableDesc.
    + * @param query the query whose result will be insert into the new relation
      */
     @Experimental
     case class CreateTableAsSelect(
       database: String,
       tableName: String,
    -  query: SparkPlan,
    -  insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
    +  allowExisting: Boolean,
    +  extra: AnyRef,
    +  query: LogicalPlan)
         extends LeafNode with Command {
     
       def output = Seq.empty
     
    +  private[this] def sc = sqlContext.asInstanceOf[HiveContext]
    +
       // A lazy computing of the metastoreRelation
       private[this] lazy val metastoreRelation: MetastoreRelation = {
    -    // Create the table 
    -    val sc = sqlContext.asInstanceOf[HiveContext]
    -    sc.catalog.createTable(database, tableName, query.output, false)
    +    // Get the CreateTableDesc from Hive SemanticAnalyzer
    +    val sa = new SemanticAnalyzer(sc.hiveconf)
    --- End diff --
    
    On one hand, I think we should try not to interact with Hive's query 
compiler if possible. On the other hand, since we ask Hive to process create 
table statements, it would be good to also ask Hive to process the create 
table part of CTAS queries. I guess a cleaner approach (requiring more work) 
would be to split a CTAS query into a create table part and a query part. We 
would ask Hive to process the create table part (Hive would see it as a plain 
create table statement), and we would take care of the query part ourselves. 
In that case, we would not need to duplicate the code of 
`DDLTask.createTable()`.
    
    For now, I think that using `SemanticAnalyzer` is fine.
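    To make the split concrete, here is a minimal sketch of the idea, using 
    hypothetical stand-in types (`CtasStatement`, `CreateTableCommand`, 
    `InsertIntoTable`) rather than the actual Spark SQL or Hive classes: the 
    CTAS statement is divided into a plain CREATE TABLE that Hive's DDL path 
    could compile on its own, and an INSERT ... SELECT that Spark would plan 
    and execute.
    
    ```scala
    // Hypothetical stand-in types, not the real Spark SQL / Hive classes.
    case class CtasStatement(database: String, tableName: String, ddlText: String, querySql: String)
    case class CreateTableCommand(database: String, tableName: String, ddlText: String)
    case class InsertIntoTable(database: String, tableName: String, querySql: String)
    
    object CtasSplitter {
      // Split a CTAS statement into (1) a plain CREATE TABLE that Hive could
      // compile as an ordinary create table statement, and (2) an
      // INSERT INTO ... SELECT that Spark would plan and execute itself.
      def split(ctas: CtasStatement): (CreateTableCommand, InsertIntoTable) = {
        val create = CreateTableCommand(ctas.database, ctas.tableName, ctas.ddlText)
        val insert = InsertIntoTable(ctas.database, ctas.tableName, ctas.querySql)
        (create, insert)
      }
    }
    
    object CtasSplitterExample extends App {
      val ctas = CtasStatement(
        database = "default",
        tableName = "t",
        ddlText = "CREATE TABLE default.t (key INT, value STRING)",
        querySql = "SELECT key, value FROM src")
      val (createPart, queryPart) = CtasSplitter.split(ctas)
      println(createPart)  // would be handed to Hive's DDL handling
      println(queryPart)   // would be planned and executed by Spark
    }
    ```
    
    With a split like this, the create table semantics (SerDe, StorageHandler, 
    etc.) would stay entirely on Hive's side, and Spark would only need to know 
    the target table when it runs the insert.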

