[GitHub] spark pull request #22124: [SPARK-25135][SQL] Insert datasource table may al...

wangyum Mon, 20 Aug 2018 03:09:29 -0700

Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22124#discussion_r211208353
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -384,7 +384,12 @@ object RemoveRedundantAliases extends Rule[LogicalPlan] {
         }
       }
     
    -  def apply(plan: LogicalPlan): LogicalPlan = removeRedundantAliases(plan, AttributeSet.empty)
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    plan match {
    +      case c: Command => c
    +      case _ => removeRedundantAliases(plan, AttributeSet.empty)
    --- End diff --
    
    Yes, this is correct. Without this PR, `RemoveRedundantAliases` rewrites the insert command itself, replacing its output columns `[COL1#10L, COL2#11L]` with `[col1#8L, col2#9L]`:
    ```scala
    === Applying Rule org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases ===
     InsertIntoHadoopFsRelationCommand file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehouse-ae504f50-9543-49fb-a   InsertIntoHadoopFsRelationCommand file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehouse-ae504f50-9543-49fb-acf
     Database: default                                                                                                                Database: default
     Table: table2                                                                                                                    Table: table2
     Owner: yumwang                                                                                                                   Owner: yumwang
     Created Time: Mon Aug 20 03:03:52 PDT 2018                                                                                       Created Time: Mon Aug 20 03:03:52 PDT 2018
     Last Access: Wed Dec 31 16:00:00 PST 1969                                                                                        Last Access: Wed Dec 31 16:00:00 PST 1969
     Created By: Spark 2.4.0-SNAPSHOT                                                                                                 Created By: Spark 2.4.0-SNAPSHOT
     Type: MANAGED                                                                                                                    Type: MANAGED
     Provider: hive                                                                                                                   Provider: hive
     Table Properties: [transient_lastDdlTime=1534759432]                                                                             Table Properties: [transient_lastDdlTime=1534759432]
     Location: file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehouse-ae504f50-9543-49fb-acf0-8b2736665d26/table2    Location: file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehouse-ae504f50-9543-49fb-acf0-8b2736665d26/table2
     Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe                                                       Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
     InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat                                                       InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
     OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat                                                     OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
     Storage Properties: [serialization.format=1]                                                                                     Storage Properties: [serialization.format=1]
     Partition Provider: Catalog                                                                                                      Partition Provider: Catalog
     Schema: root                                                                                                                     Schema: root
     |-- COL1: long (nullable = true)                                                                                                 |-- COL1: long (nullable = true)
     |-- COL2: long (nullable = true)                                                                                                 |-- COL2: long (nullable = true)
    !), org.apache.spark.sql.execution.datasources.InMemoryFileIndex@60582d55, [COL1#10L, COL2#11L]                                   ), org.apache.spark.sql.execution.datasources.InMemoryFileIndex@60582d55, [col1#8L, col2#9L]
    !+- Project [col1#8L AS col1#10L, col2#9L AS col2#11L]                                                                            +- Project [col1#8L, col2#9L]
        +- Filter (col1#8L > -20)                                                                                                        +- Filter (col1#8L > -20)
           +- Relation[col1#8L,col2#9L] parquet                                                                                             +- Relation[col1#8L,col2#9L] parquet
    ```
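
    The same effect can be reproduced in a toy model. This is only a sketch, not Spark's code: `Attr`, `Relation`, `Project`, `InsertCommand`, `removeAliases` and `removeAliasesSkippingCommands` are hypothetical stand-ins, and the assumption is that an alias counts as redundant when it keeps the source name, with references above it rewritten by id. The filter from the plan above is left out for brevity.

    ```scala
    object AliasSketch {
      // Toy stand-ins for Catalyst nodes; none of these are Spark's real classes.
      final case class Attr(name: String, id: Int)

      sealed trait Plan
      final case class Relation(output: Seq[Attr]) extends Plan
      // Each entry is (source attribute, attribute produced by the alias).
      final case class Project(aliases: Seq[(Attr, Attr)], child: Plan) extends Plan
      final case class InsertCommand(outputColumns: Seq[Attr], query: Plan) extends Plan

      // Drops aliases that keep the source name. Returns the rewritten plan and
      // a map from each dropped alias id to the attribute that replaces it.
      private def strip(plan: Plan): (Plan, Map[Int, Attr]) = plan match {
        case Project(aliases, child) =>
          val (newChild, fromChild) = strip(child)
          val resolved = aliases.map { case (src, out) => (fromChild.getOrElse(src.id, src), out) }
          val redundant = resolved.collect { case (src, out) if src.name == out.name => out.id -> src }.toMap
          val kept = resolved.map { case (src, out) => if (redundant.contains(out.id)) (src, src) else (src, out) }
          (Project(kept, newChild), fromChild ++ redundant)
        case InsertCommand(cols, query) =>
          val (newQuery, mapping) = strip(query)
          // Columns are matched by id, so the column's own name (COL1) is replaced
          // by the source attribute's name (col1).
          (InsertCommand(cols.map(c => mapping.getOrElse(c.id, c)), newQuery), mapping)
        case other => (other, Map.empty)
      }

      // Behaviour without the guard: the command's output columns get rewritten.
      def removeAliases(plan: Plan): Plan = strip(plan)._1

      // Same shape as the diff above: a command is returned untouched.
      def removeAliasesSkippingCommands(plan: Plan): Plan = plan match {
        case c: InsertCommand => c
        case other => removeAliases(other)
      }

      // Mirrors the attribute names and ids in the log above.
      val source: Plan = Relation(Seq(Attr("col1", 8), Attr("col2", 9)))
      val insert: Plan = InsertCommand(
        Seq(Attr("COL1", 10), Attr("COL2", 11)),
        Project(Seq(Attr("col1", 8) -> Attr("col1", 10), Attr("col2", 9) -> Attr("col2", 11)), source))
    }
    ```

    In this model `AliasSketch.removeAliases(AliasSketch.insert)` yields output columns named `col1` and `col2`, while `AliasSketch.removeAliasesSkippingCommands(AliasSketch.insert)` returns the command unchanged, so `COL1` and `COL2` survive.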


