[GitHub] spark pull request: [SPARK-2205][SPARK-7871][SQL]Advoid redundancy...

jeanlyn Sun, 07 Jun 2015 22:43:29 -0700

Github user jeanlyn commented on the pull request:

    https://github.com/apache/spark/pull/6682#issuecomment-109871137
  
    @yhuai Yes, in the full outer join case spark-sql shuffles the null keys 
to the same reducer, and the plan Hive generates looks like this:
    ```sql
    explain select a.value,b.value,c.value,d.value from
    a full outer join b 
    on a.key = b.key
    full outer join c
    on a.key = c.key
    full outer join d
    on a.key = d.key
    
    
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
              TableScan
                alias: a
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                Reduce Output Operator
                  key expressions: key (type: string)
                  sort order: +
                  Map-reduce partition columns: key (type: string)
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  value expressions: value (type: string)
              TableScan
                alias: b
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                Reduce Output Operator
                  key expressions: key (type: string)
                  sort order: +
                  Map-reduce partition columns: key (type: string)
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  value expressions: value (type: string)
              TableScan
                alias: c
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                Reduce Output Operator
                  key expressions: key (type: string)
                  sort order: +
                  Map-reduce partition columns: key (type: string)
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  value expressions: value (type: string)
              TableScan
                alias: d
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                Reduce Output Operator
                  key expressions: key (type: string)
                  sort order: +
                  Map-reduce partition columns: key (type: string)
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  value expressions: value (type: string)
          Reduce Operator Tree:
            Join Operator
              condition map:
                   Outer Join 0 to 1
                   Outer Join 0 to 2
                   Outer Join 0 to 3
              keys:
                0 key (type: string)
                1 key (type: string)
                2 key (type: string)
                3 key (type: string)
              outputColumnNames: _col1, _col6, _col11, _col16
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              Select Operator
                expressions: _col1 (type: string), _col6 (type: string), _col11 (type: string), _col16 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    
      Stage: Stage-0
        Fetch Operator
          limit: -1
          Processor Tree:
            ListSink
    ```
    @chenghao-intel has a solution in #6413
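
To illustrate why sending every null key to one reducer matters, here is a minimal sketch in plain Python. It is not Spark or Hive code: `reducer_for`, `rows_per_reducer`, the CRC32 hash, the choice of reducer 0 for nulls, and the sample data are all assumptions made for illustration. It only shows how a hash-partitioned shuffle piles up rows on one reducer when many join keys are null.

```python
import zlib
from collections import Counter


def reducer_for(key, num_reducers):
    """Pick a reducer for a join key (hypothetical hash partitioner)."""
    # Assumption: a null key always maps to the same fixed reducer.
    if key is None:
        return 0
    # CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def rows_per_reducer(keys, num_reducers):
    """Count how many rows each reducer receives after the shuffle."""
    counts = Counter(reducer_for(k, num_reducers) for k in keys)
    return [counts.get(i, 0) for i in range(num_reducers)]


# Made-up data: 100 distinct non-null keys and 900 rows with a null key.
keys = [f"k{i}" for i in range(100)] + [None] * 900
load = rows_per_reducer(keys, 4)
print(load)  # the first entry holds at least the 900 null-key rows
```

In a full outer join the null-key rows can never match each other, so in this sketch the single overloaded reducer does work that could in principle be spread out.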
