Github user jeanlyn commented on the pull request:
https://github.com/apache/spark/pull/6682#issuecomment-109871137
@yhuai Yes, in spark-sql the full outer join cases shuffle all the null keys
to the same reducer. The plan that Hive generates looks like this:
```sql
explain select a.value,b.value,c.value,d.value from
a full outer join b
on a.key = b.key
full outer join c
on a.key = c.key
full outer join d
on a.key = d.key

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Reduce Output Operator
              key expressions: key (type: string)
              sort order: +
              Map-reduce partition columns: key (type: string)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              value expressions: value (type: string)
          TableScan
            alias: b
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Reduce Output Operator
              key expressions: key (type: string)
              sort order: +
              Map-reduce partition columns: key (type: string)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              value expressions: value (type: string)
          TableScan
            alias: c
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Reduce Output Operator
              key expressions: key (type: string)
              sort order: +
              Map-reduce partition columns: key (type: string)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              value expressions: value (type: string)
          TableScan
            alias: d
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Reduce Output Operator
              key expressions: key (type: string)
              sort order: +
              Map-reduce partition columns: key (type: string)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              value expressions: value (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
               Outer Join 0 to 2
               Outer Join 0 to 3
          keys:
            0 key (type: string)
            1 key (type: string)
            2 key (type: string)
            3 key (type: string)
          outputColumnNames: _col1, _col6, _col11, _col16
          Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
          Select Operator
            expressions: _col1 (type: string), _col6 (type: string), _col11 (type: string), _col16 (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
```
@chenghao-intel has a solution in #6413
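
To make the null-key point concrete, here is a toy sketch in plain Python. It is not code from this PR or from #6413, and it is not Spark's or Hive's actual partitioner. The function names, the CRC32 hash, and the rule that a null key goes to partition 0 are all assumptions made for illustration. The only thing it is meant to show is that any deterministic partitioning rule sends every null-keyed row to a single reducer, however many reducers there are.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Toy hash partitioner (hypothetical, not Spark's implementation).

    Assumption: a null key (None here) is always mapped to partition 0.
    """
    if key is None:
        return 0
    # CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def rows_per_partition(keys, num_partitions):
    """Count how many rows each partition (reducer) would receive."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# Eight distinct non-null keys plus 1000 rows whose join key is null.
keys = [f"k{i}" for i in range(8)] + [None] * 1000
counts = rows_per_partition(keys, num_partitions=4)

for partition in range(4):
    print(partition, counts[partition])
```

Under these assumptions, partition 0 receives at least the 1000 null-keyed rows, while the other three partitions share the eight remaining rows between them.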