Hi,
while exploring the Hive EXPLAIN statement, we were wondering about the different
stages.
Here are three questions regarding the EXPLAIN output below:
1. Why are there two root stages? What exactly does "root stage" mean (I assume
it means there are no predecessors)?
2. What exactly is a Fetch Stage? Is it an actual MapReduce stage?
3. Where can I find additional information about these stages in general?
Thanks a lot for your support
JS
P.S. This has already been posted to the user mailing list, but unfortunately
we received no reply there.
hive> EXPLAIN SELECT l_orderkey, o_shippingpriority, sum(l_extendedprice)
FROM orders JOIN lineitem ON (lineitem.l_orderkey = orders.o_orderkey) GROUP BY
l_orderkey, o_shippingpriority;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF orders) (TOK_TABREF lineitem) (= (.
(TOK_TABLE_OR_COL lineitem) l_orderkey) (. (TOK_TABLE_OR_COL orders)
o_orderkey)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT
(TOK_SELEXPR (TOK_TABLE_OR_COL l_orderkey)) (TOK_SELEXPR (TOK_TABLE_OR_COL
o_shippingpriority)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL
l_extendedprice)))) (TOK_GROUPBY (TOK_TABLE_OR_COL l_orderkey)
(TOK_TABLE_OR_COL o_shippingpriority))))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        lineitem
          TableScan
            alias: lineitem
            Reduce Output Operator
              key expressions:
                    expr: l_orderkey
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: l_orderkey
                    type: int
              tag: 1
              value expressions:
                    expr: l_orderkey
                    type: int
                    expr: l_extendedprice
                    type: int
        orders
          TableScan
            alias: orders
            Reduce Output Operator
              key expressions:
                    expr: o_orderkey
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: o_orderkey
                    type: int
              tag: 0
              value expressions:
                    expr: o_shippingpriority
                    type: int
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col1}
            1 {VALUE._col0} {VALUE._col1}
          handleSkewJoin: false
          outputColumnNames: _col1, _col3, _col4
          Select Operator
            expressions:
                  expr: _col3
                  type: int
                  expr: _col1
                  type: int
                  expr: _col4
                  type: int
            outputColumnNames: _col3, _col1, _col4
            Group By Operator
              aggregations:
                    expr: sum(_col4)
              bucketGroup: false
              keys:
                    expr: _col3
                    type: int
                    expr: _col1
                    type: int
              mode: hash
              outputColumnNames: _col0, _col1, _col2
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://localhost:9000/tmp/hive-joergschad/hive_2011-03-14_17-22-14_249_1239673786236436657/10002
            Reduce Output Operator
              key expressions:
                    expr: _col0
                    type: int
                    expr: _col1
                    type: int
              sort order: ++
              Map-reduce partition columns:
                    expr: _col0
                    type: int
                    expr: _col1
                    type: int
              tag: -1
              value expressions:
                    expr: _col2
                    type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE._col0)
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: int
                expr: KEY._col1
                type: int
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Select Operator
            expressions:
                  expr: _col0
                  type: int
                  expr: _col1
                  type: int
                  expr: _col2
                  type: bigint
            outputColumnNames: _col0, _col1, _col2
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
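
To make our assumption in question 1 concrete: the sketch below (Python, not part of Hive) reads the three STAGE DEPENDENCIES lines above as a graph and treats a stage as a root when it lists no predecessors. The parse_dependencies helper and the regular expressions are our own illustration; whether this matches Hive's actual definition of a root stage is exactly what we are asking.

```python
import re

# Lines copied from the STAGE DEPENDENCIES section of the EXPLAIN output above.
DEPENDENCIES = """\
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
"""


def parse_dependencies(text):
    """Map each stage name to the list of stages it depends on.

    Hypothetical helper written for this post; it only understands the
    two line shapes that appear in the output above.
    """
    graph = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            graph[root.group(1)] = []
            continue
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            graph[dep.group(1)] = [s.strip() for s in dep.group(2).split(",")]
    return graph


graph = parse_dependencies(DEPENDENCIES)

# ASSUMPTION: a root stage is a stage with no predecessors.
roots = sorted(stage for stage, preds in graph.items() if not preds)

print(graph)
print(roots)
```

Read this way, Stage-1 and Stage-0 both come out as roots, and Stage-2 is the only stage with a predecessor, which is what prompts question 1.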