Hi,
While exploring the Hive EXPLAIN statement, we were wondering about the 
different stages.
Here are three questions regarding the EXPLAIN output below:
1. Why are there two root stages? What exactly does "root stage" mean (I assume 
it means there are no predecessors)?
2. What exactly is a Fetch Stage? Is it an actual MapReduce stage?
3. Where can I find additional information about these stages in general?

Thanks a lot for your support 
JS
P.S. This has already been posted to the user mailing list, but unfortunately 
we received no reply there.


hive> EXPLAIN   SELECT l_orderkey, o_shippingpriority, sum(l_extendedprice) 
FROM orders JOIN lineitem ON (lineitem.l_orderkey = orders.o_orderkey) GROUP BY 
l_orderkey, o_shippingpriority;        
OK
ABSTRACT SYNTAX TREE:
 (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF orders) (TOK_TABREF lineitem) (= (. 
(TOK_TABLE_OR_COL lineitem) l_orderkey) (. (TOK_TABLE_OR_COL orders) 
o_orderkey)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
(TOK_SELEXPR (TOK_TABLE_OR_COL l_orderkey)) (TOK_SELEXPR (TOK_TABLE_OR_COL 
o_shippingpriority)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL 
l_extendedprice)))) (TOK_GROUPBY (TOK_TABLE_OR_COL l_orderkey) 
(TOK_TABLE_OR_COL o_shippingpriority))))

STAGE DEPENDENCIES:
 Stage-1 is a root stage
 Stage-2 depends on stages: Stage-1
 Stage-0 is a root stage

STAGE PLANS:
 Stage: Stage-1
 Map Reduce
 Alias -> Map Operator Tree:
 lineitem 
 TableScan
 alias: lineitem
 Reduce Output Operator
 key expressions:
 expr: l_orderkey
 type: int
 sort order: +
 Map-reduce partition columns:
 expr: l_orderkey
 type: int
 tag: 1
 value expressions:
 expr: l_orderkey
 type: int
 expr: l_extendedprice
 type: int
 orders 
 TableScan
 alias: orders
 Reduce Output Operator
 key expressions:
 expr: o_orderkey
 type: int
 sort order: +
 Map-reduce partition columns:
 expr: o_orderkey
 type: int
 tag: 0
 value expressions:
 expr: o_shippingpriority
 type: int
 Reduce Operator Tree:
 Join Operator
 condition map:
 Inner Join 0 to 1
 condition expressions:
 0 {VALUE._col1}
 1 {VALUE._col0} {VALUE._col1}
 handleSkewJoin: false
 outputColumnNames: _col1, _col3, _col4
 Select Operator
 expressions:
 expr: _col3
 type: int
 expr: _col1
 type: int
 expr: _col4
 type: int
 outputColumnNames: _col3, _col1, _col4
 Group By Operator
 aggregations:
 expr: sum(_col4)
 bucketGroup: false
 keys:
 expr: _col3
 type: int
 expr: _col1
 type: int
 mode: hash
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
 compressed: false
 GlobalTableId: 0
 table:
 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

 Stage: Stage-2
 Map Reduce
 Alias -> Map Operator Tree:
 
hdfs://localhost:9000/tmp/hive-joergschad/hive_2011-03-14_17-22-14_249_1239673786236436657/10002
 
 Reduce Output Operator
 key expressions:
 expr: _col0
 type: int
 expr: _col1
 type: int
 sort order: ++
 Map-reduce partition columns:
 expr: _col0
 type: int
 expr: _col1
 type: int
 tag: -1
 value expressions:
 expr: _col2
 type: bigint
 Reduce Operator Tree:
 Group By Operator
 aggregations:
 expr: sum(VALUE._col0)
 bucketGroup: false
 keys:
 expr: KEY._col0
 type: int
 expr: KEY._col1
 type: int
 mode: mergepartial
 outputColumnNames: _col0, _col1, _col2
 Select Operator
 expressions:
 expr: _col0
 type: int
 expr: _col1
 type: int
 expr: _col2
 type: bigint
 outputColumnNames: _col0, _col1, _col2
 File Output Operator
 compressed: false
 GlobalTableId: 0
 table:
 input format: org.apache.hadoop.mapred.TextInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

 Stage: Stage-0
 Fetch Operator
 limit: -1
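
To make question 1 concrete, the STAGE DEPENDENCIES section above can be read as a small dependency graph. The following is only a minimal sketch in Python, not anything Hive provides: the function names are hypothetical, it assumes the exact line format shown in the output above, and it assumes that "root stage" means a stage with no predecessors (which is the guess stated in question 1).

```python
import re

# Copied from the STAGE DEPENDENCIES section of the EXPLAIN output above.
EXAMPLE = """\
 Stage-1 is a root stage
 Stage-2 depends on stages: Stage-1
 Stage-0 is a root stage
"""


def parse_stage_dependencies(text):
    """Map each stage name to the list of stages it depends on.

    Hypothetical helper: it only understands the two line shapes
    seen in the output above and ignores everything else.
    """
    deps = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        root = re.match(r"^(Stage-\d+) is a root stage$", line)
        if root:
            deps[root.group(1)] = []
            continue
        dependent = re.match(r"^(Stage-\d+) depends on stages: (.+)$", line)
        if dependent:
            parents = [s.strip() for s in dependent.group(2).split(",")]
            deps[dependent.group(1)] = parents
    return deps


def root_stages(deps):
    """Return the stages without predecessors (assumed meaning of 'root')."""
    return [stage for stage, parents in deps.items() if not parents]


if __name__ == "__main__":
    deps = parse_stage_dependencies(EXAMPLE)
    print("dependencies:", deps)
    print("root stages:", root_stages(deps))
```

Read this way, Stage-1 and Stage-0 both have no listed predecessors, which is what prompts the question of why Stage-0 is not shown as depending on Stage-2.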
