Hi,

while exploring Hive's EXPLAIN statement we were wondering about the different stages. Here are three questions about the EXPLAIN output below:

1. Why are there two root stages? What exactly does "root stage" mean? (I assume it means the stage has no predecessors.)
2. What exactly is a Fetch stage? Is it an actual MapReduce stage?
3. Where can I find more information about these stages in general?
Thanks a lot for your support,
JS

P.S. This was already posted to the user mailing list, but unfortunately we received no reply there.

    hive> EXPLAIN SELECT l_orderkey, o_shippingpriority, sum(l_extendedprice) FROM orders JOIN lineitem ON (lineitem.l_orderkey = orders.o_orderkey) GROUP BY l_orderkey, o_shippingpriority;
    OK
    ABSTRACT SYNTAX TREE:
      (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF orders) (TOK_TABREF lineitem) (= (. (TOK_TABLE_OR_COL lineitem) l_orderkey) (. (TOK_TABLE_OR_COL orders) o_orderkey)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL l_orderkey)) (TOK_SELEXPR (TOK_TABLE_OR_COL o_shippingpriority)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL l_extendedprice)))) (TOK_GROUPBY (TOK_TABLE_OR_COL l_orderkey) (TOK_TABLE_OR_COL o_shippingpriority))))

    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-2 depends on stages: Stage-1
      Stage-0 is a root stage

    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Alias -> Map Operator Tree:
            lineitem
              TableScan
                alias: lineitem
                Reduce Output Operator
                  key expressions:
                        expr: l_orderkey
                        type: int
                  sort order: +
                  Map-reduce partition columns:
                        expr: l_orderkey
                        type: int
                  tag: 1
                  value expressions:
                        expr: l_orderkey
                        type: int
                        expr: l_extendedprice
                        type: int
            orders
              TableScan
                alias: orders
                Reduce Output Operator
                  key expressions:
                        expr: o_orderkey
                        type: int
                  sort order: +
                  Map-reduce partition columns:
                        expr: o_orderkey
                        type: int
                  tag: 0
                  value expressions:
                        expr: o_shippingpriority
                        type: int
          Reduce Operator Tree:
            Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {VALUE._col1}
                1 {VALUE._col0} {VALUE._col1}
              handleSkewJoin: false
              outputColumnNames: _col1, _col3, _col4
              Select Operator
                expressions:
                      expr: _col3
                      type: int
                      expr: _col1
                      type: int
                      expr: _col4
                      type: int
                outputColumnNames: _col3, _col1, _col4
                Group By Operator
                  aggregations:
                        expr: sum(_col4)
                  bucketGroup: false
                  keys:
                        expr: _col3
                        type: int
                        expr: _col1
                        type: int
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

      Stage: Stage-2
        Map Reduce
          Alias -> Map Operator Tree:
            hdfs://localhost:9000/tmp/hive-joergschad/hive_2011-03-14_17-22-14_249_1239673786236436657/10002
                Reduce Output Operator
                  key expressions:
                        expr: _col0
                        type: int
                        expr: _col1
                        type: int
                  sort order: ++
                  Map-reduce partition columns:
                        expr: _col0
                        type: int
                        expr: _col1
                        type: int
                  tag: -1
                  value expressions:
                        expr: _col2
                        type: bigint
          Reduce Operator Tree:
            Group By Operator
              aggregations:
                    expr: sum(VALUE._col0)
              bucketGroup: false
              keys:
                    expr: KEY._col0
                    type: int
                    expr: KEY._col1
                    type: int
              mode: mergepartial
              outputColumnNames: _col0, _col1, _col2
              Select Operator
                expressions:
                      expr: _col0
                      type: int
                      expr: _col1
                      type: int
                      expr: _col2
                      type: bigint
                outputColumnNames: _col0, _col1, _col2
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

      Stage: Stage-0
        Fetch Operator
          limit: -1
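To make the assumption in question 1 concrete, here is a minimal Python sketch that reads the STAGE DEPENDENCIES section of the output above and reports which stages have no predecessors. It assumes a "root stage" simply means "no listed predecessors", which is the guess stated in question 1, not something confirmed here. The helper names (parse_dependencies, root_stages, dependents) are hypothetical. The parser only recognizes the two line forms that appear in this particular output ("is a root stage" and "depends on stages:"). Other Hive versions may print further forms, which would raise an error here.

```python
import re

# STAGE DEPENDENCIES section, copied verbatim from the EXPLAIN output above.
DEPENDENCIES = """\
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
"""

# Assumption: only these two line forms occur (true for the output above).
ROOT_RE = re.compile(r"^(Stage-\d+) is a root stage$")
DEPENDS_RE = re.compile(r"^(Stage-\d+) depends on stages: (.+)$")


def parse_dependencies(text):
    """Map each stage name to the list of stages it depends on (in listed order)."""
    deps = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        root = ROOT_RE.match(line)
        if root:
            deps[root.group(1)] = []
            continue
        depends = DEPENDS_RE.match(line)
        if depends:
            deps[depends.group(1)] = [s.strip() for s in depends.group(2).split(",")]
            continue
        raise ValueError(f"unrecognized dependency line: {line!r}")
    return deps


def root_stages(deps):
    """Stages with no listed predecessors, kept in the order EXPLAIN prints them."""
    return [stage for stage, parents in deps.items() if not parents]


def dependents(deps, stage):
    """Stages that list `stage` as one of their direct predecessors."""
    return [other for other, parents in deps.items() if stage in parents]


if __name__ == "__main__":
    deps = parse_dependencies(DEPENDENCIES)
    print(root_stages(deps))            # ['Stage-1', 'Stage-0']
    print(dependents(deps, "Stage-1"))  # ['Stage-2']
```

Run as a script, it prints ['Stage-1', 'Stage-0'] and then ['Stage-2']. Under this reading, both the first MapReduce stage and the Fetch stage (Stage-0) count as roots, which is exactly the situation question 1 asks about.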