Hi, I've been trying to figure out how to tell, from the EXPLAIN output, how many MR jobs will be run for a Hive query.
I haven't found a consistent way of knowing that. For example, in one of my queries (a CTAS query):

    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
      Stage-4
      Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
      Stage-8 depends on stages: Stage-0
      Stage-2 depends on stages: Stage-8
      Stage-3
      Stage-5
      Stage-6 depends on stages: Stage-5

Stage-1, Stage-3 and Stage-5 are listed as Map Reduce stages, but in the end 2 MR jobs ran. In other cases only 1 job runs. I couldn't find a consistent rule for figuring this out. Can anyone help? Thank you!

Below is the full output:

    explain CREATE TABLE beekeeper_results.test3
    ROW FORMAT SERDE "com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde"
    WITH SERDEPROPERTIES ('escape.delim'='\\', 'mapkey.delim'='\;', 'colelction.delim'='|')
    AS SELECT * FROM beekeeper_results.test2;
    OK
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
      Stage-4
      Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
      Stage-8 depends on stages: Stage-0
      Stage-2 depends on stages: Stage-8
      Stage-3
      Stage-5
      Stage-6 depends on stages: Stage-5

    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
              TableScan
                alias: test2
                Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: blasttag (type: string), actioncounts (type: array<struct<actiontype:string,count:int>>), detailedclicks (type: array<struct<linkindex:int,count:int,linkname:string>>), countsbyclient (type: array<struct<client:string,actiontype:string,count:int>>), totalactioncounts (type: array<struct<actiontype:string,count:int>>), actionsbydate (type: array<struct<datesent:string,actiontype:string,count:int>>)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                  Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
                        name: beekeeper_results.test3

      Stage: Stage-7
        Conditional Operator

      Stage: Stage-4
        Move Operator
          files:
              hdfs directory: true
              destination: hdfs://hadoop-alidoro-nn-vip/user/hive/warehouse/.hive-staging_hive_2015-12-11_21-52-35_063_8498858370292854265-1/-ext-10001

      Stage: Stage-0
        Move Operator
          files:
              hdfs directory: true
              destination: ***

      Stage: Stage-8
          Create Table Operator:
            Create Table
              columns: blasttag string, actioncounts array<struct<actiontype:string,count:int>>, detailedclicks array<struct<linkindex:int,count:int,linkname:string>>, countsbyclient array<struct<client:string,actiontype:string,count:int>>, totalactioncounts array<struct<actiontype:string,count:int>>, actionsbydate array<struct<datesent:string,actiontype:string,count:int>>
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
              serde name: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
              serde properties:
                colelction.delim |
                escape.delim \
                mapkey.delim ;
              name: beekeeper_results.test3

      Stage: Stage-2
        Stats-Aggr Operator

      Stage: Stage-3
        Map Reduce
          Map Operator Tree:
              TableScan
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
                      name: beekeeper_results.test3

      Stage: Stage-5
        Map Reduce
          Map Operator Tree:
              TableScan
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
                      name: beekeeper_results.test3

      Stage: Stage-6
        Move Operator
          files:
              hdfs directory: true
              destination: ***
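One way to reason about this plan is to treat the Conditional Operator (Stage-7) as a runtime choice rather than a fixed list of jobs. Below is a rough Python sketch (not an official Hive tool; `mr_job_bounds`, the helper functions and the trimmed `SAMPLE` text are all hypothetical) that reads the plain-text EXPLAIN output and returns a lower and upper bound on the number of MR jobs instead of an exact count. It assumes that a conditional stage runs exactly one of the stages listed after "consists of", together with any stages that depend only on that one (Stage-6 after Stage-5 here), and that every other stage always runs. In this plan the branches look like Hive's small-file merge step, which would be decided at runtime, so EXPLAIN alone likely can't give a single number.

```python
import re

STAGE = re.compile(r"Stage-\d+")


def _lines(text, start, end=None):
    """Non-empty, stripped lines between the `start` marker and the optional `end` marker."""
    body = text.split(start, 1)[1]
    if end is not None:
        body = body.split(end, 1)[0]
    return [ln.strip() for ln in body.splitlines() if ln.strip()]


def parse_dependencies(explain):
    """Return ({stage: [parent stages]}, {conditional stage: [branch head stages]})."""
    deps, branches = {}, {}
    for line in _lines(explain, "STAGE DEPENDENCIES:", "STAGE PLANS:"):
        head, _, consists = line.partition("consists of")
        stage = STAGE.match(head).group(0)
        _, _, dep_part = head.partition("depends on stages:")
        deps[stage] = STAGE.findall(dep_part)
        if consists:
            branches[stage] = STAGE.findall(consists)
    return deps, branches


def parse_stage_types(explain):
    """Map each stage to the first line under its 'Stage: Stage-N' header, e.g. 'Map Reduce'."""
    lines = _lines(explain, "STAGE PLANS:")
    types = {}
    for i, line in enumerate(lines[:-1]):
        m = re.fullmatch(r"Stage: (Stage-\d+)", line)
        if m:
            types[m.group(1)] = lines[i + 1]
    return types


def mr_job_bounds(explain):
    """(min, max) MR jobs, ASSUMING each conditional runs exactly one of its branches."""
    deps, branches = parse_dependencies(explain)
    types = parse_stage_types(explain)

    # Each branch head belongs to its own branch; a stage whose parents all sit in
    # one single branch joins that branch (e.g. Stage-6 -> branch of Stage-5).
    owner = {h: (cond, h) for cond, heads in branches.items() for h in heads}
    changed = True
    while changed:
        changed = False
        for stage, parents in deps.items():
            if stage in owner or not parents:
                continue
            owners = {owner.get(p) for p in parents}
            if len(owners) == 1 and None not in owners:
                owner[stage] = owners.pop()
                changed = True

    def is_mr(stage):
        return types.get(stage) == "Map Reduce"

    # MR stages outside any branch always run; each conditional adds min/max of its branches.
    low = high = sum(1 for s in deps if s not in owner and is_mr(s))
    for cond, heads in branches.items():
        per_branch = [sum(1 for s, o in owner.items() if o == (cond, h) and is_mr(s)) for h in heads]
        low += min(per_branch)
        high += max(per_branch)
    return low, high


# Trimmed copy of the plan above: dependencies plus the first line of each stage plan.
SAMPLE = """
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-8 depends on stages: Stage-0
  Stage-2 depends on stages: Stage-8
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
  Stage: Stage-7
    Conditional Operator
  Stage: Stage-4
    Move Operator
  Stage: Stage-0
    Move Operator
  Stage: Stage-8
      Create Table Operator:
  Stage: Stage-2
    Stats-Aggr Operator
  Stage: Stage-3
    Map Reduce
  Stage: Stage-5
    Map Reduce
  Stage: Stage-6
    Move Operator
"""

if __name__ == "__main__":
    print(mr_job_bounds(SAMPLE))  # (1, 2)
```

For this plan the sketch reports (1, 2): Stage-1 always runs, and the conditional either picks the move-only branch (Stage-4, no extra job) or one of the MR branches (Stage-3, or Stage-5 followed by Stage-6). Nested conditionals and non-MR execution engines are not handled.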