Hi,

I've been trying to work out, from the EXPLAIN output, how many MR jobs
will be run for a Hive query.

I haven't found a consistent method for determining that.

For example, for one of my queries (a CTAS query):
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-8 depends on stages: Stage-0
  Stage-2 depends on stages: Stage-8
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

Stage-1, Stage-3, and Stage-5 are listed as Map Reduce stages.

In the end, 2 MR jobs ran for this query.

In other cases, only 1 job runs.
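
One possible reading of the dependency list, offered as an assumption rather than a confirmed rule: a stage with a "consists of" clause (Stage-7 here) is a conditional stage that picks exactly one of its listed alternatives at runtime, so only that alternative can add a job. Under that assumption, the sketch below computes a lower and upper bound on the job count. The function name job_count_bounds and the hard-coded MR_STAGES set are hypothetical helpers for illustration, not part of Hive.

```python
import re

# STAGE DEPENDENCIES section, copied from the EXPLAIN output in this message.
DEPENDENCIES = """\
Stage-1 is a root stage
Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
Stage-4
Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
Stage-8 depends on stages: Stage-0
Stage-2 depends on stages: Stage-8
Stage-3
Stage-5
Stage-6 depends on stages: Stage-5
"""

# Stages that STAGE PLANS labels "Map Reduce" (entered by hand here).
MR_STAGES = {"Stage-1", "Stage-3", "Stage-5"}


def job_count_bounds(dependencies, mr_stages):
    """Return (min, max) MR jobs, ASSUMING each conditional stage runs
    exactly one of its 'consists of' alternatives."""
    alternatives = []
    for line in dependencies.splitlines():
        if "consists of" in line:
            tail = line.split("consists of", 1)[1]
            alternatives.append(re.findall(r"Stage-\d+", tail))

    # MR stages that are not behind a conditional are counted as always running.
    conditional = {stage for alts in alternatives for stage in alts}
    always = len(mr_stages - conditional)

    # Each conditional contributes 0 or 1 job, depending on the branch chosen.
    low = always + sum(min(int(s in mr_stages) for s in alts) for alts in alternatives)
    high = always + sum(max(int(s in mr_stages) for s in alts) for alts in alternatives)
    return low, high


print(job_count_bounds(DEPENDENCIES, MR_STAGES))  # (1, 2)
```

For this plan the bounds are 1 and 2, which matches the two outcomes described above. The sketch cannot say which branch will be chosen for a given run, only how many jobs each choice would imply.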

I couldn't find a consistent rule for figuring this out.

Can anyone help?

Thank you!!

Below is the full output:

explain CREATE TABLE beekeeper_results.test3 ROW FORMAT SERDE
"com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde" WITH
SERDEPROPERTIES ('escape.delim'='\\', 'mapkey.delim'='\;',
'colelction.delim'='|') AS SELECT * FROM beekeeper_results.test2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-8 depends on stages: Stage-0
  Stage-2 depends on stages: Stage-8
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test2
            Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: blasttag (type: string), actioncounts (type: array<struct<actiontype:string,count:int>>), detailedclicks (type: array<struct<linkindex:int,count:int,linkname:string>>), countsbyclient (type: array<struct<client:string,actiontype:string,count:int>>), totalactioncounts (type: array<struct<actiontype:string,count:int>>), actionsbydate (type: array<struct<datesent:string,actiontype:string,count:int>>)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
              Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
                    name: beekeeper_results.test3

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://hadoop-alidoro-nn-vip/user/hive/warehouse/.hive-staging_hive_2015-12-11_21-52-35_063_8498858370292854265-1/-ext-10001

  Stage: Stage-0
    Move Operator
      files:
          hdfs directory: true
          destination: ***

  Stage: Stage-8
      Create Table Operator:
        Create Table
          columns: blasttag string, actioncounts array<struct<actiontype:string,count:int>>, detailedclicks array<struct<linkindex:int,count:int,linkname:string>>, countsbyclient array<struct<client:string,actiontype:string,count:int>>, totalactioncounts array<struct<actiontype:string,count:int>>, actionsbydate array<struct<datesent:string,actiontype:string,count:int>>
          input format: org.apache.hadoop.mapred.TextInputFormat
          output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
          serde name: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
          serde properties:
            colelction.delim |
            escape.delim \
            mapkey.delim ;
          name: beekeeper_results.test3

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
                  name: beekeeper_results.test3

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
                  name: beekeeper_results.test3

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: ***
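
To avoid reading the stage types off by eye, the STAGE PLANS section can be scanned mechanically. The sketch below is a minimal example that assumes each "Stage: Stage-N" header is followed by a line naming the stage's operator type, as in the output above. The function name stage_kinds and the shortened PLAN_EXCERPT are hypothetical and only for illustration.

```python
import re

# Shortened excerpt of the STAGE PLANS section above (operator details omitted).
PLAN_EXCERPT = """\
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
  Stage: Stage-7
    Conditional Operator
  Stage: Stage-4
    Move Operator
  Stage: Stage-8
      Create Table Operator:
  Stage: Stage-3
    Map Reduce
  Stage: Stage-5
    Map Reduce
"""


def stage_kinds(plan_text):
    """Map each stage name to the first non-blank line after its header."""
    kinds = {}
    current = None
    for line in plan_text.splitlines():
        header = re.match(r"\s*Stage: (Stage-\d+)\s*$", line)
        if header:
            current = header.group(1)
        elif current is not None and line.strip():
            # Drop a trailing colon, e.g. "Create Table Operator:".
            kinds[current] = line.strip().rstrip(":")
            current = None
    return kinds


kinds = stage_kinds(PLAN_EXCERPT)
mr_stages = sorted(stage for stage, kind in kinds.items() if kind == "Map Reduce")
print(mr_stages)  # ['Stage-1', 'Stage-3', 'Stage-5']
```

This only lists the stages labelled Map Reduce. It does not say how many of them run, because the Conditional Operator in Stage-7 is resolved at runtime.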
