Can anyone help me understand the "Explain" operator in Pig?
I know it prints the logical, physical, and Map/Reduce plans for the Pig
script being executed, but I find its output tricky to read.
I know what I am trying to do in Pig. What I want to know is what I can
learn from the Explain operator and how I can use its output. Can anyone
help me understand that?
For example, suppose I have the following Pig script:
Data = Load 'input.csv' using PigStorage(',');
IDs = FOREACH Data GENERATE $0;
UniqueID = Distinct IDs parallel 40;
Explain IDs;
Explain UniqueID;
Dump UniqueID;
The two Explain statements print the following:
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
IDs: (Name: LOStore Schema: #4:bytearray)
|
|---IDs: (Name: LOForEach Schema: #4:bytearray)
| |
| (Name: LOGenerate[false] Schema: #4:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[4]
| | |
| | (Name: Project Type: bytearray Uid: 4 Input: 0 Column: (*))
| |
| |---(Name: LOInnerLoad[0] Schema: #4:bytearray)
|
|---Data: (Name: LOLoad Schema: null)RequiredFields:null
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
|
|---IDs: New For Each(false)[bag] - scope-3
| |
| Project[bytearray][0] - scope-1
|
|---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0
2012-01-31 03:25:41,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-5
Map Plan
IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
|
|---IDs: New For Each(false)[bag] - scope-3
| |
| Project[bytearray][0] - scope-1
|
|---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0--------
Global sort: false
----------------
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
UniqueID: (Name: LOStore Schema: #6:bytearray)
|
|---UniqueID: (Name: LODistinct Schema: #6:bytearray)
|
|---IDs: (Name: LOForEach Schema: #6:bytearray)
| |
| (Name: LOGenerate[false] Schema: #6:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[6]
| | |
| | (Name: Project Type: bytearray Uid: 6 Input: 0 Column: (*))
| |
| |---(Name: LOInnerLoad[0] Schema: #6:bytearray)
|
|---Data: (Name: LOLoad Schema: null)RequiredFields:null
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
|
|---UniqueID: PODistinct[bag] - scope-10
|
|---IDs: New For Each(false)[bag] - scope-9
| |
| Project[bytearray][0] - scope-7
|
|---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6
2012-01-31 03:25:41,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-31 03:25:41,898 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-01-31 03:25:41,898 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-12
Map Plan
Local Rearrange[tuple]{tuple}(true) - scope-14
| |
| Project[tuple][*] - scope-13
|
|---IDs: New For Each(false)[bag] - scope-9
| |
| Project[bytearray][0] - scope-7
|
|---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6--------
Reduce Plan
UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
|
|---New For Each(true)[bag] - scope-17
| |
| Project[tuple][0] - scope-16
|
|---Package[tuple]{tuple} - scope-15--------
Global sort: false
----------------
Thanks,
Praveenesh