Alan's book Programming Pig, Ch. 7, has a good section on this. Also try the -dot option described at http://pig.apache.org/docs/r0.9.1/test.html#explain to get a diagram representation of the plans generated.
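For instance, here is a minimal sketch of the -dot route, reusing the aliases from your script. The output path is a made-up example, and I am going from the EXPLAIN syntax on the page linked above (-dot for Graphviz output, -out for the destination), so please check it against your Pig version:

```pig
-- Same pipeline as in the original script; only the EXPLAIN line differs.
Data = LOAD 'input.csv' USING PigStorage(',');
IDs = FOREACH Data GENERATE $0;
UniqueID = DISTINCT IDs PARALLEL 40;

-- -dot requests Graphviz (dot) output instead of the text tree;
-- -out names where to write it. The path here is hypothetical.
EXPLAIN -dot -out /tmp/uniqueid_plan.dot UniqueID;
```

The resulting file should be renderable with the Graphviz dot tool. As far as I recall, if the -out path is an existing directory Pig writes the logical, physical and execution plans as separate files there, otherwise they all go into the one file.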
Which specific part of the output are you having trouble understanding though?

On Tue, Jan 31, 2012 at 3:02 PM, praveenesh kumar <[email protected]> wrote:
> Can anyone help me understanding "Explain" Operator in pig ?
>
> I know it gives some logical/physical and Map/Reduce plan for the pig
> script we execute ?
> But its kind of tricky to understand the output of "Explain" operator ?
>
> I know what I am trying to do in Pig. But what I want to know is what
> things I can get by using Explain operator and how can I use the output of
> Explain operator.Can anyone helps me in understanding that ?
>
> Like if I I have the following pig script:
>
> Data = Load 'input.csv' using PigStorage(',');
> IDs = FOREACH Data GENERATE $0;
> UniqueID = Distinct IDs parallel 40;
> Explain IDs;
> Explain UniqueID;
> Dump UniqueID;
>
>
>
>
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> IDs: (Name: LOStore Schema: #4:bytearray)
> |
> |---IDs: (Name: LOForEach Schema: #4:bytearray)
>     |   |
>     |   (Name: LOGenerate[false] Schema:
> #4:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[4]
>     |   |   |
>     |   |   (Name: Project Type: bytearray Uid: 4 Input: 0 Column: (*))
>     |   |
>     |   |---(Name: LOInnerLoad[0] Schema: #4:bytearray)
>     |
>     |---Data: (Name: LOLoad Schema: null)RequiredFields:null
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
> |
> |---IDs: New For Each(false)[bag] - scope-3
>     |   |
>     |   Project[bytearray][0] - scope-1
>     |
>     |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0
>
> 2012-01-31 03:25:41,756 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2012-01-31 03:25:41,773 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2012-01-31 03:25:41,773 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-5
> Map Plan
> IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
> |
> |---IDs: New For Each(false)[bag] - scope-3
>     |   |
>     |   Project[bytearray][0] - scope-1
>     |
>     |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) -
> scope-0--------
> Global sort: false
> ----------------
>
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> UniqueID: (Name: LOStore Schema: #6:bytearray)
> |
> |---UniqueID: (Name: LODistinct Schema: #6:bytearray)
>     |
>     |---IDs: (Name: LOForEach Schema: #6:bytearray)
>         |   |
>         |   (Name: LOGenerate[false] Schema:
> #6:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[6]
>         |   |   |
>         |   |   (Name: Project Type: bytearray Uid: 6 Input: 0 Column: (*))
>         |   |
>         |   |---(Name: LOInnerLoad[0] Schema: #6:bytearray)
>         |
>         |---Data: (Name: LOLoad Schema: null)RequiredFields:null
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
> |
> |---UniqueID: PODistinct[bag] - scope-10
>     |
>     |---IDs: New For Each(false)[bag] - scope-9
>         |   |
>         |   Project[bytearray][0] - scope-7
>         |
>         |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6
>
> 2012-01-31 03:25:41,883 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2012-01-31 03:25:41,898 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2012-01-31 03:25:41,898 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-12
> Map Plan
> Local Rearrange[tuple]{tuple}(true) - scope-14
> |   |
> |   Project[tuple][*] - scope-13
> |
> |---IDs: New For Each(false)[bag] - scope-9
>     |   |
>     |   Project[bytearray][0] - scope-7
>     |
>     |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) -
> scope-6--------
> Reduce Plan
> UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
> |
> |---New For Each(true)[bag] - scope-17
>     |   |
>     |   Project[tuple][0] - scope-16
>     |
>     |---Package[tuple]{tuple} - scope-15--------
> Global sort: false
> ----------------
>
>
> Thanks,
> Praveenesh

--
Harsh J
Customer Ops. Engineer, Cloudera
