Thnx!


On Jun 16, 2010, at 2:54 PM, "Aniket Mokashi" wrote:

Hi,

This is a representation of Pig's physical plan of execution. You can read
more about it at:
http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#EXPLAIN
http://wiki.apache.org/pig/PigExecutionModel

The 7** numbers are IDs that uniquely identify operators (logical/physical/MR)
in Pig. They come from NodeIdGenerator.getNextId().
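If you want to match these IDs up programmatically, here is a small Java sketch that pulls the key off the end of each line of the debug plan quoted below (e.g. "POBinCond[bag] - 1-768"). It assumes, without confirmation from Pig's docs, that the part before the dash ("1") is a scope prefix and the part after it is the numeric ID from NodeIdGenerator. OperatorKeyParser and parseKey are hypothetical names for illustration, not part of Pig.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorKeyParser {
    // Matches a trailing " - <prefix>-<id>" such as " - 1-768".
    // ASSUMPTION: <prefix> is the plan scope and <id> is the number handed out
    // by NodeIdGenerator.getNextId(); this is an illustration, not Pig's API.
    private static final Pattern KEY = Pattern.compile("\\s-\\s(\\w+)-(\\d+)\\s*$");

    /** Returns {prefix, id} for one line of plan output, or null if no key is found. */
    public static String[] parseKey(String planLine) {
        Matcher m = KEY.matcher(planLine);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Lines copied from the reducer debug output in this thread.
        String[] lines = {
            "|   POBinCond[bag] - 1-768",
            "|   |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - 1-766",
            "|   |---Constant({()}) - 1-767",
        };
        for (String line : lines) {
            String[] key = parseKey(line);
            System.out.println("prefix=" + key[0] + " id=" + key[1]);
        }
    }
}
```

Running main prints one "prefix=1 id=…" line per operator (768, 766, 767), which makes it easy to sort or cross-reference operators by ID.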

Because multiple lines of a Pig script can be compiled into a single MapReduce
job, it is hard to associate this part of the plan with a line number in the
script. However, EXPLAIN can help you more.

A lot of functionality in Pig is implemented internally using user-defined
functions (UDFs). Here is a snippet from the code explaining where and why we
use the IsEmpty UDF:
<snip>
public static void addEmptyBagOuterJoin(PhysicalPlan fePlan, Schema inputSchema)
        throws PlanException {
    // we currently have POProject[bag] as the only operator in the plan
    // If the bag is an empty bag, we should replace
    // it with a bag with one tuple with null fields so that when we flatten
    // we do not drop records (flatten will drop records if the bag is left
    // as an empty bag) and actually project nulls for the fields in
    // the empty bag

    // So we need to get to the following state:
    // POProject[Bag]
    //         \
    //    POUserFunc["IsEmpty()"] Const[Bag](bag with null fields)
    //                        \      |    POProject[Bag]
    //                         \     |    /
    //                          POBinCond
</snip>
This explains why the IsEmpty() UDF shows up in your plan.
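To make the flatten behavior described in that comment concrete, here is a small, self-contained Java sketch. It does not use any Pig classes: bags are modeled as plain Lists of tuples, and nullPadIfEmpty and flatten are hypothetical helpers that only mimic what the IsEmpty/POBinCond pair and FLATTEN do in the plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptyBagOuterJoinSketch {
    // Hypothetical stand-in for the IsEmpty + POBinCond pair:
    // IsEmpty(bag) ? {(null, ..., null)} : bag
    public static List<List<Object>> nullPadIfEmpty(List<List<Object>> bag, int width) {
        if (bag.isEmpty()) {
            List<Object> nullTuple = new ArrayList<>(Collections.<Object>nCopies(width, null));
            return Collections.singletonList(nullTuple);
        }
        return bag;
    }

    // Simplified FLATTEN: one output row per tuple in the bag, so an
    // empty bag produces no rows and the input record is dropped.
    public static List<List<Object>> flatten(Object key, List<List<Object>> bag) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : bag) {
            List<Object> row = new ArrayList<>();
            row.add(key);
            row.addAll(tuple);
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> matches = Arrays.asList(Arrays.<Object>asList("x", 1));
        List<List<Object>> noMatches = Collections.emptyList();

        System.out.println(flatten("a", matches));                      // [[a, x, 1]]
        System.out.println(flatten("b", noMatches));                    // [] (record "b" is dropped)
        System.out.println(flatten("b", nullPadIfEmpty(noMatches, 2))); // [[b, null, null]]
    }
}
```

The last line shows the point of the substitution: with the empty bag swapped for a single all-null tuple, record "b" survives the flatten with nulls in the projected fields, which is the outer-join behavior the comment describes.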

Hope it helps.

Thanks,
Aniket

On Wed, June 16, 2010 2:52 pm, Corbin Hoenes wrote:
Is there any documentation on how to read this output? When I 'set debug
on', I get this in my reducer syslog:

DEBUG: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - New For Each(true,true)[tuple] - 1-770
|   |
|   POBinCond[bag] - 1-768
|   |
|   |---Project[bag][1] - 1-764
|   |
|   |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - 1-766
|   |   |
|   |   |---Project[bag][1] - 1-765
|   |
|   |---Constant({()}) - 1-767
|   |
|   Project[bag][2] - 1-769
DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45450 records, put the rest in spill file.
DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45192 records, put the rest in spill file.
DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 44852 records, put the rest in spill file


Specifically, what do the 1-7** numbers mean? Is it possible to get line
numbers from the Pig script? :) It's also strange that POUserFunc seems to be
telling me we are running the IsEmpty UDF, even though that UDF isn't called
anywhere in this script. Is it possible Pig is using it under the covers?
