incorrect plan when algebraic functions are nested
--------------------------------------------------

                 Key: PIG-834
                 URL: https://issues.apache.org/jira/browse/PIG-834
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Thejas M Nair
            Priority: Critical


a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that Distinct udf is missing in Combiner and reduce stage. As a result 
distinct does not function, and incorrect results are produced.
Distinct should have been evaluated in the 3 stages and output of Distinct 
should be given to COUNT in reduce stage.


# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
    |       |
    |       |---Project[bag][2] - 1-123
    |           |
    |           |---Project[bag][1] - 1-124
    |   |
    |   Project[bytearray][0] - 1-133
    |
    |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
        |
        
|---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage)
 - 1-111--------
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
    |   |
    |   |---Project[bag][0] - 1-135
    |   |
    |   Project[bytearray][1] - 1-134
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-137--------
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
    |   |
    |   |---Project[bag][0] - 1-136
    |
    |---POCombinerPackage[tuple]{bytearray} - 1-145--------
Global sort: false

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to