[jira] Updated: (PIG-97) Jobs produce wrong results when a cogroup is in the script and the compiler chooses to use the combiner feature of hadoop.

Alan Gates (JIRA) Thu, 07 Feb 2008 09:51:29 -0800

     [ https://issues.apache.org/jira/browse/PIG-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alan Gates updated PIG-97:
--------------------------

    Patch Info: [Patch Available]

> Jobs produce wrong results when a cogroup is in the script and the compiler 
> chooses to use the combiner feature of hadoop.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-97
>                 URL: https://issues.apache.org/jira/browse/PIG-97
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: cogroupcombiner.patch
>
>
> The following script will produce 0 output records, even when it should 
> produce records:
> a = load 'file1';
> b = load 'file2';
> c = cogroup a by $0, b by $0;
> d = foreach c generate $0, COUNT($1), COUNT($2);
> dump d;
> In this case Pig chooses to use the combiner in order to be more efficient.
> However, the following code in PigCombiner.java causes a problem:
> for (int i = 0; i < inputCount; i++) {  // XXX: shouldn't we only do this if INNER flag is set?
>     if (t.getBagField(1 + i).size() == 0) return;
> }
> In this case a map often runs on a machine where it has access to only one 
> of the two files, so one of the bags is empty and the above lines of code 
> cause the combiner to bail out without pushing any tuples to the 
> OutputCollector.
> The proposed short-term solution is to disable use of the combiner in cases 
> where more than one file is grouped together.
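
The failure described above can be reproduced outside Pig with a small
stand-alone sketch. It is an illustration only, not the code in
cogroupcombiner.patch: the class CombinerSketch and the methods counts,
combine and useCombiner are hypothetical names, and plain Java lists stand in
for Pig bags. combine mirrors the quoted empty-bag check, and useCombiner
expresses the proposed short-term rule of skipping the combiner when more
than one input is grouped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CombinerSketch {

    // Per-input tuple counts for one key, standing in for COUNT($1), COUNT($2).
    public static long[] counts(List<List<String>> bags) {
        long[] result = new long[bags.size()];
        for (int i = 0; i < bags.size(); i++) {
            result[i] = bags.get(i).size();
        }
        return result;
    }

    // Mirrors the quoted check: return without emitting if any bag is empty.
    public static List<long[]> combine(List<List<String>> bags) {
        List<long[]> emitted = new ArrayList<>();
        for (List<String> bag : bags) {
            if (bag.isEmpty()) {
                return emitted; // nothing is pushed for this key
            }
        }
        emitted.add(counts(bags));
        return emitted;
    }

    // Hypothetical guard for the short-term proposal: only use the combiner
    // when a single input is being grouped.
    public static boolean useCombiner(int inputCount) {
        return inputCount <= 1;
    }

    public static void main(String[] args) {
        // A map task that only sees tuples from the first input for key "k".
        List<List<String>> bags = Arrays.asList(
                Arrays.asList("k", "k"),
                Collections.<String>emptyList());

        System.out.println("combiner emitted " + combine(bags).size() + " record(s)");
        System.out.println("use combiner for 2 inputs: " + useCombiner(bags.size()));
        System.out.println("use combiner for 1 input: " + useCombiner(1));
    }
}
```

With one empty bag, combine emits nothing, so the two tuples from the first
input never reach the reduce side. Under the guard, a two-input cogroup would
bypass the combiner entirely.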

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
