[ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1543:
----------------------------

    Attachment: PIG-1543-1.patch

This patch fix the first issue. The problem is we erroneously put a null in the 
bag when we expect an empty bag

The second issue is a side effect of first issue. BinInterSedes has the 
assumption that bag only contains tuple, so it does not expect a null inside 
bag. This issue is fixed automatically once first issue is in.

> IsEmpty returns the wrong value after using LIMIT
> -------------------------------------------------
>
>                 Key: PIG-1543
>                 URL: https://issues.apache.org/jira/browse/PIG-1543
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Justin Hu
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1543-1.patch
>
>
> 1. Two input files:
> 1a: limit_empty.input_a
> 1
> 1
> 1
> 1b: limit_empty.input_b
> 2
> 2
> 2.
> The pig script: limit_empty.pig
> -- A contains only 1's & B contains only 2's
> A = load 'limit_empty.input_a' as (a1:int);
> B = load 'limit_empty.input_a' as (b1:int);
> C =COGROUP A by a1, B by b1;
> D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), 
> COUNT(B);
> store D into 'limit_empty.output/d';
> -- After the script done, we see the right results:
> -- {(1),(1),(1)}   {}      1       0       3       0
> -- {}         {(2),(2)}      0       1       0       2
> C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
> D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 
> 0:1), COUNT(Alim), COUNT(Blim);
> store D1 into 'limit_empty.output/d1';
> -- After the script done, we see the unexpected results:
> -- {(1)}   {}        1       1       1       0
> -- {}      {(2)}     1       1       0       1
> dump D;
> dump D1;
> 3. Run the scrip and redirect the stdout (2 dumps) file. There are two issues:
> The major one:
> IsEmpty() returns FALSE for empty bag in limit_empty.output/d1/*, while 
> IsEmpty() returns correctly in limit_empty.output/d/*.
> The difference is that one has been applied with "LIMIT" before using 
> IsEmpty().
> The minor one:
> The redirected output only contains the first dump:
> ({(1),(1),(1)},{},1,0,3L,0L)
> ({},{(2),(2)},0,1,0L,2L)
> We expect two more lines like:
> ({(1)},{},1,1,1L,0L)
> ({},{(2)},1,1,0L,1L)
> Besides, there is error says:
> [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - 
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.pig.data.Tuple

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to