[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-09-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1543:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and the 0.8 branch.

> IsEmpty returns the wrong value after using LIMIT
> -
>
> Key: PIG-1543
> URL: https://issues.apache.org/jira/browse/PIG-1543
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Justin Hu
> Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1543-1.patch
>
>
> 1. Two input files:
> 1a: limit_empty.input_a
> 1
> 1
> 1
> 1b: limit_empty.input_b
> 2
> 2
> 2. The Pig script: limit_empty.pig
> -- A contains only 1's & B contains only 2's
> A = load 'limit_empty.input_a' as (a1:int);
> B = load 'limit_empty.input_b' as (b1:int);
> C = COGROUP A by a1, B by b1;
> D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
> store D into 'limit_empty.output/d';
> -- After the script is done, we see the right results:
> -- {(1),(1),(1)}   {}  1   0   3   0
> -- {} {(2),(2)}  0   1   0   2
> C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
> D1 = FOREACH C1 generate Alim, Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim), COUNT(Blim);
> store D1 into 'limit_empty.output/d1';
> -- After the script is done, we see the unexpected results:
> -- {(1)}   {}   1   1   1   0
> -- {}  {(2)}   1   1   0   1
> dump D;
> dump D1;
> 3. Run the script and redirect stdout (the two dumps) to a file. There are two issues:
> The major one:
> IsEmpty() returns FALSE for an empty bag in limit_empty.output/d1/*, while 
> IsEmpty() returns the correct value in limit_empty.output/d/*.
> The only difference is that LIMIT was applied to the bags in d1 before 
> IsEmpty() was called.
> The minor one:
> The redirected output contains only the first dump:
> ({(1),(1),(1)},{},1,0,3L,0L)
> ({},{(2),(2)},0,1,0L,2L)
> We expect two more lines like:
> ({(1)},{},1,1,1L,0L)
> ({},{(2)},1,1,0L,1L)
> In addition, the following error is logged:
> [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - 
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.pig.data.Tuple

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-09-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1543:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-09-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1543:


Attachment: PIG-1543-1.patch

This patch fixes the first issue. The problem is that we erroneously put a 
null in the bag when an empty bag is expected.

The second issue is a side effect of the first. BinInterSedes assumes that a 
bag contains only tuples, so it does not expect a null inside a bag. The 
second issue is fixed automatically once the fix for the first is in.
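
The effect of a stray null on the emptiness check can be illustrated with a 
small self-contained sketch. It does not use Pig's classes or reproduce the 
patch: BagSketch, isEmpty, flag, limitBuggy and limitFixed are hypothetical 
names, a plain List<Object[]> stands in for a bag of tuples, and the sketch 
assumes the emptiness check looks only at the number of elements in the bag.

```java
import java.util.ArrayList;
import java.util.List;

public class BagSketch {
    /** Stand-in for IsEmpty: a bag is empty only when it holds no elements at all. */
    public static boolean isEmpty(List<Object[]> bag) {
        return bag.isEmpty();
    }

    /** The (IsEmpty(X) ? 0 : 1) flag used in the script from the report. */
    public static int flag(List<Object[]> bag) {
        return isEmpty(bag) ? 0 : 1;
    }

    /** Hypothetical buggy limit: an empty input yields a bag holding one null. */
    public static List<Object[]> limitBuggy(List<Object[]> bag, int n) {
        List<Object[]> out = new ArrayList<>();
        if (bag.isEmpty()) {
            out.add(null); // the stray null: the bag now has size 1
            return out;
        }
        out.addAll(bag.subList(0, Math.min(n, bag.size())));
        return out;
    }

    /** Hypothetical fixed limit: an empty input yields a truly empty bag. */
    public static List<Object[]> limitFixed(List<Object[]> bag, int n) {
        return new ArrayList<>(bag.subList(0, Math.min(n, bag.size())));
    }

    public static void main(String[] args) {
        List<Object[]> emptyBag = new ArrayList<>();
        System.out.println(flag(limitBuggy(emptyBag, 1))); // prints 1
        System.out.println(flag(limitFixed(emptyBag, 1))); // prints 0
    }
}
```

Under these assumptions the buggy path gives a flag of 1 for an empty bag, 
which matches the unexpected 1 reported for the empty bags in 
limit_empty.output/d1.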




[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-08-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1543:


Fix Version/s: 0.8.0
