[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1543:
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

> IsEmpty returns the wrong value after using LIMIT
> -------------------------------------------------
>
> Key: PIG-1543
> URL: https://issues.apache.org/jira/browse/PIG-1543
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Justin Hu
> Assignee: Daniel Dai
> Fix For: 0.8.0
> Attachments: PIG-1543-1.patch
>
> 1. Two input files:
> 1a: limit_empty.input_a
> 1
> 1
> 1
> 1b: limit_empty.input_b
> 2
> 2
>
> 2. The Pig script: limit_empty.pig
> -- A contains only 1's & B contains only 2's
> A = load 'limit_empty.input_a' as (a1:int);
> B = load 'limit_empty.input_b' as (b1:int);
> C = COGROUP A by a1, B by b1;
> D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
> store D into 'limit_empty.output/d';
> -- After the script is done, we see the right results:
> -- {(1),(1),(1)} {} 1 0 3 0
> -- {} {(2),(2)} 0 1 0 2
> C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
> D1 = FOREACH C1 generate Alim, Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim), COUNT(Blim);
> store D1 into 'limit_empty.output/d1';
> -- After the script is done, we see unexpected results:
> -- {(1)} {} 1 1 1 0
> -- {} {(2)} 1 1 0 1
> dump D;
> dump D1;
>
> 3. Run the script and redirect stdout (the two dumps) to a file. There are two issues:
>
> The major one:
> IsEmpty() returns FALSE for the empty bag in limit_empty.output/d1/*, while
> IsEmpty() returns correctly in limit_empty.output/d/*.
> The difference is that LIMIT was applied to the bag before IsEmpty() was called.
>
> The minor one:
> The redirected output only contains the first dump:
> ({(1),(1),(1)},{},1,0,3L,0L)
> ({},{(2),(2)},0,1,0L,2L)
> We expect two more lines like:
> ({(1)},{},1,1,1L,0L)
> ({},{(2)},1,1,0L,1L)
> In addition, there is an error that says:
> [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob -
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> org.apache.pig.data.Tuple

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
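For reference, the D1 values one would expect once the bug is fixed can be reproduced with a small in-memory model of the script. This is a hypothetical Java sketch, not Pig code. The class and method names (LimitEmptyModel, cogroup, limit, d1Row) are illustrative, and bags are modeled as plain lists of ints. It assumes that a nested LIMIT on an empty bag should yield an empty bag, so the IsEmpty flags in D1 should match those in D.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of limit_empty.pig; not Pig's implementation. */
public class LimitEmptyModel {

    /** A fresh group: index 0 holds the rows of A, index 1 the rows of B. */
    static List<List<Integer>> newGroup() {
        List<List<Integer>> group = new ArrayList<>();
        group.add(new ArrayList<>());
        group.add(new ArrayList<>());
        return group;
    }

    /** COGROUP A by a1, B by b1: for each key, the matching rows of A and of B. */
    static Map<Integer, List<List<Integer>>> cogroup(List<Integer> a, List<Integer> b) {
        Map<Integer, List<List<Integer>>> groups = new TreeMap<>(); // sorted by key
        for (int v : a) groups.computeIfAbsent(v, k -> newGroup()).get(0).add(v);
        for (int v : b) groups.computeIfAbsent(v, k -> newGroup()).get(1).add(v);
        return groups;
    }

    /** Nested LIMIT: keep at most n rows; an empty bag stays empty (no placeholder). */
    static List<Integer> limit(List<Integer> bag, int n) {
        return new ArrayList<>(bag.subList(0, Math.min(n, bag.size())));
    }

    /** D1 columns after the bags: (IsEmpty(Alim)?0:1, IsEmpty(Blim)?0:1, COUNT(Alim), COUNT(Blim)). */
    static List<Long> d1Row(List<Integer> a, List<Integer> b) {
        List<Integer> alim = limit(a, 1);
        List<Integer> blim = limit(b, 1);
        return List.of(alim.isEmpty() ? 0L : 1L, blim.isEmpty() ? 0L : 1L,
                (long) alim.size(), (long) blim.size());
    }

    public static void main(String[] args) {
        List<Integer> inputA = List.of(1, 1, 1); // contents of limit_empty.input_a
        List<Integer> inputB = List.of(2, 2);    // contents of limit_empty.input_b
        for (List<List<Integer>> g : cogroup(inputA, inputB).values()) {
            System.out.println(d1Row(g.get(0), g.get(1)));
        }
    }
}
```

Running main prints [1, 0, 1, 0] and then [0, 1, 0, 1]. In other words, under this model the empty side of each group is flagged 0, just as in D, rather than 1 as in the unexpected D1 output.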
[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1543:
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1543:
Attachment: PIG-1543-1.patch

This patch fixes the first issue. The problem is that we erroneously put a null in the bag where we expect an empty bag.

The second issue is a side effect of the first. BinInterSedes assumes that a bag contains only tuples, so it does not expect a null inside a bag. This issue is fixed automatically once the fix for the first issue is in.
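The unexpected D1 rows, where an empty-looking bag gets an IsEmpty flag of 1 but a COUNT of 0, are consistent with the explanation of a null sitting inside a bag that should be empty. The following hypothetical Java sketch illustrates this. It is not the patch and does not use Pig's DataBag. It assumes that IsEmpty checks the bag's size and that COUNT skips null entries. The names NullInBagModel, buggyBag and fixedBag are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the PIG-1543 root cause, not Pig's code.
 * Assumes IsEmpty checks the bag's size and COUNT skips null entries.
 */
public class NullInBagModel {

    /** Modeled IsEmpty: true only when the bag holds no entries at all. */
    static boolean isEmpty(List<Object> bag) {
        return bag.size() == 0;
    }

    /** Modeled COUNT: counts only non-null entries. */
    static long count(List<Object> bag) {
        long n = 0;
        for (Object t : bag) {
            if (t != null) {
                n++;
            }
        }
        return n;
    }

    /** Before the fix: the bag that should be empty holds a null placeholder. */
    static List<Object> buggyBag() {
        List<Object> bag = new ArrayList<>();
        bag.add(null);
        return bag;
    }

    /** After the fix: a genuinely empty bag. */
    static List<Object> fixedBag() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Same expressions D1 applies to the empty side: (IsEmpty(x) ? 0 : 1) and COUNT(x)
        for (List<Object> bag : List.of(buggyBag(), fixedBag())) {
            System.out.println((isEmpty(bag) ? 0 : 1) + " " + count(bag));
        }
    }
}
```

Running main prints "1 0" for the bag holding a null, matching the flag and COUNT reported for the empty side in the unexpected D1 rows. It prints "0 0" for the truly empty bag, which is what the fix is meant to produce.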
[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1543:
Fix Version/s: 0.8.0