[jira] [Updated] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-29 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5370:
--
Attachment: pig-5370-v2.patch

> Union onschema + columnprune dropping used fields 
> --
>
> Key: PIG-5370
> URL: https://issues.apache.org/jira/browse/PIG-5370
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5370-v1.patch, pig-5370-v2.patch
>
>
> After PIG-5312, below query started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
> A_FOREACH = FOREACH A GENERATE a2,a3;
> GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: 
> chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1   a   3
> 2   b   4
> 2   c   5
> 1   a   6
> 2   b   7
> 1   c   8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})ONLY 1 Field 
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-27 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5370:
--
Attachment: pig-5370-v1.patch

I can think of two different approaches.

(i) Even for overlapping uids on different nested level, do not allow them and
 force IdentityColumn. This way, all uids will be unique.

(ii) Change LOUnion uidMapping logic from (output_uid, input_uid) lists to
 (output_uid, nested_uids).

Attaching a patch that tries (ii). If possible, I'd like to avoid (i) which is 
already
 creating more uids to keep track.

Taking one relation as example,
{noformat}
B: (Name: LOForEach Schema: 
A#36:bag{#37:tuple(a1#9:int,a2#*10*:chararray,a3#*11*:int)},a2#*10*:chararray,a3#*11*:int)
{noformat}
Before the patch, input_uid
 36,9,10,11,10,11
were used for uidMapping.

After the patch, it'll use nested_uids,
 _36, _36_9, _36_10, _36_11, _10, _11

This way, there won't be any incorrect list lookup.

 [~daijy], would this approach work? 


> Union onschema + columnprune dropping used fields 
> --
>
> Key: PIG-5370
> URL: https://issues.apache.org/jira/browse/PIG-5370
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5370-v1.patch
>
>
> After PIG-5312, below query started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
> A_FOREACH = FOREACH A GENERATE a2,a3;
> GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: 
> chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1   a   3
> 2   b   4
> 2   c   5
> 1   a   6
> 2   b   7
> 1   c   8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})ONLY 1 Field 
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5370) Union onschema + columnprune dropping used fields

2018-11-21 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5370:
--
Issue Type: Bug  (was: Task)

> Union onschema + columnprune dropping used fields 
> --
>
> Key: PIG-5370
> URL: https://issues.apache.org/jira/browse/PIG-5370
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> After PIG-5312, below query started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
> A_FOREACH = FOREACH A GENERATE a2,a3;
> GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: 
> chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1   a   3
> 2   b   4
> 2   c   5
> 1   a   6
> 2   b   7
> 1   c   8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})ONLY 1 Field 
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)