[
https://issues.apache.org/jira/browse/PIG-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846246#action_12846246
]
Daniel Dai commented on PIG-1289:
---------------------------------
Oh, you are right. I would need to further change the join type to a regular
join to make my idea work. Since changing the join type may break some other
part of the code, let's be safe here and only push the filter in front of a
branch that does not generate nulls.
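
To illustrate why the filter can only be pushed in front of the branch that does not generate nulls, here is a minimal Python sketch that models the left outer join from the report on plain tuples. It is not Pig's optimizer code; the helper names (`left_outer_join`, `is_null_or_empty`) are made up for illustration, and Pig nulls are modeled as `None`.

```python
# Toy model of "c = join a by f2 LEFT OUTER, b by f3" on in-memory tuples.
# Hypothetical helpers for illustration only; this is not Pig code.

def left_outer_join(left, right, left_key, right_key, right_width):
    """Keep every left row; pad with None when no right row matches."""
    joined = []
    for l in left:
        matches = [r for r in right
                   if l[left_key] is not None and r[right_key] == l[left_key]]
        if matches:
            joined.extend(l + r for r in matches)
        else:
            joined.append(l + (None,) * right_width)
    return joined

def is_null_or_empty(value):
    """Model of the predicate (f3 is null or f3 == '')."""
    return value is None or value == ""

a = [(1, "A"), (2, "B"), (3, "C"), (4, "D")]   # (f1, f2)
b = [("A",), ("D",), ("E",)]                   # (f3)

# Filter applied after the join: the semantics the user asked for.
filter_after_join = [row for row in left_outer_join(a, b, 1, 0, 1)
                     if is_null_or_empty(row[2])]

# Filter pushed in front of b, the branch that generates nulls.
# b becomes empty, so every row of a is padded with a null f3.
filter_pushed_to_b = left_outer_join(
    a, [r for r in b if is_null_or_empty(r[0])], 1, 0, 1)

# A filter on a's column (f1 >= 3) pushed in front of a, the branch
# that does not generate nulls: same rows either way.
a_filter_after_join = [row for row in left_outer_join(a, b, 1, 0, 1)
                       if row[0] >= 3]
a_filter_pushed = left_outer_join([r for r in a if r[0] >= 3], b, 1, 0, 1)

print(filter_after_join)
print(filter_pushed_to_b)
print(a_filter_after_join == a_filter_pushed)
```

In this model, pushing the f3 filter in front of b yields all four rows with an empty f3, which is consistent with the unexpected output in the report, while pushing a filter on f1 in front of a leaves the result unchanged.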
> PIG Join fails while doing a filter on joined data
> --------------------------------------------------
>
> Key: PIG-1289
> URL: https://issues.apache.org/jira/browse/PIG-1289
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Karim Saadah
> Assignee: Daniel Dai
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: PIG-1289-1.patch
>
>
> PIG Join fails while doing a filter on joined data
> Here are the steps to reproduce it:
> -bash-3.1$ pig -latest -x local
> grunt> a = load 'first.dat' using PigStorage('\u0001') as (f1:int, f2:chararray);
> grunt> DUMP a;
> (1,A)
> (2,B)
> (3,C)
> (4,D)
> grunt> b = load 'second.dat' using PigStorage() as (f3:chararray);
> grunt> DUMP b;
> (A)
> (D)
> (E)
> grunt> c = join a by f2 LEFT OUTER, b by f3;
> grunt> DUMP c;
> (1,A,A)
> (2,B,)
> (3,C,)
> (4,D,D)
> grunt> describe c;
> c: {a::f1: int,a::f2: chararray,b::f3: chararray}
> grunt> d = filter c by (f3 is null or f3 =='');
> grunt> dump d;
> 2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for b
> 2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for b
> 2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for a
> 2010-03-03 15:00:37,130 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for a
> 2010-03-03 15:00:37,130 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias d
> This one is failing too:
> grunt> d = filter c by (b::f3 is null or b::f3 =='');
> And this one runs but does not return the expected results:
> grunt> d = foreach c generate f1 as f1, f2 as f2, f3 as f3;
> grunt> e = filter d by (f3 is null or f3 =='');
> grunt> DUMP e;
> (1,A,)
> (2,B,)
> (3,C,)
> (4,D,)
> while the expected result is
> (2,B,)
> (3,C,)