[ 
https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-578:
-------------------------------

    Status: Patch Available  (was: Open)

> join ... outer, ... outer semantics are a no-ops, should produce 
> corresponding null values
> ------------------------------------------------------------------------------------------
>
>                 Key: PIG-578
>                 URL: https://issues.apache.org/jira/browse/PIG-578
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: David Ciemiewicz
>            Assignee: Pradeep Kamath
>             Fix For: 0.4.0
>
>         Attachments: PIG-578-2.patch, PIG-578.patch
>
>
> Currently using the "OUTER" modifier in the JOIN statement is a no-op.  The 
> resuls of JOIN are always an INNER join.  Now that the Pig types branch 
> supports null values proper, the semantics of JOIN ... OUTER, ... OUTER 
> should be corrected to do proper outer joins and populating the corresponding 
> empty values with nulls.
> Here's the example:
> A = load 'a.txt' using PigStorage() as ( comment, value );
> B = load 'b.txt' using PigStorage() as ( comment, value );
> --
> -- OUTER clause is ignored in JOIN statement and does not populat tuple with
> -- null values as it should. Otherwise OUTER is a meaningless no-op modifier.
> --
> ABOuterJoin = join A by ( comment ) outer, B by ( comment ) outer;
> describe ABOuterJoin;
> dump ABOuterJoin;
> The file a contains:
> a-only  1
> ab-both 2
> The file b contains:
> ab-both 2
> b-only  3
> When you execute the script today, the dump results are:
> (ab-both,2,ab-both,2)
> The expected dump results should be:
> (a-only,1,,)
> (ab-both,2,ab-both,2)
> (,,b-only,3)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to