[ https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Santhosh Srinivasan updated PIG-578:
------------------------------------

    Issue Type: Improvement  (was: Bug)

Marking this as an improvement, as Pig does not support outer joins as a language construct. The keyword outer is currently ignored in the join statement. This should be fixed to allow outer joins (left, right and full).

> join ... outer, ... outer semantics are a no-ops, should produce corresponding null values
> ------------------------------------------------------------------------------------------
>
>                 Key: PIG-578
>                 URL: https://issues.apache.org/jira/browse/PIG-578
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: David Ciemiewicz
>
> Currently, using the "OUTER" modifier in the JOIN statement is a no-op. The
> result of a JOIN is always an INNER join. Now that the Pig types branch
> properly supports null values, the semantics of JOIN ... OUTER, ... OUTER
> should be corrected to do proper outer joins and populate the corresponding
> empty values with nulls.
>
> Here's the example:
>
> A = load 'a.txt' using PigStorage() as ( comment, value );
> B = load 'b.txt' using PigStorage() as ( comment, value );
> --
> -- OUTER clause is ignored in JOIN statement and does not populate tuple with
> -- null values as it should. Otherwise OUTER is a meaningless no-op modifier.
> --
> ABOuterJoin = join A by ( comment ) outer, B by ( comment ) outer;
> describe ABOuterJoin;
> dump ABOuterJoin;
>
> The file a.txt contains:
>
> a-only 1
> ab-both 2
>
> The file b.txt contains:
>
> ab-both 2
> b-only 3
>
> When you execute the script today, the dump results are:
>
> (ab-both,2,ab-both,2)
>
> The expected dump results should be:
>
> (a-only,1,,)
> (ab-both,2,ab-both,2)
> (,,b-only,3)
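
For reference, the full outer join semantics requested in the issue can be sketched outside of Pig. The following Python snippet is only an illustration of the expected output, not Pig's implementation: the function name full_outer_join and its key_index parameter are hypothetical, the rows are modelled as plain tuples, and null join keys are not given any special treatment.

```python
from collections import defaultdict


def full_outer_join(left, right, key_index=0):
    """Full outer join of two lists of tuples on the field at key_index.

    Hypothetical helper for illustration; unmatched sides are padded
    with None, which stands in for Pig's null.
    """
    # Index the right-hand rows by their join key.
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[key_index]].append(row)

    # Padding tuples, sized to the width of each input relation.
    left_nulls = (None,) * (len(left[0]) if left else 0)
    right_nulls = (None,) * (len(right[0]) if right else 0)

    joined = []
    matched_keys = set()
    for row in left:
        key = row[key_index]
        matches = right_by_key.get(key, [])
        if matches:
            matched_keys.add(key)
            joined.extend(row + match for match in matches)
        else:
            # Left-only row: pad the right side with nulls.
            joined.append(row + right_nulls)

    # Right-only rows: pad the left side with nulls.
    for row in right:
        if row[key_index] not in matched_keys:
            joined.append(left_nulls + row)
    return joined


# The contents of a.txt and b.txt from the issue.
a = [("a-only", 1), ("ab-both", 2)]
b = [("ab-both", 2), ("b-only", 3)]

for row in full_outer_join(a, b):
    print(row)
```

With the two sample files, this yields one left-only row, one matched row, and one right-only row, corresponding to the three expected dump results in the issue. An inner join, which is what the script produces today, would keep only the matched row.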