[ https://issues.apache.org/jira/browse/PIG-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904819#action_12904819 ]
Scott Carey commented on PIG-1506:
----------------------------------
In SQL, an outer join on the above would output five rows, just as COGROUP
would if flattened, so that seems fine to me. A self-join should behave the
same as a COGROUP of a relation with itself, which is different from a simple
GROUP.
However, there is a problem with inner joins and nulls.
Pig JOIN is not like SQL with respect to nulls in multi-column joins. (I have
not tried this on trunk, however.)
In SQL, if ANY of the columns in a multi-column join key is null, the row is
not output.
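Try:
{code}
-- Self-join: load the same file twice under two different aliases
A = load 'small' as (name, age, gpa);
B = load 'small' as (name, age, gpa);
-- Inner join on the two-column key (name, age)
C = join A by (name,age), B by (name,age);
dump C;
{code}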
In SQL, the result would be a single row of the form:
joe 5 2.5 joe 5 2.5
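To make the SQL rule concrete, here is a minimal Python sketch of a nested-loop inner equi-join on a composite key. It illustrates SQL null semantics only and is not Pig's implementation. The function `equi_join`, its `nulls_equal` flag, and the sample rows standing in for 'small' are all hypothetical, since the actual contents of 'small' are not shown above.

```python
from itertools import product
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def equi_join(left: Sequence[Row], right: Sequence[Row],
              key_idx: Sequence[int], nulls_equal: bool = False) -> List[Row]:
    """Hypothetical nested-loop inner join on the columns in key_idx.

    With nulls_equal=False this follows SQL semantics: NULL = x is never
    true, so a row with None in ANY key column matches nothing.
    With nulls_equal=True, None is treated as equal to None (for contrast).
    """
    out: List[Row] = []
    for l, r in product(left, right):
        kl = tuple(l[i] for i in key_idx)
        kr = tuple(r[i] for i in key_idx)
        if not nulls_equal and (None in kl or None in kr):
            continue  # SQL: a null key column makes the comparison unknown
        if kl == kr:
            out.append(tuple(l) + tuple(r))
    return out


# Hypothetical stand-in for 'small' as (name, age, gpa).
small = [("joe", 5, 2.5), ("sam", None, 3.0), (None, None, None)]

# Self-join on (name, age) with SQL semantics: only joe survives.
print(equi_join(small, small, key_idx=(0, 1)))
# [('joe', 5, 2.5, 'joe', 5, 2.5)]

# Treating null as equal to null also joins the two rows with null keys
# to themselves.
print(len(equi_join(small, small, key_idx=(0, 1), nulls_equal=True)))
# 3
```

Any row that has a null in either key column is dropped before the comparison, so partial-null keys such as ("sam", null) never match, even against themselves.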
> Need to clarify the difference between null handling in JOIN and COGROUP
> ------------------------------------------------------------------------
>
> Key: PIG-1506
> URL: https://issues.apache.org/jira/browse/PIG-1506
> Project: Pig
> Issue Type: Improvement
> Components: documentation
> Reporter: Olga Natkovich
> Assignee: Corinne Chandel
> Fix For: 0.8.0
>
>