[
https://issues.apache.org/jira/browse/PIG-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628477#action_12628477
]
Olga Natkovich commented on PIG-361:
------------------------------------
After further discussion, here is what I think is the right thing to do:
(1) Cogroup distinguishes between NULL keys from different relations by
creating separate records:
A = load ...
B = load ...
C = cogroup A by $0, B by $0;
...
Assuming that both A and B contain null values in the key column, C would look
as follows:
{
....
NULL, {.....}, {}
NULL, {}, {...}
....
}
The first record holds all records of A with a NULL key, and the second
holds all records of B with a NULL key.
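To make the grouping in (1) concrete, here is a small sketch in Python rather
than Pig. It only simulates the proposed semantics; the function name
`cogroup`, the use of None for NULL, and the sample rows are all made up for
illustration and are not part of Pig.

```python
from collections import defaultdict

def cogroup(a, b, key=lambda row: row[0]):
    """Group rows of two relations by key.

    A None key (standing in for NULL) is never merged across relations:
    each relation gets its own NULL group.
    """
    groups = defaultdict(lambda: ([], []))
    for idx, relation in enumerate((a, b)):
        for row in relation:
            k = key(row)
            # Non-null keys share one group; a null key is tagged with
            # the index of the relation it came from.
            group_id = (k, None) if k is not None else (None, idx)
            groups[group_id][idx].append(row)
    # Emit (key, bag_of_A, bag_of_B), in first-seen order of the groups.
    return [(gid[0], bag_a, bag_b) for gid, (bag_a, bag_b) in groups.items()]

# Hypothetical sample data: both relations contain a NULL key.
A = [(1, "a"), (None, "b")]
B = [(1, "x"), (None, "y"), (2, "z")]
C = cogroup(A, B)
```

With this sample data, C contains two records whose key is None: one with an
empty bag for B and one with an empty bag for A.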
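(2) This is consistent with SQL semantics, where NULLs never compare equal to
each other. JOIN will work as is, since each NULL group has an empty bag on
one side and so produces no joined rows. An outer join expressed as
COGROUP + FOREACH with Bincond will also behave as it did in earlier versions.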
(3) The required work is to add the relation id to the comparison function.
Join optimization already does that, so we will try to piggyback this issue
onto the join optimization work.
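The idea in (3) could look roughly like the following. This is illustrative
Python, not the actual Pig comparator: `compare_keys`, the (key, relation_id)
pair layout, and the choice to sort NULLs first are assumptions made for this
sketch.

```python
from functools import cmp_to_key

def compare_keys(left, right):
    """Compare (key, relation_id) pairs; NULL keys tie only within one relation."""
    (lk, lrel), (rk, rrel) = left, right
    if lk is None and rk is None:
        # Both keys are NULL: fall back to the relation id so that NULLs
        # from different relations land in different groups.
        return (lrel > rrel) - (lrel < rrel)
    if lk is None:
        return -1  # NULLs sort first (an arbitrary choice for this sketch)
    if rk is None:
        return 1
    # Non-NULL keys ignore the relation id, so equal keys still group together.
    return (lk > rk) - (lk < rk)

# Hypothetical keys tagged with the relation they came from (0 = A, 1 = B).
tagged = [(2, 0), (None, 1), (1, 1), (None, 0)]
ordered = sorted(tagged, key=cmp_to_key(compare_keys))
```

Only the NULL case consults the relation id, which is why non-NULL keys from A
and B still end up in the same group.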
> JOIN and cogroup should handle NULLs correctly
> ----------------------------------------------
>
> Key: PIG-361
> URL: https://issues.apache.org/jira/browse/PIG-361
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: types_branch
> Reporter: Pradeep Kamath
> Assignee: Shravan Matthur Narayanamurthy
> Fix For: types_branch
>
>
> JOIN should follow SQL semantics, i.e., if the join key is null or part of
> the join key is null in the first table, it should not join with similar keys
> in the second table.
> Cogroup should coalesce all NULL key rows into one group.