[ https://issues.apache.org/jira/browse/PIG-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-361:
---------------------------
Attachment: PIG-361.patch
This patch makes a number of changes. It removes IndexedTuple. Instead values
are passed between map and reduce jobs as NullableTuples. These extend
WritableComparable and contain a tuple. They also have bytes to indicate
whether a tuple is null and which part of a join it comes from.
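
As a rough illustration of the wrapper described above, the sketch below
serializes a null flag and an input index ahead of the payload. It is not the
code from the patch: NullableValueSketch, its fields, and roundTrip are
hypothetical names, a String stands in for the wrapped tuple, and plain
java.io streams stand in for Hadoop's Writable interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the NullableTuple idea; not Pig's actual class.
final class NullableValueSketch {
    private boolean isNull; // true when the wrapped value is null
    private byte index;     // which join/cogroup input the value came from
    private String value;   // stand-in for the wrapped tuple (assumption)

    NullableValueSketch() { }

    NullableValueSketch(String value, byte index) {
        this.value = value;
        this.isNull = (value == null);
        this.index = index;
    }

    boolean isNull() { return isNull; }
    byte getIndex() { return index; }
    String getValue() { return value; }

    // Write the two flags first, then the payload only when it is present.
    void write(DataOutput out) throws IOException {
        out.writeBoolean(isNull);
        out.writeByte(index);
        if (!isNull) {
            out.writeUTF(value);
        }
    }

    // Read the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        isNull = in.readBoolean();
        index = in.readByte();
        value = isNull ? null : in.readUTF();
    }

    // Helper for trying the sketch: serialize and deserialize in memory.
    static NullableValueSketch roundTrip(NullableValueSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            NullableValueSketch copy = new NullableValueSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the null flag travels with the value, a null entry survives the round
trip without needing a sentinel value in the payload.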
A new type PigNullableWritable has been added. All of the NullableXWritable
types now extend this (including NullableTuple). Keys passed between map and
reduce jobs are now of this type. This allows sorting to take the index into
account while grouping and partitioning ignore it.
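
One way to picture this split is sketched below: the sort comparator looks at
the key and then the index, while the grouping comparator and the partition
function look at the key alone. IndexedKeySketch and its members are
hypothetical names chosen for illustration, keys are assumed non-null here, and
none of this is taken from the patch itself.

```java
import java.util.Comparator;

// Hypothetical key type: a key value plus the index of the input it came from.
final class IndexedKeySketch {
    final String key; // assumed non-null in this sketch
    final int index;  // which input (0, 1, ...) produced the record

    IndexedKeySketch(String key, int index) {
        this.key = key;
        this.index = index;
    }

    // Sorting: order by key first, then by index, so records from input 0
    // arrive before records from input 1 within the same key.
    static final Comparator<IndexedKeySketch> SORT =
            Comparator.comparing((IndexedKeySketch k) -> k.key)
                      .thenComparingInt(k -> k.index);

    // Grouping: ignore the index, so both inputs fall into one group per key.
    static final Comparator<IndexedKeySketch> GROUP =
            Comparator.comparing((IndexedKeySketch k) -> k.key);

    // Partitioning: hash the key only, so equal keys reach the same reducer
    // regardless of which input they came from.
    static int partition(IndexedKeySketch k, int numPartitions) {
        return (k.key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With these three pieces, two records with the same key but different indexes
sort in index order yet still compare equal for grouping and land in the same
partition.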
I also found a major problem in SortPartitioner. It assumed that all inputs
were tuples and then applied the raw comparator. But in 2.0 we do not use
tuples in the case of a single key. So I modified SortPartitioner to determine
the key type and use the comparator that matches it.
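
The general shape of that fix might look like the sketch below, which picks a
comparison strategy from the runtime type of the key before locating its
partition. This is an illustration only: SortPartitionerSketch, compareKeys,
and the boundaries array are hypothetical, a java.util.List stands in for a
tuple, and the real class works with raw comparators rather than deserialized
objects.

```java
import java.util.List;

// Hypothetical sketch of type-aware range partitioning; not Pig's class.
final class SortPartitionerSketch {

    // Compare two keys. Tuple-like keys (lists) are compared field by field;
    // anything else is assumed to be a single Comparable key.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareKeys(Object a, Object b) {
        if (a instanceof List && b instanceof List) {
            List<?> left = (List<?>) a;
            List<?> right = (List<?>) b;
            int shared = Math.min(left.size(), right.size());
            for (int i = 0; i < shared; i++) {
                int c = compareKeys(left.get(i), right.get(i));
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(left.size(), right.size());
        }
        return ((Comparable) a).compareTo(b);
    }

    // Return the index of the first boundary that is >= key. Keys larger than
    // every boundary go to the last partition. Boundaries must be sorted and
    // of the same kind as the key.
    static int getPartition(Object key, Object[] boundaries) {
        for (int i = 0; i < boundaries.length; i++) {
            if (compareKeys(key, boundaries[i]) <= 0) {
                return i;
            }
        }
        return boundaries.length;
    }
}
```

The point of the sketch is only the dispatch on key type: a single-column key
is compared directly instead of being treated as a tuple.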
> JOIN and cogroup should handle NULLs correctly
> ----------------------------------------------
>
> Key: PIG-361
> URL: https://issues.apache.org/jira/browse/PIG-361
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: types_branch
> Reporter: Pradeep Kamath
> Assignee: Alan Gates
> Priority: Critical
> Fix For: types_branch
>
> Attachments: PIG-361.patch
>
>
> JOIN should follow SQL semantics, i.e., if the join key, or any part of it,
> is null in the first table, the row should not join with similar keys in the
> second table.
> Cogroup should coalesce all NULL key rows into one group.
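
The two behaviours requested above can be contrasted in a small in-memory
sketch. Everything here is hypothetical (NullKeySemanticsSketch, the
two-element String[] rows holding key and value), and the cogroup part takes
the literal reading of the request, placing null-key rows from both inputs in
one group; the issue text does not say whether rows from different inputs
should be kept apart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the requested null handling; not Pig code.
// Each row is a String[] of {key, value}.
final class NullKeySemanticsSketch {

    // Inner join with SQL semantics: a null key never matches anything,
    // not even another null key.
    static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            if (l[0] == null) {
                continue; // null keys are dropped from the join
            }
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    out.add(l[1] + "," + r[1]);
                }
            }
        }
        return out;
    }

    // Cogroup: one entry per key holding a bag per input. All null-key rows
    // are coalesced under the single null key (HashMap permits a null key).
    static Map<String, List<List<String>>> cogroup(List<String[]> left,
                                                   List<String[]> right) {
        Map<String, List<List<String>>> groups = new HashMap<>();
        addRows(groups, left, 0);
        addRows(groups, right, 1);
        return groups;
    }

    private static void addRows(Map<String, List<List<String>>> groups,
                                List<String[]> rows, int side) {
        for (String[] row : rows) {
            List<List<String>> bags = groups.computeIfAbsent(row[0], k -> {
                List<List<String>> fresh = new ArrayList<>();
                fresh.add(new ArrayList<>()); // bag for the left input
                fresh.add(new ArrayList<>()); // bag for the right input
                return fresh;
            });
            bags.get(side).add(row[1]);
        }
    }
}
```

Given one matching key and one null key on each side, the join yields only the
matching pair, while the cogroup keeps the null-key rows in a group of their
own.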