[ 
https://issues.apache.org/jira/browse/PIG-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-361:
---------------------------

    Attachment: PIG-361.patch

This patch makes a number of changes.  It removes IndexedTuple.  Instead values 
are passed between map and reduce jobs as NullableTuples.  These extend 
WritableComparable and contain a tuple.  They also have bytes to indicate 
whether a tuple is null and which part of a join it comes from.

A new type PigNullableWritable has been added.  All of the NullableXWritable 
types now extend this (including NullableTuple).  Keys passed between map and 
reduce jobs are now of this type.  This allows the sorting to be done on the 
index but not the grouping or partitioning.
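
To illustrate sorting on the index without grouping or partitioning on it, here is a minimal, self-contained sketch. It does not use the Pig or Hadoop classes from the patch: KeySketch, SORT, GROUP, and partitionFor are hypothetical names, and ordering nulls first is an assumption made only for this example.

```java
import java.util.Comparator;

// Hypothetical stand-in for a nullable key; not the class from the patch.
final class KeySketch {
    final Integer value; // null models a null key
    final byte index;    // which input of the join/cogroup the record came from

    KeySketch(Integer value, int index) {
        this.value = value;
        this.index = (byte) index;
    }

    // Compare the wrapped values only. Nulls sort first (assumption).
    static int compareValues(KeySketch a, KeySketch b) {
        if (a.value == null || b.value == null) {
            if (a.value == null && b.value == null) {
                return 0;
            }
            return a.value == null ? -1 : 1;
        }
        return a.value.compareTo(b.value);
    }

    // Sort order: value first, then index, so records with the same key
    // arrive at the reducer ordered by the input they came from.
    static final Comparator<KeySketch> SORT = (a, b) -> {
        int rc = compareValues(a, b);
        return rc != 0 ? rc : Byte.compare(a.index, b.index);
    };

    // Grouping order: the index is ignored, so records with the same key
    // from different inputs land in the same group.
    static final Comparator<KeySketch> GROUP = KeySketch::compareValues;

    // Partitioning: hash the value only, so the index never changes
    // which partition a key is sent to.
    static int partitionFor(KeySketch k, int numPartitions) {
        int h = (k.value == null) ? 0 : k.value.hashCode();
        return (h & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With this sketch, two keys holding the same value but different indexes compare as unequal under SORT, equal under GROUP, and always map to the same partition.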

I also found a major problem in the SortPartitioner.  It assumed that all 
inputs were tuples and then applied the raw comparator.  But in 2.0 we do not 
use tuples in the case of a single key.  So I modified SortPartitioner to 
correctly determine the key type and use the correct type of comparator.
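
The idea of choosing the comparator from the key type can be sketched roughly as follows. This is not the SortPartitioner code from the patch: SortPartitionSketch and its methods are hypothetical, a java.util.List stands in for a tuple key, and the boundaries array stands in for whatever sorted split points the partitioner is given.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical range partitioner that inspects the key type before comparing.
final class SortPartitionSketch {

    // Pick a comparator from the runtime type of the key:
    // a List models a compound (tuple) key, anything else a single key.
    static Comparator<Object> comparatorFor(Object key) {
        if (key instanceof List) {
            return (x, y) -> compareLists((List<?>) x, (List<?>) y);
        }
        return (x, y) -> compareScalars(x, y);
    }

    // Compare two single values of the same Comparable type.
    @SuppressWarnings("unchecked")
    static int compareScalars(Object x, Object y) {
        return ((Comparable<Object>) x).compareTo(y);
    }

    // Compare compound keys field by field; the shorter key sorts first on a tie.
    static int compareLists(List<?> x, List<?> y) {
        int n = Math.min(x.size(), y.size());
        for (int i = 0; i < n; i++) {
            int rc = compareScalars(x.get(i), y.get(i));
            if (rc != 0) {
                return rc;
            }
        }
        return Integer.compare(x.size(), y.size());
    }

    // Return the index of the first boundary that is >= key.
    // Keys above every boundary go to the last partition.
    static int partitionFor(Object key, Object[] boundaries) {
        Comparator<Object> cmp = comparatorFor(key);
        int i = 0;
        while (i < boundaries.length && cmp.compare(key, boundaries[i]) > 0) {
            i++;
        }
        return i;
    }
}
```

Applying a tuple comparator to a bare single key would fail or misorder in this sketch as well, which is the kind of mismatch described above.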



> JOIN and cogroup should handle NULLs correctly
> ----------------------------------------------
>
>                 Key: PIG-361
>                 URL: https://issues.apache.org/jira/browse/PIG-361
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: PIG-361.patch
>
>
> JOIN should follow SQL semantics, i.e., if the join key is null or part of 
> the join key is null in the first table, it should not join with similar keys 
> in the second table.
> Cogroup should coalesce all NULL key rows into one group.
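
The two behaviours requested in the description can be contrasted with a small in-memory sketch. It is only an illustration of the semantics, not of how Pig implements them: NullKeySemanticsSketch is a hypothetical class, rows are modelled as {key, payload} string pairs, only single-field keys are covered, and the cogroup here merges both inputs into one list per key rather than keeping a separate bag per input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of null-key handling; rows are {key, payload}.
final class NullKeySemanticsSketch {

    // Inner join: a null key never matches anything, not even another null.
    static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            if (l[0] == null) {
                continue; // SQL semantics: null keys do not join
            }
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    out.add(l[1] + "," + r[1]);
                }
            }
        }
        return out;
    }

    // Cogroup: rows with the same key, including the null key, share one group.
    static Map<String, List<String>> cogroup(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (List<String[]> input : Arrays.asList(left, right)) {
            for (String[] row : input) {
                // LinkedHashMap accepts a null key, so all null-key rows coalesce.
                groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
            }
        }
        return groups;
    }
}
```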

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
