[jira] Created: (PIG-665) Map key type not correctly set (for use when key is null) when map plan does not have localrearrange

Pradeep Kamath (JIRA) Wed, 11 Feb 2009 12:07:21 -0800

Map key type not correctly set (for use when key is null) when map plan does 
not have localrearrange
----------------------------------------------------------------------------------------------------


                 Key: PIG-665
                 URL: https://issues.apache.org/jira/browse/PIG-665
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch


KeyTypeDiscoveryVisitor visits the map plan to figure out the datatype of the 
map key. This is required so that when the map key is null, we can still 
construct a valid NullableXXXWritable object to pass on to hadoop in the 
collect() call (hadoop needs a valid object even for null objects). Currently 
the KeyTypeDiscoveryVisitor only looks at POPackage and POLocalRearrange to 
figure out the key type. In a pig script which results in multiple Map reduce 
jobs, one of the jobs could have a map plan with only POLoads in it. In such a 
case, the map key type is not discovered and this results in a null being 
returned from HDataType.getWritableComparableTypes() method. This in turn will 
result in a NullPointerException in the collect().

Here is a script which can prompt this behavior:
{code}
a = load 'a.txt' as (x:int, y:int, z:int);
b = load 'b.txt' as (x:int, y:int);
b_group = group b by x;
b_sum = foreach b_group generate flatten(group) as x, SUM(b.y) as clicks;
a_group = group a by (x, y);
a_aggs = foreach a_group {
            generate 
                flatten(group) as (x, y),
                SUM(a.z) as zs;
                };
join_a_b = join b_sum by x, a_aggs by x; --> the map plan for this join will 
only have two POLoads which will result in the NullPointerException at runtime 
in collect()
dump join_a_b;

{code} 

Contents of a.txt (columns are tab separated):
The first column of the first two rows is null (represented by an empty column)
{noformat}
        7       8
        8       9
1       20      30
1       20      40
{noformat}

Contents of b.txt (columns are tab separated):
{noformat}
7       2
1       5
1       10
{noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (PIG-665) Map key type not correctly set (for use when key is null) when map plan does not have localrearrange

Reply via email to