Map key type not correctly set (for use when key is null) when map plan does
not have localrearrange
----------------------------------------------------------------------------------------------------
Key: PIG-665
URL: https://issues.apache.org/jira/browse/PIG-665
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: types_branch
KeyTypeDiscoveryVisitor visits the map plan to figure out the datatype of the
map key. This is required so that when the map key is null, we can still
construct a valid NullableXXXWritable object to pass on to hadoop in the
collect() call (hadoop needs a valid object even for null objects). Currently
the KeyTypeDiscoveryVisitor only looks at POPackage and POLocalRearrange to
figure out the key type. In a pig script which results in multiple Map reduce
jobs, one of the jobs could have a map plan with only POLoads in it. In such a
case, the map key type is not discovered and this results in a null being
returned from HDataType.getWritableComparableTypes() method. This in turn will
result in a NullPointerException in the collect().
Here is a script which can prompt this behavior:
{code}
a = load 'a.txt' as (x:int, y:int, z:int);
b = load 'b.txt' as (x:int, y:int);
b_group = group b by x;
b_sum = foreach b_group generate flatten(group) as x, SUM(b.y) as clicks;
a_group = group a by (x, y);
a_aggs = foreach a_group {
generate
flatten(group) as (x, y),
SUM(a.z) as zs;
};
join_a_b = join b_sum by x, a_aggs by x; --> the map plan for this join will
only have two POLoads which will result in the NullPointerException at runtime
in collect()
dump join_a_b;
{code}
Contents of a.txt (columns are tab separated):
The first column of the first two rows is null (represented by an empty column)
{noformat}
7 8
8 9
1 20 30
1 20 40
{noformat}
Contents of b.txt (columns are tab separated):
{noformat}
7 2
1 5
1 10
{noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.