Cross products of two flattens is incorrectly including records with null values
--------------------------------------------------------------------------------
Key: PIG-302
URL: https://issues.apache.org/jira/browse/PIG-302
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Alan Gates
Assignee: Alan Gates
Given two data sets:
studenttab10
alex garcia,39,3.81
bob jones,40,2.77
zach johnson,23,4.00
tony mendleson,87,2.10
todd wellington,55,3.32
melany smith,19,3.98
jane wesley,62,1.98
irene chan,34,3.14
laverne shirley,58,2.43
marcia tently,32,3.48
and
alex garcia,39,republican,1.50
bob jones,40,democrat,1000.30
zach johnson,23,independent,0.00
tony mendleson,87,socialist,101012.92
todd wellington,55,green,99.89
melany smith,29,republican,88787.29
john wesley,62,democrat,0.89
bob smith,18,independent,0.99
johnny appleseed,234,green,99.95
barak obama,47,democrat,3.48
and the script:
a = load '/Users/gates/test/data/studenttab10' using PigStorage(',') as (name,
age, gpa);
b = load '/Users/gates/test/data/votertab10' using PigStorage(',') as (name,
age, registration, contributions);
c = filter a by age < 40;
d = filter b by age < 40;
e = cogroup c by name, d by name;
f = foreach e generate flatten (c), flatten(d);
dump f;
The result is:
(NULL, bob smith, 18, independent, 0.99)
(alex garcia, 39, 3.81, alex garcia, 39, republican, 1.50)
(melany smith, 19, 3.98, melany smith, 29, republican, 88787.29)
(zach johnson, 23, 4.00, zach johnson, 23, independent, 0.00)
The first record should not be there. Flatten is supposed to remove records
without a match.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.