wrong columns produced if incomplete definition provided during load
--------------------------------------------------------------------
Key: PIG-435
URL: https://issues.apache.org/jira/browse/PIG-435
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Olga Natkovich
Assignee: Pradeep Kamath
Fix For: types_branch
Scrip:
A = load 'studenttab10k' as (name); -- note that data has more than 1 column
B = load 'votertab10k' as (name, age, reg, contrib);
D = COGROUP A by name, B by name;
E = foreach D generate flatten(A), flatten(B);
F = foreach E generate registration, contr;
dump F;
The dump produces the wrong columns. This is because even though we declared
only one column, we actually load all columns of A. So any place where we
explicitely or implicitely use A.* as the case in flatten, we would produce the
wrong results.
The long term solution is actually to push projections into the load. Shorter
term the proposal is to notice if the script uses A.* and stick a project after
the load. Note that we don't need to do that if types are declared because
there will be already casting foreach there.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.