Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten
them)
---------------------------------------------------------------------------------
Key: PIG-169
URL: https://issues.apache.org/jira/browse/PIG-169
Project: Pig
Issue Type: Improvement
Components: data
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Currently PigStorage (actually Tuple.toDelimitedString) only handles the simple
case of plain DataAtoms as fields and breaks if any field is some other kind of
Datum. It would be nice to enhance it to handle the more complicated cases too.
Currently users _have to_ use a *flatten* to convert these to simpler Tuples,
which can then be handled by PigStorage.
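
To make the idea of auto-flattening concrete, here is a minimal sketch in
Python. It is not Pig code: nested tuples are modelled as Python tuples, bags as
Python lists of tuples, and the names flatten_rows and to_delimited are
hypothetical, chosen only for this illustration. It assumes flatten semantics
where a bag contributes one output row per contained tuple.

```python
from itertools import product


def flatten_rows(tup):
    """Expand one (possibly nested) tuple into a list of flat rows.

    Assumed model, not Pig's data classes: atoms are plain values,
    nested tuples are Python tuples, bags are Python lists of tuples.
    """
    per_field = []
    for field in tup:
        if isinstance(field, list):
            # Bag: every contained tuple yields its own row fragment(s).
            frags = [row for inner in field for row in flatten_rows(inner)]
        elif isinstance(field, tuple):
            # Nested tuple: splice its fields into the parent row.
            frags = flatten_rows(field)
        else:
            # Atom: a single one-field fragment.
            frags = [(field,)]
        per_field.append(frags)
    # Combine one fragment per field; an empty bag produces no rows at all.
    return [sum(combo, ()) for combo in product(*per_field)]


def to_delimited(tup, delim="\t"):
    """Render each flattened row as one delimited line."""
    return [delim.join(str(v) for v in row) for row in flatten_rows(tup)]


if __name__ == "__main__":
    for line in to_delimited(("k", [("a", 1), ("b", 2)]), ","):
        print(line)
```

With this model a tuple holding only atoms still produces exactly one line, so
the simple case that works today would be unchanged.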
----
On a related note, there is an interesting caveat with the GROUP/COGROUP
operators: they produce tuples whose first field is named 'group' and holds
the value on which the grouping was performed.
E.g.
Input:
<A, 1>
<A, 2>
Pig script:
INPUT = load 'input';
A = group INPUT by $0;
B = stream A through `script`;
Results in A being:
(A, {(A, 1), (A, 2)})
Now, if PigStorage _auto-flattens_ A it results in:
(A, A, 1)
(A, A, 2)
However, user expectation is probably the straightforward:
(A, 1)
(A, 2)
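
The difference between the two outputs above can be reproduced with a small
Python sketch. Again this is only a model under stated assumptions: the grouped
tuple is a Python pair of (group key, list of tuples), and flatten_group and its
skip_group_field parameter are hypothetical names, not anything PigStorage
provides today.

```python
def flatten_group(grouped, skip_group_field=False):
    """Flatten a (key, bag) pair such as the output of a GROUP.

    skip_group_field is a hypothetical switch: when True, the leading
    'group' field is dropped instead of being repeated on every row.
    """
    key, bag = grouped
    rows = []
    for inner in bag:
        if skip_group_field:
            rows.append(tuple(inner))
        else:
            rows.append((key,) + tuple(inner))
    return rows


if __name__ == "__main__":
    grouped = ("A", [("A", 1), ("A", 2)])
    print(flatten_group(grouped))                         # key repeated on each row
    print(flatten_group(grouped, skip_group_field=True))  # original input rows
```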
---
Alan suggested that we could use the LOVisitor infrastructure to visit nodes in
the tree, save up information (i.e. that a GROUP/COGROUP occurred) and then use
that information to get PigStorage to 'skip' the group field while
auto-flattening. However, it might have to be done if, and only if, PigStorage
is auto-flattening tuples coming directly from a GROUP/COGROUP operator, i.e.
no other EvalSpecs have worked on those tuples ...
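
A rough Python sketch of that visitor idea follows. It does not use Pig's
LOVisitor classes; Node, GroupDetector, and the operator names are hypothetical
stand-ins for a linear plan, and the rule encoded is just the condition
described above: skip the group field only when the consumer reads directly
from a GROUP/COGROUP.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical operator names for consumers that serialize via PigStorage.
SINKS = ("store", "stream")
GROUPING_OPS = ("group", "cogroup")


@dataclass
class Node:
    """One operator in a simplified, linear plan (assumed shape)."""
    op: str
    input: Optional["Node"] = None


class GroupDetector:
    """Walks the plan and records whether a sink reads straight from a grouping op."""

    def __init__(self):
        self.skip_group_field = False

    def visit(self, node):
        if node.op in SINKS and node.input is not None:
            # Only a *direct* GROUP/COGROUP input qualifies.
            self.skip_group_field = node.input.op in GROUPING_OPS
        if node.input is not None:
            self.visit(node.input)


if __name__ == "__main__":
    # load -> group -> stream, as in the script above
    plan = Node("stream", Node("group", Node("load")))
    detector = GroupDetector()
    detector.visit(plan)
    print(detector.skip_group_field)
```

If another operator sits between the grouping and the consumer, the flag stays
False and the tuples would be flattened without dropping any field.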
---
Thoughts?