Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten 
them)
---------------------------------------------------------------------------------

                 Key: PIG-169
                 URL: https://issues.apache.org/jira/browse/PIG-169
             Project: Pig
          Issue Type: Improvement
          Components: data
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy


Currently PigStorage (actually Tuple.toDelimitedString) only handles the simple 
case where every field is a plain DataAtom, and fails if any field is some 
other kind of Datum. It would be nice to enhance it to handle the more 
complicated cases too. Currently users _have to_ use a *flatten* to convert 
these to simpler Tuples, which can then be handled by PigStorage.
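
To make the proposal concrete, here is a rough sketch of what auto-flattening 
before delimiting could look like. It is written in Python purely for 
illustration and is not Pig code: Python tuples stand in for Tuples, lists 
for bags, anything else for a DataAtom, and all function names are 
hypothetical.

```python
from itertools import product

def flatten_field(field):
    """Return the flat rows (tuples of atoms) that a single field expands to."""
    if isinstance(field, tuple):   # nested tuple: splice its fields in place
        return flatten_rows(field)
    if isinstance(field, list):    # bag: one output row per contained tuple
        return [row for t in field for row in flatten_rows(t)]
    return [(field,)]              # atom: a single one-column row

def flatten_rows(tup):
    """Cross-product the expansions of every field into flat rows."""
    expansions = [flatten_field(f) for f in tup]
    return [sum(parts, ()) for parts in product(*expansions)]

def to_delimited_strings(tup, delim="\t"):
    """Assumed stand-in for a flattening variant of toDelimitedString."""
    return [delim.join(str(atom) for atom in row) for row in flatten_rows(tup)]
```

With this sketch a tuple containing a bag yields one delimited line per bag 
element, and a tuple with an empty bag yields no lines at all.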

----

On a related note, there is an interesting caveat with the GROUP/COGROUP 
operators: they produce tuples whose first field is named 'group' and holds 
the value on which the grouping was performed. 

E.g.

Input:
 <A, 1>
 <A, 2>

Pig script:
 INPUT = load 'input';
 A = group INPUT by $0;
 B = stream A through `script`;

Results in A being: 
(A, {(A, 1), (A, 2)})

Now, if PigStorage _auto-flattens_ A it results in:
 (A, A, 1)
 (A, A, 2)

However, the user probably expects the straightforward:
 (A, 1)
 (A, 2)
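
The two outcomes above can be reproduced with a small Python sketch. This is 
illustration only, not Pig code; the function name and the skip_group flag 
are hypothetical.

```python
def auto_flatten(grouped, skip_group=False):
    """Expand one (group, bag) tuple into flat rows.

    grouped is assumed to look like the output of GROUP: the group value
    followed by a bag (here a list) of the original tuples.
    """
    group_key, bag = grouped
    # Keep the 'group' field as a prefix unless asked to skip it.
    prefix = () if skip_group else (group_key,)
    return [prefix + row for row in bag]

a = ("A", [("A", 1), ("A", 2)])
print(auto_flatten(a))                   # group field repeated in every row
print(auto_flatten(a, skip_group=True))  # original input tuples only
```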

---

Alan suggested that we could use the LOVisitor infrastructure to visit nodes in 
the tree, record information (i.e. that a GROUP/COGROUP occurred) and then use 
that information to get PigStorage to 'skip' the group field while 
auto-flattening. However, this might have to be done if, and only if, 
PigStorage is auto-flattening tuples coming directly from a GROUP/COGROUP 
operator, i.e. no other EvalSpecs have worked on those tuples ...
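
As a rough sketch of that condition (Python, illustration only): a flat list 
of operator names stands in for the logical plan, and nothing here reflects 
the actual LOVisitor API. The function name and the operator labels are 
hypothetical.

```python
def should_skip_group(plan):
    """Decide whether the serializer should drop the 'group' field.

    plan is assumed to be a list of operator names in execution order,
    where the last entry is the operator that serializes via PigStorage.
    The group field is skipped only when that operator is fed directly
    by a GROUP/COGROUP, with nothing else in between.
    """
    if len(plan) < 2:
        return False
    return plan[-2] in ("GROUP", "COGROUP")

print(should_skip_group(["LOAD", "GROUP", "STREAM"]))             # direct
print(should_skip_group(["LOAD", "GROUP", "FOREACH", "STREAM"]))  # indirect
```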

---

Thoughts?
