[jira] Commented: (PIG-169) Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten them)

Pi Song (JIRA) Wed, 26 Mar 2008 15:17:34 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582446#action_12582446
 ]


Pi Song commented on PIG-169:
-----------------------------

I feel this is not something that should be fixed in the Storage class.
1. If the user understands the Pig data model, he should be able to strip the
group field off himself. The user will be able to see why his command doesn't
work in the first place by using the lineage tracing that somebody is working on.
2. I just want to understand this use case a bit more. GROUP works by
collecting data into nested bags and tagging them with group labels. If you
don't want the label, why do the grouping at all?

;)
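
To make point 1 concrete, here is a minimal sketch in Python (not Pig code) of the two outcomes discussed in the issue below. It assumes a grouped tuple can be modelled as a (group_key, bag) pair, where the bag is a list of tuples. The function name flatten_grouped and the keep_group parameter are hypothetical, chosen only for illustration.

```python
# Hypothetical model of a GROUP result: (group_key, bag_of_tuples).
# This is an illustration only, not PigStorage or any Pig API.

def flatten_grouped(grouped, keep_group=True):
    """Flatten one grouped tuple into a list of flat rows.

    keep_group=True  models a naive auto-flatten: the group label is
                     prepended to every row from the bag.
    keep_group=False models stripping the group field, i.e. emitting
                     only the rows held in the bag.
    """
    key, bag = grouped
    if keep_group:
        return [(key,) + tuple(row) for row in bag]
    return [tuple(row) for row in bag]


grouped = ("A", [("A", 1), ("A", 2)])

# Naive auto-flatten repeats the group label.
print(flatten_grouped(grouped))                    # [('A', 'A', 1), ('A', 'A', 2)]

# Stripping the group field gives what the user probably expects.
print(flatten_grouped(grouped, keep_group=False))  # [('A', 1), ('A', 2)]
```

In this model, stripping the group field is just a projection of the bag, which is why it looks like something the script author can do without help from the Storage class.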

> Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten 
> them)
> ---------------------------------------------------------------------------------
>
>                 Key: PIG-169
>                 URL: https://issues.apache.org/jira/browse/PIG-169
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> Currently PigStorage (actually Tuple.toDelimitedString) only handles the 
> simple case of straight DataAtoms as fields and borks if it has any other 
> Datum as a field. It would be nice to enhance it to handle the more 
> complicated cases too. Currently users _have to_ use a *flatten* to convert 
> these to simpler Tuples which can then be handled by PigStorage.
> ----
> On a related note, there is an interesting caveat with GROUP/COGROUP 
> operators... they produce tuples whose first field is named 'group' and 
> holds the value on which the grouping was performed. 
> E.g.
> Input:
>  <A, 1>
>  <A, 2>
> Pig script:
>  INPUT = load 'input';
>  A = group INPUT by $0;
>  B = stream A through `script`;
> Results in A being: 
> (A, {(A, 1), (A, 2)})
> Now, if PigStorage _auto-flattens_ A it results in:
>  (A, A, 1)
>  (A, A, 2)
> However, user expectation is probably the straight-forward:
>  (A, 1)
>  (A, 2)
> ---
> Alan suggested that we could use the LOVisitor infrastructure to visit nodes 
> in the tree, save up information (i.e. that a GROUP/COGROUP occurred) and then 
> use that information to get PigStorage to 'skip' the group field while 
> auto-flattening. However, this might have to be done if, and only if, 
> PigStorage is auto-flattening tuples coming directly from a GROUP/COGROUP 
> operator, i.e. when no other EvalSpecs are working on those tuples ...
> ---
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
