Multiple names for the "group" field
------------------------------------

                 Key: PIG-1634
                 URL: https://issues.apache.org/jira/browse/PIG-1634
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.7.0, 0.6.0, 0.5.0, 0.4.0, 0.3.0, 0.2.0, 0.1.0
            Reporter: Viraj Bhat


I am hoping that in Pig, if I type

{quote} "c = cogroup a by foo, b by bar", the fields c.group, c.foo and c.bar
should all map to c.$0 {quote}

This would improve the readability of the Pig script.

Here's a real use case:
{code}
---
pages = LOAD 'pages.dat'  AS (url, pagerank);

visits = LOAD 'user_log.dat'  AS (user_id, url);

page_visits = COGROUP pages BY url, visits BY url;

frequent_visits = FILTER page_visits BY COUNT(visits) >= 2;

answer = FOREACH frequent_visits  GENERATE url, FLATTEN(pages.pagerank);
---
{code}

(The important part is the final GENERATE statement, which references the
field "url", which was the grouping field in the earlier COGROUP.) Today that
field is only exposed under the name "group", so to get the script to work I
have to write it in a less intuitive way.
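
For comparison, the form that works today looks roughly like this. It is only a sketch that reuses the aliases from the script above; the "AS url" rename is optional and is there just to restore the field name in the output:

{code}
-- the COGROUP key is reachable only as "group" (i.e. $0), not as "url"
answer = FOREACH frequent_visits GENERATE group AS url, FLATTEN(pages.pagerank);
{code}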

Maybe the new parser changes in Pig 0.9 would make this easier to support.
Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
