[ https://issues.apache.org/jira/browse/PIG-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913115#action_12913115 ]
Alan Gates commented on PIG-1634:
---------------------------------

In Pig's semantics c.group, c.foo, and c.bar are all separate columns, and only the first one is $0. Because the bags from the cogroup contain all columns in the row (not just non-key columns), foo is in a and bar is in b. Changing something like this would be a radical shift of Pig semantics.

> Multiple names for the "group" field
> ------------------------------------
>
>                 Key: PIG-1634
>                 URL: https://issues.apache.org/jira/browse/PIG-1634
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0
>            Reporter: Viraj Bhat
>
> I am hoping that in Pig if I type
> {quote}
> "c = cogroup a by foo, b by bar", the fields c.group, c.foo and c.bar
> should all map to c.$0
> {quote}
> This would improve the readability of the Pig script.
> Here's a real use case:
> {code}
> ---
> pages = LOAD 'pages.dat' AS (url, pagerank);
> visits = LOAD 'user_log.dat' AS (user_id, url);
> page_visits = COGROUP pages BY url, visits BY url;
> frequent_visits = FILTER page_visits BY COUNT(visits) >= 2;
> answer = FOREACH frequent_visits GENERATE url, FLATTEN(pages.pagerank);
> ---
> {code}
> (The important part is the final GENERATE statement, which references the
> field "url", which was the grouping field in the earlier COGROUP.) To get it
> to work I have to write it in a less intuitive way.
> Maybe with the new parser changes in Pig 0.9 it would be easier to specify
> that.
> Viraj
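
For reference, a minimal sketch of what the "less intuitive" form presumably looks like under the current semantics. It assumes the reporter means referring to the COGROUP key by its built-in name, group. The output alias url in the last statement is only a name chosen here for illustration.

{code}
pages = LOAD 'pages.dat' AS (url, pagerank);
visits = LOAD 'user_log.dat' AS (user_id, url);
-- page_visits has three fields: group (the key, $0), pages (a bag), visits (a bag)
page_visits = COGROUP pages BY url, visits BY url;
frequent_visits = FILTER page_visits BY COUNT(visits) >= 2;
-- the key is addressed as "group" and renamed on output
answer = FOREACH frequent_visits GENERATE group AS url, FLATTEN(pages.pagerank);
{code}

Because each bag keeps every column of its input rows, pages.url and visits.url remain available inside the bags as well, which matches the comment above that these are separate columns from group.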