[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888449#action_12888449 ]
Alan Gates commented on PIG-1434:
---------------------------------

Alright, I finally understand. I think the potential confusion for the user and the Pig parser is caused by the proposed way to handle multi-column input. Rather than

{code}
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}

if we instead do

{code}
Y = foreach Z generate X::$1/((tuple)C).count, X::$2 - ((tuple)C).max;
{code}

then I believe it is clear to both the user and the parser what is happening. In each case C is cast to a tuple and the fields are then read out of it. C is not being cast to a long.

Then the feature remains basically as originally proposed. The relation being cast must have one record and one field. That one field can be a tuple, to handle the case where the record has multiple fields. But Pig will still read it as a single column which is a tuple, and the user will need to cast it accordingly.

This should also avoid accidental usage. In the example above,

{code}
Y = foreach Z generate X::$1/C.count, X::$2 - C.max;
{code}

should still be an error, because the type checker should not be able to find C as a tuple anywhere in its symbol table.

> Allow casting relations to scalars
> ----------------------------------
>
>                 Key: PIG-1434
>                 URL: https://issues.apache.org/jira/browse/PIG-1434
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>
>         Attachments: scalarImpl.patch
>
>
> This jira is to implement a simplified version of the functionality described
> in https://issues.apache.org/jira/browse/PIG-801.
> The proposal is to allow casting relations to scalar types in foreach.
> Example:
> A = load 'data' as (x, y, z);
> B = group A all;
> C = foreach B generate COUNT(A);
> .....
> X = ....
> Y = foreach X generate $1/(long) C;
> Couple of additional comments:
> (1) You can only cast relations including a single value or an error will be
> reported
> (2) Name resolution is needed since relation X might have field named C in
> which case that field takes precedence.
> (3) Y will look for C closest to it.
> Implementation thoughts:
> The idea is to store C into a file and then convert it into scalar via a UDF.
> I believe we already have a UDF that Ben Reed contributed for this purpose.
> Most of the work would be to update the logical plan to
> (1) Store C
> (2) convert the cast to the UDF
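
The one-record, one-field rule from the comment can be modeled outside Pig. The following is only a Python sketch of the semantics, not Pig's implementation: the helper name relation_to_scalar, the sample data, and the use of a dict to stand in for a tuple with named fields are all assumptions made for illustration.

```python
from typing import Any, Sequence


def relation_to_scalar(relation: Sequence[Sequence[Any]]) -> Any:
    """Return the single value of a one-record, one-field relation.

    Hypothetical helper that mirrors the rule stated in the comment;
    it is not part of Pig.
    """
    if len(relation) != 1:
        raise ValueError(f"expected exactly one record, found {len(relation)}")
    record = relation[0]
    if len(record) != 1:
        raise ValueError(f"expected exactly one field, found {len(record)}")
    return record[0]


# C has one record whose only field is a tuple. The tuple is modeled as a
# dict so its members can be read by name, like ((tuple)C).count.
C = [({"count": 4, "max": 10},)]
stats = relation_to_scalar(C)

# Z stands in for the relation being iterated; each row is (id, $1, $2).
Z = [("a", 8, 12), ("b", 2, 15)]
Y = [(row[1] / stats["count"], row[2] - stats["max"]) for row in Z]
```

A relation with more than one record, or a record with more than one field, raises an error in this sketch, which corresponds to point (1) of the issue description.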