I prefer (I) and that means I want to allow non-reserved keywords. On Fri, Jun 6, 2008 at 9:33 AM, pi song <[EMAIL PROTECTED]> wrote:
> I know it is very subjective to say I don't agree with "1) It's > confusing". On developers' side, it is. But on users' side, it might not. > > Some languages allow usage of keywords given they are used in the right > context. The current Pig implementation also allows referring to "group" as > an alias. > > Before we jump to the solution, shouldn't it be better to make our > position clear on "Do we want every keyword to be reserved word regardless > of context?" > > Pi > > > On 6/6/08, Chris Olston <[EMAIL PROTECTED]> wrote: >> >> I vote for (III) -- propagate the alias. This makes the scripts very >> natural and readable, e.g.: >> >> a = group pages by host; >> b = foreach a generate host, count(pages); >> >> As for what to do in the case of grouping on multiple fields, or co-group >> on differently-named fields, we should *not* assign a default name -- the >> user can choose a name using "AS". >> >> -Chris >> >> >> On Jun 5, 2008, at 9:10 AM, Alan Gates wrote: >> >> Currently in Pig Latin, anytime a (CO)GROUP statement is used, the field >>> (or set of fields) that are grouped on are given the alias 'group'. This >>> has a couple of issues: >>> >>> 1) It's confusing. 'group' is now a keyword and an alias. >>> 2) We don't currently allow 'group' as an alias in an AS. It is strange >>> to have an alias that can only be assigned by the language and never by the >>> user. >>> >>> Possible solutions: >>> >>> I) Status quo. We could fix it so that group is allowed to be assigned >>> as an alias in AS. >>> >>> Pros: Backward compatibility >>> Cons: a) will make the parser more complicated >>> b) see 1) above. >>> >>> >>> II) Don't give an implicit alias to the group key(s). If users want an >>> alias, they can assign it using AS. >>> >>> Pros: Simplicity >>> Cons: We do assign aliases to grouped bags. That is, if we have C = >>> GROUP B by $0 the resulting schema of C is (group, B). So if we don't >>> assign an alias to the group key, we now have a schema ($0, B). This seems >>> strange. And worse yet, if users want to alias the group key(s), they'll be >>> forced to alias all the grouped bags as well. >>> >>> III) Carry the alias (if any) that the field had before. So if we had a >>> script like: >>> >>> A = load 'myfile' as (x, y, z); >>> B = group A by x; >>> >>> The the schema of B would be (x, A). This is quite natural for grouping >>> of single columns. But it turns nasty when you group on multiple columns. >>> Do we then append the names to together? So if you have >>> >>> B = group A by x, y; >>> >>> is the resulting schema (x_y, A)? Ugh. >>> >>> In this case there is also the question of what to do in the case of >>> cogroups, where the key may be named differently in different relations. >>> >>> A = load 'myfile' as (x, y, z); >>> B = load 'myotherfile' as (t, u, v); >>> C = cogroup A by x, B by t; >>> >>> Is the resulting schema (x, A, B) or (t, A, B) or are both valid? This >>> could be resolved by either saying first one always wins, or allowing >>> either. >>> >>> Pros: Very natural for the users, their fields maintain names through >>> the query. >>> Cons: Quickly gets burdensome in the case of multi-key groups. >>> >>> IV) Assign a non-keyword alias to the group key, like grp or groupkey or >>> grpkey (or some other suitable choice). >>> Pros: Least disruptive change. Users only have to go through their >>> scripts and find places where they use the group alias and change it to grp >>> (or whatever). >>> Cons: Still leaves us with a situation where we are assigning a name to >>> a field arbtrarily, leaving users confused as to how their fields got named >>> that. >>> >>> V) Remove GROUP as a keyword. It is just short for COGROUP of one >>> relation anyway. >>> >>> Pros: Smaller syntax in a language is always good. >>> Cons: Will break a lot of scripts, and confuse a lot of users who only >>> think in terms of GROUP and JOIN and never use COGROUP explicitly. >>> >>> One could also conceive of combinations of these. For example, we always >>> assign a name like grpkey to the group key(s), and in the single key case we >>> also carry forward the alias that the field already had, if any. >>> >>> Thoughts? Other possibilities? >>> >>> Alan. >>> >> >> -- >> Christopher Olston, Ph.D. >> Sr. Research Scientist >> Yahoo! Research >> >> >> >
