I am fine with option III. In addition, can we allow access by position and also let users assign their own names?
Olga

> -----Original Message-----
> From: Alan Gates [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 13, 2008 7:26 AM
> To: [email protected]
> Subject: Re: Issues with group as an alias
>
> All,
>
> I too will vote for III, with the caveat that we don't give names to
> multi-field grouping keys. We need to make sure we support AS to allow
> the user to name their grouping keys if they want.
>
> So far, the vote totals are:
> I: 1
> II: 0
> III: 3
> IV: 0
> V: 0
>
> I'd like to make a decision and move forward by mid next week. If you
> haven't voted and you'd like to, please do so now. If you feel
> passionately about one of the options that is losing, please make your
> arguments now.
>
> Alan.
>
> Alan Gates wrote:
> > Currently in Pig Latin, anytime a (CO)GROUP statement is used, the
> > field (or set of fields) that are grouped on are given the alias
> > 'group'. This has a couple of issues:
> >
> > 1) It's confusing. 'group' is now a keyword and an alias.
> > 2) We don't currently allow 'group' as an alias in an AS. It is
> > strange to have an alias that can only be assigned by the language
> > and never by the user.
> >
> > Possible solutions:
> >
> > I) Status quo. We could fix it so that group is allowed to be
> > assigned as an alias in AS.
> >
> > Pros: Backward compatibility
> > Cons: a) will make the parser more complicated
> > b) see 1) above.
> >
> > II) Don't give an implicit alias to the group key(s). If users want
> > an alias, they can assign it using AS.
> >
> > Pros: Simplicity
> > Cons: We do assign aliases to grouped bags. That is, if we have C =
> > GROUP B by $0 the resulting schema of C is (group, B). So if we don't
> > assign an alias to the group key, we now have a schema ($0, B). This
> > seems strange. And worse yet, if users want to alias the group
> > key(s), they'll be forced to alias all the grouped bags as well.
> >
> > III) Carry the alias (if any) that the field had before. So if we had
> > a script like:
> >
> > A = load 'myfile' as (x, y, z);
> > B = group A by x;
> >
> > Then the schema of B would be (x, A). This is quite natural for
> > grouping of single columns. But it turns nasty when you group on
> > multiple columns. Do we then append the names together? So if you
> > have
> >
> > B = group A by x, y;
> >
> > is the resulting schema (x_y, A)? Ugh.
> >
> > In this case there is also the question of what to do in the case of
> > cogroups, where the key may be named differently in different
> > relations.
> >
> > A = load 'myfile' as (x, y, z);
> > B = load 'myotherfile' as (t, u, v);
> > C = cogroup A by x, B by t;
> >
> > Is the resulting schema (x, A, B) or (t, A, B) or are both valid?
> > This could be resolved by either saying first one always wins, or
> > allowing either.
> >
> > Pros: Very natural for the users, their fields maintain names through
> > the query.
> > Cons: Quickly gets burdensome in the case of multi-key groups.
> >
> > IV) Assign a non-keyword alias to the group key, like grp or groupkey
> > or grpkey (or some other suitable choice).
> >
> > Pros: Least disruptive change. Users only have to go through their
> > scripts and find places where they use the group alias and change it
> > to grp (or whatever).
> > Cons: Still leaves us with a situation where we are assigning a name
> > to a field arbitrarily, leaving users confused as to how their fields
> > got named that.
> >
> > V) Remove GROUP as a keyword. It is just short for COGROUP of one
> > relation anyway.
> >
> > Pros: Smaller syntax in a language is always good.
> > Cons: Will break a lot of scripts, and confuse a lot of users who
> > only think in terms of GROUP and JOIN and never use COGROUP
> > explicitly.
> >
> > One could also conceive of combinations of these. For example, we
> > always assign a name like grpkey to the group key(s), and in the
> > single key case we also carry forward the alias that the field
> > already had, if any.
> >
> > Thoughts? Other possibilities?
> >
> > Alan.
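To make the naming options easier to compare, here is a small Python sketch (not Pig code) of the key name each proposal would put at the front of the (CO)GROUP result schema. Everything in it is an assumption made for illustration: the function name `group_schema`, its parameters, the use of `grpkey` as the placeholder name for option IV, the "first relation wins" rule for cogroups under option III, and falling back to the positional name `$0` for multi-field keys. Option V is left out because it changes the keyword set, not the key name.

```python
from typing import List, Optional, Tuple


def group_schema(option: str,
                 keys_per_relation: List[List[Optional[str]]],
                 relations: List[str],
                 user_alias: Optional[str] = None) -> Tuple[str, ...]:
    """Model the result schema of a (CO)GROUP under options I-IV.

    keys_per_relation holds, for each input relation, the aliases of the
    fields it is grouped on (None for a field with no alias).
    """
    if user_alias is not None:
        # A name the user assigns explicitly (the AS idea) always wins.
        key = user_alias
    elif option == "I":
        # Status quo: the key is implicitly called 'group'.
        key = "group"
    elif option == "II":
        # No implicit alias: the key is reachable only by position.
        key = "$0"
    elif option == "III":
        # Carry the existing alias for a single-field key. Assumption:
        # the first relation wins in a cogroup, and multi-field or
        # unnamed keys stay positional.
        first = keys_per_relation[0]
        key = first[0] if len(first) == 1 and first[0] is not None else "$0"
    elif option == "IV":
        # Fixed non-keyword alias; 'grpkey' is only a placeholder choice.
        key = "grpkey"
    else:
        raise ValueError(f"unsupported option: {option!r}")
    return (key, *relations)
```

Under these assumptions, `group_schema("III", [["x"]], ["A"])` gives `("x", "A")`, matching the single-column example in the original mail, while `group_schema("III", [["x", "y"]], ["A"])` gives `("$0", "A")`, which avoids the `x_y` name. Positional access (`$0`) would remain available in every case; the sketch only models which name, if any, is attached to the key.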
