III has open questions. Can we make a concrete proposal for III?

thanx
ben

On Friday 13 June 2008 22:24:01 Ted Dunning wrote:
> I think that I am convinced III is best.
>
> On Fri, Jun 13, 2008 at 7:26 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
> > All,
> >
> > I too will vote for III, with the caveat that we don't give names to
> > multi-field grouping keys.  We need to make sure we support AS to allow
> > the user to name their grouping keys if they want.
> >
> > So far, the vote totals are:
> > I: 1
> > II: 0
> > III: 3
> > IV: 0
> > V: 0
> >
> > I'd like to make a decision and move forward by mid next week.  If you
> > haven't voted and you'd like to, please do so now.  If you feel
> > passionately about one of the options that is losing, please make your
> > arguments now.
> >
> > Alan.
> >
> > Alan Gates wrote:
> >> Currently in Pig Latin, anytime a (CO)GROUP statement is used, the field
> >> (or set of fields) grouped on is given the alias 'group'.
> >> This has a couple of issues:
> >>
> >> 1)  It's confusing.  'group' is now both a keyword and an alias.
> >> 2)  We don't currently allow 'group' as an alias in an AS.  It is
> >> strange to have an alias that can only be assigned by the language and
> >> never by the user.
> >>
> >> Possible solutions:
> >>
> >> I) Status quo.  We could fix it so that group is allowed to be assigned
> >> as an alias in AS.
> >>
> >> Pros:  Backward compatibility
> >> Cons: a) will make the parser more complicated
> >>     b) see 1) above.
> >>
> >>
> >> II) Don't give an implicit alias to the group key(s).  If users want an
> >> alias, they can assign it using AS.
> >>
> >> Pros:  Simplicity
> >> Cons:  We do assign aliases to grouped bags.  That is, if we have C =
> >> GROUP B by $0, the resulting schema of C is (group, B).  So if we don't
> >> assign an alias to the group key, we now have a schema ($0, B).  This
> >> seems strange.  And worse yet, if users want to alias the group key(s),
> >> they'll be forced to alias all the grouped bags as well.
> >>
> >> III) Carry the alias (if any) that the field had before.  So if we had a
> >> script like:
> >>
> >> A = load 'myfile' as (x, y, z);
> >> B = group A by x;
> >>
> >> Then the schema of B would be (x, A).  This is quite natural for grouping
> >> of single columns.  But it turns nasty when you group on multiple
> >> columns. Do we then append the names together?  So if you have
> >>
> >> B = group A by x, y;
> >>
> >> is the resulting schema (x_y, A)?  Ugh.
> >>
> >> In this case there is also the question of what to do in the case of
> >> cogroups, where the key may be named differently in different relations.
> >>
> >> A = load 'myfile' as (x, y, z);
> >> B = load 'myotherfile' as (t, u, v);
> >> C = cogroup A by x, B by t;
> >>
> >> Is the resulting schema (x, A, B) or (t, A, B) or are both valid?  This
> >> could be resolved by either saying the first one always wins, or allowing
> >> either.
> >>
> >> Pros:  Very natural for the users, their fields maintain names through
> >> the query.
> >> Cons:  Quickly gets burdensome in the case of multi-key groups.
> >>
> >> IV) Assign a non-keyword alias to the group key, like grp or groupkey or
> >> grpkey (or some other suitable choice).
> >> Pros:  Least disruptive change.  Users only have to go through their
> >> scripts and find places where they use the group alias and change it to
> >> grp (or whatever).
> >> Cons:  Still leaves us with a situation where we are assigning a name to
> >> a field arbitrarily, leaving users confused as to how their fields got
> >> named that.
> >>
> >> V) Remove GROUP as a keyword.  It is just short for COGROUP of one
> >> relation anyway.
> >>
> >> Pros:  Smaller syntax in a language is always good.
> >> Cons:  Will break a lot of scripts, and confuse a lot of users who only
> >> think in terms of GROUP and JOIN and never use COGROUP explicitly.
> >>
> >> One could also conceive of combinations of these.  For example, we
> >> always assign a name like grpkey to the group key(s), and in the single
> >> key case we also carry forward the alias that the field already had, if
> >> any.
> >>
> >> Thoughts?  Other possibilities?
> >>
> >> Alan.
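
A minimal Python sketch of the naming rule that option III would imply, folding in the caveat raised above that multi-field grouping keys get no implicit name (users name them with AS), and using the "first one always wins" choice for cogroups. The schema model (a list of field aliases, with None for an unnamed field) and the functions group_key_alias and grouped_schema are hypothetical illustrations, not part of Pig's implementation.

```python
from typing import List, Optional, Sequence

# Hypothetical schema model: a list of field aliases, where None marks an
# unnamed field that can only be referenced positionally ($0, $1, ...).
Schema = List[Optional[str]]


def group_key_alias(
    key_aliases: Sequence[Sequence[Optional[str]]],
    user_alias: Optional[str] = None,
) -> Optional[str]:
    """Pick the alias of the group key under option III (sketch only).

    key_aliases[i] lists the aliases of the fields that input relation i
    is grouped on, in the order the relations appear in the (CO)GROUP.
    """
    # A name supplied by the user via AS always takes precedence.
    if user_alias is not None:
        return user_alias
    first = key_aliases[0]
    # Multi-field keys get no implicit name (no 'x_y' concatenation).
    if len(first) != 1:
        return None
    # Single-field key: carry forward the first relation's alias, which may
    # itself be None if that field was never named.
    return first[0]


def grouped_schema(
    relations: Sequence[str],
    key_aliases: Sequence[Sequence[Optional[str]]],
    user_alias: Optional[str] = None,
) -> Schema:
    """Schema of a (CO)GROUP result: the key, then one bag per input relation."""
    return [group_key_alias(key_aliases, user_alias)] + list(relations)


if __name__ == "__main__":
    # B = group A by x;            prints ['x', 'A']
    print(grouped_schema(["A"], [["x"]]))
    # B = group A by x, y;         prints [None, 'A'] (unnamed unless AS is used)
    print(grouped_schema(["A"], [["x", "y"]]))
    # C = cogroup A by x, B by t;  prints ['x', 'A', 'B'] (first relation wins)
    print(grouped_schema(["A", "B"], [["x"], ["t"]]))
```

"First relation wins" keeps the resulting schema deterministic. The other option mentioned for cogroups, accepting either x or t, would require a field to carry more than one alias, which this flat model does not attempt.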
