I prefer (I), which means I want to allow non-reserved keywords.

On Fri, Jun 6, 2008 at 9:33 AM, pi song <[EMAIL PROTECTED]> wrote:

> I know it is very subjective to say I don't agree with "1)  It's
> confusing". On the developers' side, it is. But on the users' side, it
> might not be.
>
> Some languages allow usage of keywords given they are used in the right
> context. The current Pig implementation also allows referring to "group" as
> an alias.
>
> Before we jump to the solution, wouldn't it be better to make our
> position clear on "Do we want every keyword to be a reserved word
> regardless of context?"
>
> Pi
>
>
> On 6/6/08, Chris Olston <[EMAIL PROTECTED]> wrote:
>>
>> I vote for (III) -- propagate the alias. This makes the scripts very
>> natural and readable, e.g.:
>>
>> a = group pages by host;
>> b = foreach a generate host, count(pages);
>>
>> As for what to do in the case of grouping on multiple fields, or co-group
>> on differently-named fields, we should *not* assign a default name -- the
>> user can choose a name using "AS".
>>
>> -Chris
>>
>>
>> On Jun 5, 2008, at 9:10 AM, Alan Gates wrote:
>>
>>> Currently in Pig Latin, anytime a (CO)GROUP statement is used, the field
>>> (or set of fields) that are grouped on are given the alias 'group'.  This
>>> has a couple of issues:
>>>
>>> 1)  It's confusing.  'group' is now a keyword and an alias.
>>> 2)  We don't currently allow 'group' as an alias in an AS.  It is strange
>>> to have an alias that can only be assigned by the language and never by the
>>> user.
>>>
>>> Possible solutions:
>>>
>>> I) Status quo.  We could fix it so that group is allowed to be assigned
>>> as an alias in AS.
>>>
>>> Pros:  Backward compatibility
>>> Cons: a) will make the parser more complicated
>>>     b) see 1) above.
>>>
>>>
>>> II) Don't give an implicit alias to the group key(s).  If users want an
>>> alias, they can assign it using AS.
>>>
>>> Pros:  Simplicity
>>> Cons:  We do assign aliases to grouped bags.  That is, if we have C =
>>> GROUP B by $0 the resulting schema of C is (group, B).  So if we don't
>>> assign an alias to the group key, we now have a schema ($0, B).  This seems
>>> strange.  And worse yet, if users want to alias the group key(s), they'll be
>>> forced to alias all the grouped bags as well.
>>>
>>> III) Carry the alias (if any) that the field had before.  So if we had a
>>> script like:
>>>
>>> A = load 'myfile' as (x, y, z);
>>> B = group A by x;
>>>
>>> Then the schema of B would be (x, A).  This is quite natural for grouping
>>> of single columns.  But it turns nasty when you group on multiple columns.
>>> Do we then append the names together?  So if you have
>>>
>>> B = group A by x, y;
>>>
>>> is the resulting schema (x_y, A)?  Ugh.
>>>
>>> In this case there is also the question of what to do in the case of
>>> cogroups, where the key may be named differently in different relations.
>>>
>>> A = load 'myfile' as (x, y, z);
>>> B = load 'myotherfile' as (t, u, v);
>>> C = cogroup A by x, B by t;
>>>
>>> Is the resulting schema (x, A, B) or (t, A, B) or are both valid?  This
>>> could be resolved by either saying first one always wins, or allowing
>>> either.
>>>
>>> Pros:  Very natural for the users, their fields maintain names through
>>> the query.
>>> Cons:  Quickly gets burdensome in the case of multi-key groups.
>>>
>>> IV) Assign a non-keyword alias to the group key, like grp or groupkey or
>>> grpkey (or some other suitable choice).
>>> Pros:  Least disruptive change.  Users only have to go through their
>>> scripts and find places where they use the group alias and change it to grp
>>> (or whatever).
>>> Cons:  Still leaves us with a situation where we are assigning a name to
>>> a field arbitrarily, leaving users confused as to how their fields got named
>>> that.
>>>
>>> V) Remove GROUP as a keyword.  It is just short for COGROUP of one
>>> relation anyway.
>>>
>>> Pros:  Smaller syntax in a language is always good.
>>> Cons:  Will break a lot of scripts, and confuse a lot of users who only
>>> think in terms of GROUP and JOIN and never use COGROUP explicitly.
>>>
>>> One could also conceive of combinations of these.  For example, we always
>>> assign a name like grpkey to the group key(s), and in the single key case we
>>> also carry forward the alias that the field already had, if any.
>>>
>>> Thoughts?  Other possibilities?
>>>
>>> Alan.
>>>
>>
>> --
>> Christopher Olston, Ph.D.
>> Sr. Research Scientist
>> Yahoo! Research
>>
>>
>>
>
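To compare the proposals side by side, here is a rough Python sketch (not Pig code, and not how Pig is implemented) that models which name the group key would end up with under options (I) through (IV). The function names `group_key_alias` and `group_schema` are made up for illustration. For option (III) with multiple keys the sketch assumes Chris's suggestion of assigning no default name; the thread leaves that case open, and the cogroup case with differently-named keys is not modelled at all.

```python
from typing import List, Optional, Tuple


def group_key_alias(option: str, key_aliases: List[Optional[str]]) -> Optional[str]:
    """Return the alias the group key would get under a given proposal.

    key_aliases holds the aliases of the fields being grouped on;
    None stands for a positional field such as $0.
    """
    if option == "I":    # status quo: the key is always called 'group'
        return "group"
    if option == "II":   # no implicit alias at all
        return None
    if option == "III":  # carry the alias forward for a single key
        if len(key_aliases) == 1:
            return key_aliases[0]
        return None      # assumption: no default name for multi-key groups
    if option == "IV":   # fixed non-keyword alias; 'grpkey' is one candidate
        return "grpkey"
    raise ValueError(f"unknown option: {option}")


def group_schema(option: str,
                 key_aliases: List[Optional[str]],
                 relations: List[str]) -> Tuple[str, ...]:
    """Build the resulting schema as (key name, grouped bag names...)."""
    alias = group_key_alias(option, key_aliases)
    key_name = alias if alias is not None else "$0"  # unnamed key stays positional
    return (key_name, *relations)


if __name__ == "__main__":
    # B = group A by x;
    for opt in ("I", "II", "III", "IV"):
        print(opt, group_schema(opt, ["x"], ["A"]))
```

Under these assumptions, `group A by x` yields (group, A), ($0, A), (x, A) and (grpkey, A) for the four options, which matches the schemas described in Alan's mail.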
