No more thoughts on this one?
I think it's quite important, though.

On 5/28/08, pi song <[EMAIL PROTECTED]> wrote:
>
> 1) If we want to allow bags of anything in the future, there are things
> like this:-
>
> A = FOREACH B GENERATE B.$0, B.$1;
>
> which is already accessing tuple content directly. I'm not sure if there
> are also others.
>
> 2) On the Perl point, I think optionally omitting the bag keyword is not a
> problem. We already have different brackets meaning different things: ( ),
> { }, [ ]. That's why I think forcing TUPLE, BAG, MAP is redundant.
>
> Pi
>
>  On 5/28/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>>
>> A couple of thoughts:
>>
>> The issue with removing the tuple keyword from the bag definition, so that we
>> can have bag: {a: int} instead of bag: {tuple: (a: int)}, is that we had
>> discussed allowing bags to be bags of anything, instead of bags of tuples.
>> We aren't doing anything about that now, but we might in the future. If we
>> made that change, we would have to change the semantics of the bag type
>> declaration. Otherwise we would not know whether bag {a: int} meant a bag
>> of one-element tuples or a bag of ints.
>>
>> As for letting {} alone mean bag, I'm concerned Pig Latin will end up like
>> Perl, where different brackets mean different things and it's hard to read
>> the code.  The other extreme is ending up like SQL, where it takes way too
>> many keywords to do something.  I'm open to others' views on this.
>>
>> Alan.
>>
>> pi song wrote:
>>
>>> Here is what I know:-
>>>
>>> Tuple Schema = schema associated with "a" tuple
>>> Bag Schema = schema of all tuples contained in a bag
>>>
>>> Then, here is the current way to specify schema in PigType branch:-
>>>
>>> A = LOAD 'file1' AS (fieldA: bag {tuple1:tuple(a:int,b:long,c:float,d:double)}, fieldB: int)
>>>
>>> Isn't this needlessly verbose? Since we have already agreed that a bag
>>> only contains tuples, not arbitrary datums, I think it would be better if
>>> users could just write:-
>>>
>>> A = LOAD 'file1' AS (fieldA: bag {a:int,b:long,c:float,d:double}, fieldB: int)
>>>
>>> Or even better, since the curly braces already indicate the bag data
>>> type:-
>>>
>>> A = LOAD 'file1' AS (fieldA: {a:int,b:long,c:float,d:double}, fieldB: int)
>>>
>>> So I think the keyword "bag" should potentially be optional, for
>>> convenience. This is the same as when we specify a tuple schema, which is
>>> already indicated by round brackets.
>>>
>>> Any opinion? It's now time to make it easy for users.
>>>
>>> Pi
>>>
>>> PS. I'm willing to make the change if everybody is too busy.
>>>
>>>
>>>
>>
>
