On May 6, 2010, at 12:14 AM, Dmitriy Ryaboy wrote:
> Does it surprise you that "select a as foo, b, d" returns 3 columns?
> You only gave one alias... this works the same way.
>
> It's the opposite that surprises me -- that if you load multi-column
> data and only provide names for the first few columns, you can't
> access the rest by ordinal.
>
> -D
>
If you have
X: {a: int, b: int, c: int}
then
Y = FOREACH X GENERATE a, b;
does not leave 'c' in there as $2. The two cases aren't exactly the same, but
this is where the confusion comes from.
The confusion is that FOREACH ... GENERATE is a projection operation, yet in
the case cited here it does not project away the unreferenced fields.
To me, it is not surprising that FLATTEN on a tuple with an alias assignment
doesn't remove unnamed fields, but it is somewhat surprising that the FOREACH
... GENERATE wrapping it doesn't remove them either.
B1 = FOREACH A GENERATE id, FLATTEN(bad);
B = FOREACH B1 GENERATE id, bad::a as a;
works.
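For anyone who wants to try this, here is a minimal sketch of the two-step
version as a complete script. It assumes a hypothetical input file 'data.txt'
whose rows match the schema from the original message; the file name and the
LOAD line are illustrative assumptions, not something from this thread.

```pig
-- Hypothetical input: 'data.txt' is an assumed file name, not from the thread.
-- The schema matches the DESCRIBE output quoted in the original message.
A = LOAD 'data.txt' AS (id:int, bad:tuple(a:int, b:int, z:int));

-- Step 1: flatten without any alias, so every field keeps its bad:: prefix.
B1 = FOREACH A GENERATE id, FLATTEN(bad);
DESCRIBE B1;

-- Step 2: project only the wanted field and rename it.
B = FOREACH B1 GENERATE id, bad::a AS a;
DESCRIBE B;
```

Comparing the two DESCRIBE outputs shows which fields survive each step.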
At least in 0.5, the statement below works only inconsistently ('.' as a tuple
dereference projection kills the combiner optimization and occasionally fails
to run in much more complicated scripts, so I avoid it):
B = FOREACH A GENERATE id, bad.a as a;
In short, FOREACH ... GENERATE is the only supported means of projection, but
it doesn't always project just the fields listed: within a FOREACH ...
GENERATE, the projection occurs _BEFORE_ alias assignment.
> On Wed, May 5, 2010 at 11:24 PM, hc busy <[email protected]> wrote:
>> Okay, I have to blow off some steam here. Did you know that if
>>
>> describe A;
>> A: {id: int, bad: (a: int,b: int,z: int)}
>>
>> and I do
>>
>> B = foreach A generate id, FLATTEN(bad) as c;
>>
>> then this actually runs without error, c takes the value of a, and an
>> anonymous field is created for b? (So b is not dropped by this cast.)
>>
>> I wonder whether the "B =" statement should either generate an error or
>> rename a to c and drop the column b.
>> The statement:
>>
>> B = foreach A generate id, FLATTEN(bad) as (c,d);
>> describe B;
>> B: {id: int,c: int,d: int}
>>
>> Seems to make more sense than a silent non-dropping result.
>>