On May 6, 2010, at 12:14 AM, Dmitriy Ryaboy wrote:

> Does it surprise you that "select a as foo, b, d" return 3 columns?
> You only gave one alias... this works the same way.
> 
> It's the opposite that surprises me -- that if you load multi-column
> data and only provide names for the first few columns, you can't
> access the rest by ordinal.
> 
> -D
> 

If you have 

X: { a: int, b: int, c: int}

then

Y = FOREACH X GENERATE a, b;

does not leave 'c' in there as $2.  This isn't exactly the same case, but it is 
where the confusion is coming from.

The confusion is that FOREACH ... GENERATE is a projection operation, and in 
the case cited here it does not project: the unreferenced fields are not removed.  
To me, it is not surprising that FLATTEN on a tuple with an alias assignment 
doesn't remove unnamed fields, but it is somewhat surprising that the FOREACH 
... GENERATE wrapping it doesn't.

B1 = FOREACH A GENERATE id, FLATTEN(bad);
B = FOREACH B1 GENERATE id, bad::a as a;

works.
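
For anyone who wants to try the two-step version end to end, here is a minimal 
sketch.  The input path 'data.txt' and the LOAD statement are my assumptions 
for illustration; only the schema and the two FOREACH statements come from 
this thread.

```pig
-- Hypothetical input file; the schema matches the DESCRIBE A output quoted below.
A = LOAD 'data.txt' AS (id:int, bad:tuple(a:int, b:int, z:int));

-- Step 1: flatten without any alias, so every field of the tuple is kept
-- and can be referenced with the bad:: prefix.
B1 = FOREACH A GENERATE id, FLATTEN(bad);
DESCRIBE B1;

-- Step 2: an ordinary projection that keeps only id and the renamed field.
B = FOREACH B1 GENERATE id, bad::a AS a;
DESCRIBE B;
```

Running DESCRIBE after each step is the easiest way to confirm which fields 
actually survived on your version of Pig.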

At least in 0.5, the following works only inconsistently ('.' as a tuple 
dereference projection kills the combiner optimization, and on occasion fails 
to run in much more complicated scenarios, so I avoid it):
B = FOREACH A GENERATE id, bad.a as a;



In short, FOREACH ... GENERATE is the only supported means of projection, but 
it doesn't always restrict the output to the fields listed.  In a FOREACH ... 
GENERATE the projection occurs _BEFORE_ alias assignment.
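
A sketch of what that ordering means in practice, assuming the same schema as 
in the original post (the LOAD path is hypothetical, and the behaviour of the 
partial alias is as reported in this thread for 0.5; I have not checked other 
versions):

```pig
-- Hypothetical input file with the schema from the original post.
A = LOAD 'data.txt' AS (id:int, bad:tuple(a:int, b:int, z:int));

-- FLATTEN expands the whole tuple first; the single alias c is attached
-- afterwards, so it renames a field rather than selecting one.
B = FOREACH A GENERATE id, FLATTEN(bad) AS c;
DESCRIBE B;

-- To really drop the remaining fields, project again by name.
C = FOREACH B GENERATE id, c;
DESCRIBE C;
```

The second FOREACH is a plain projection on named fields, so it behaves the 
same way as the X/Y example above.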




> On Wed, May 5, 2010 at 11:24 PM, hc busy <[email protected]> wrote:
>> okay, I have to blow some steam here, did you know that if
>> 
>> describe A;
>> A: {id: int, bad: (a: int,b: int,z: int)}
>> 
>> and I do
>> 
>> B = foreach A generate id, FLATTEN(bad) as c;
>> 
>> That this would actually run without error and that c takes value of a, and
>> then an anonymous field is created for b. (So, b is not dropped by this
>> cast)
>> 
>> I wonder if either the "B =" statement should generate an error, OR
>> it would rename a to c and drop the column b ?
>> The statement:
>> 
>> B = foreach A generate id, FLATTEN(bad) as (c,d);
>> describe B;
>> B: {id: int,c: int,d:int}
>> 
>> Seems to make more sense than a silent non-dropping result.
>> 
