I guess generating warnings and errors for this is a Turing-hard problem. Even in
SQL one can write

select author as book_title from books;

and get things mixed up...
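
A minimal sketch of that point using Python's built-in sqlite3 module: SQLite runs the mislabelled alias without complaint and returns author names under a `book_title` column. The `books` table and its row are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author TEXT, title TEXT)")
conn.execute("INSERT INTO books VALUES ('Melville', 'Moby-Dick')")

# The alias is syntactically valid, so no warning or error is raised,
# even though the column it labels holds authors, not titles.
cur = conn.execute("SELECT author AS book_title FROM books")
columns = [d[0] for d in cur.description]  # first item of each entry is the column name
rows = cur.fetchall()
print(columns, rows)  # ['book_title'] [('Melville',)]
conn.close()
```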


On Thu, May 6, 2010 at 2:21 PM, Scott Carey <[email protected]> wrote:

>
> On May 6, 2010, at 12:14 AM, Dmitriy Ryaboy wrote:
>
> > Does it surprise you that "select a as foo, b, d" returns 3 columns?
> > You only gave one alias... this works the same way.
> >
> > It's the opposite that surprises me -- that if you load multi-column
> > data and only provide names for the first few columns, you can't
> > access the rest by ordinal.
> >
> > -D
> >
>
> If you have
>
> X: { a: int, b: int, c: int}
>
> then
>
> Y = FOREACH X GENERATE a, b;
>
> does not leave 'c' in there as $2. These aren't exactly the same, but that
> is where the confusion comes from.
>
> The confusion is that FOREACH ... GENERATE is a projection operation, and
> in the case cited here it does not project away unreferenced fields.
> To me, it is not surprising that FLATTEN on a tuple with an alias
> assignment doesn't remove unnamed fields, but it is somewhat surprising that
> the FOREACH ... GENERATE wrapping it doesn't.
>
> B1 = FOREACH A GENERATE id, FLATTEN(bad);
> B = FOREACH B1 GENERATE id, bad::a as a;
>
> works.
>
> At least in 0.5, the following works inconsistently ('.' as a tuple-dereference
> projection kills the combiner optimization, and occasionally fails to run in
> much more complicated scenarios, so I avoid it):
>
> B = FOREACH A GENERATE id, bad.a as a;
>
>
>
> The confusion is that FOREACH ... GENERATE is the only supported means of
> projection, but it doesn't always project the fields listed.  In a FOREACH
> ... GENERATE the projection occurs _BEFORE_ alias assignment.
>
>
>
>
> > On Wed, May 5, 2010 at 11:24 PM, hc busy <[email protected]> wrote:
> >> Okay, I have to blow off some steam here. Did you know that if
> >>
> >> describe A;
> >> A: {id: int, bad: (a: int,b: int,z: int)}
> >>
> >> and I do
> >>
> >> B = foreach A generate id, FLATTEN(bad) as c;
> >>
> >> this would actually run without error, that c takes the value of a, and
> >> that an anonymous field is created for b? (So b is not dropped by this
> >> cast.)
> >>
> >> I wonder whether the "B =" statement should either generate an error OR
> >> rename a to c and drop column b.
> >> The statement:
> >>
> >> B = foreach A generate id, FLATTEN(bad) as (c,d);
> >> describe B;
> >> B: {id: int,c: int,d:int}
> >>
> >> seems to make more sense than a silent non-dropping result.
> >>
>
>
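
To make the behavior hc busy describes concrete, here is a toy Python model (not Pig's implementation): the tuple is flattened first, so all of its fields survive, and the AS aliases are then applied by position, leaving any extra fields anonymous and reachable only by position. The function name `flatten_with_alias` and the `strict` flag, which models the suggested "generate an error" alternative, are hypothetical.

```python
from typing import Any, List, Optional, Sequence, Tuple

# A field is (alias, value); alias None stands for an anonymous field,
# reachable only by position ($0, $1, ...).
Field = Tuple[Optional[str], Any]


def flatten_with_alias(
    tup: Sequence[Tuple[str, Any]],
    aliases: Sequence[str],
    strict: bool = False,
) -> List[Field]:
    """Toy model of `FLATTEN(bad) AS (...)` as described in the thread.

    Step 1 (flatten/projection): every field of the tuple is emitted.
    Step 2 (alias assignment): aliases are applied positionally; fields
    beyond the alias list keep their values but get no name.
    `strict` is a hypothetical option that raises on a count mismatch.
    """
    if strict and len(aliases) != len(tup):
        raise ValueError(
            f"{len(aliases)} alias(es) given for a {len(tup)}-field tuple"
        )
    # Per the thread, the original names (a, b, z) are not carried over
    # once an AS clause is present; only the values survive step 1.
    flattened = [value for _, value in tup]
    # Non-strict mode silently ignores any surplus aliases.
    return [
        (aliases[i] if i < len(aliases) else None, value)
        for i, value in enumerate(flattened)
    ]


bad = [("a", 10), ("b", 20), ("z", 30)]
print(flatten_with_alias(bad, ["c"]))  # [('c', 10), (None, 20), (None, 30)]
```

In this model the surprise comes from the ordering: because flattening happens before aliasing, supplying fewer aliases than fields never drops anything, whereas `strict=True` corresponds to the error hc busy proposes.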

Reply via email to