It doesn't surprise me, but the fact that it doesn't scream an error or a
very loud warning is annoying. Consider this sequence of changes:
timestamp 1:
describe A;
A: {id: int, bad: (a: int,b: int)}
B = foreach A generate id, FLATTEN(bad) as (a, b);
timestamp 2:
describe A;
A: {id: int, bad: (a: int,b: int, c: chararray)}
B = foreach A generate id, FLATTEN(bad) as (a, b);
timestamp 3:
describe A;
A: {id: int, bad: (a: int,b: int, c: chararray,d: int)}
B = foreach A generate id, FLATTEN(bad) as (a, b, c);
Migraine ensues as multiple developers scramble to figure out why the
script stopped working after their seemingly harmless change.
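A defensive sketch, assuming only a and b are needed downstream: dereference the tuple's fields by name instead of flattening the whole tuple. Fields appended to bad later (c, d, ...) then never leak into B as extra unnamed columns. The relation and field names come from the example above. Whether this fits depends on whether the script actually needs the other fields.

```pig
-- Assumption: only bad.a and bad.b are needed downstream.
-- Projecting the tuple's fields by name (rather than FLATTEN(bad))
-- keeps B's schema fixed even if new fields are appended to 'bad'.
B = foreach A generate id, bad.a as a, bad.b as b;
describe B;
```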
On Thu, May 6, 2010 at 12:14 AM, Dmitriy Ryaboy <[email protected]> wrote:
> Does it surprise you that "select a as foo, b, d" returns 3 columns?
> You only gave one alias... this works the same way.
>
> It's the opposite that surprises me -- that if you load multi-column
> data and only provide names for the first few columns, you can't
> access the rest by ordinal.
>
> -D
>
> On Wed, May 5, 2010 at 11:24 PM, hc busy <[email protected]> wrote:
> > Okay, I have to blow off some steam here. Did you know that if
> >
> > describe A;
> > A: {id: int, bad: (a: int,b: int,z: int)}
> >
> > and I do
> >
> > B = foreach A generate id, FLATTEN(bad) as c;
> >
> > this would actually run without error, with c taking the value of a
> > and anonymous fields being created for the remaining fields b and z?
> > (So b is not dropped by this cast.)
> >
> > I wonder whether the "B =" statement should either generate an error,
> > OR rename a to c and drop the remaining columns?
> > The statement:
> >
> > B = foreach A generate id, FLATTEN(bad) as (c,d);
> > describe B;
> > B: {id: int,c: int,d: int}
> >
> > seems to make more sense than silently keeping the extra columns.
> >
>