I guess it's a Turing-hard problem to generate warnings and errors. Even in SQL one can do
select author as book_title from books;
and get things mixed up...

On Thu, May 6, 2010 at 2:21 PM, Scott Carey <[email protected]> wrote:

>
> On May 6, 2010, at 12:14 AM, Dmitriy Ryaboy wrote:
>
> > Does it surprise you that "select a as foo, b, d" returns 3 columns?
> > You only gave one alias... this works the same way.
> >
> > It's the opposite that surprises me -- that if you load multi-column
> > data and only provide names for the first few columns, you can't
> > access the rest by ordinal.
> >
> > -D
> >
>
> If you have
>
> X: { a: int, b: int, c: int}
>
> Y = FOREACH X GENERATE a, b;
> does not leave 'c' in there as $2. These aren't exactly the same, but it
> is where the confusion is coming from.
>
> The confusion is that FOREACH ... GENERATE is a projection operation, and
> in the case cited here it does not project and remove unreferenced fields.
> To me, it is not surprising that FLATTEN on a tuple with an alias
> assignment doesn't remove unnamed fields, but it is somewhat surprising that
> the FOREACH ... GENERATE wrapping it doesn't.
>
> B1 = FOREACH A GENERATE id, FLATTEN(bad);
> B = FOREACH B1 GENERATE id, bad::a as a;
>
> works.
>
> At least in 0.5 the below inconsistently works: ('.' as a tuple dereference
> projection kills combiner optimization, and on occasion fails to run in much
> more complicated scenarios, so I avoid it).
> B = FOREACH A GENERATE id, bad.a as a;
>
> The confusion is that FOREACH ... GENERATE is the only supported means of
> projection, but it doesn't always project the fields listed. In a FOREACH
> ... GENERATE the projection occurs _BEFORE_ alias assignment.
>
> > On Wed, May 5, 2010 at 11:24 PM, hc busy <[email protected]> wrote:
> >> okay, I have to blow some steam here, did you know that if
> >>
> >> describe A;
> >> A: {id: int, bad: (a: int,b: int,z: int)}
> >>
> >> and I do
> >>
> >> B = foreach A generate id, FLATTEN(bad) as c;
> >>
> >> That this would actually run without error and that c takes the value of a,
> >> and then an anonymous field is created for b. (So, b is not dropped by this
> >> cast)
> >>
> >> I wonder if either the "B =" statement should generate an error, OR
> >> it would rename a to c and drop the column b ?
> >> The statement:
> >>
> >> B = foreach A generate id, FLATTEN(bad) as (c,d);
> >> describe B;
> >> B: {id: int,c: int,d:int}
> >>
> >> Seems to make more sense than a silent non-dropping result.
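For what it's worth, the SQL side of the comparison is easy to check. Here is a minimal sketch using Python's built-in sqlite3 module. Only the table name `books` and the column `author` come from this thread; the other columns and the sample row are made up for illustration, and other SQL engines may report unaliased column names differently.

```python
import sqlite3

# Hypothetical in-memory table: only `books` and `author` appear in the
# thread; `title`, `year`, and the sample row are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.execute("INSERT INTO books VALUES ('Some Title', 'Some Author', 2010)")

# A misleading alias is accepted without any warning or error.
cur = conn.execute("SELECT author AS book_title FROM books")
misleading_columns = [d[0] for d in cur.description]
misleading_rows = cur.fetchall()

# One alias, three expressions: all three columns still come back.
cur = conn.execute("SELECT title AS foo, author, year FROM books")
partial_alias_columns = [d[0] for d in cur.description]
partial_alias_rows = cur.fetchall()

print(misleading_columns, misleading_rows)
print(partial_alias_columns, partial_alias_rows)
conn.close()
```

Both queries run silently: the alias names one expression, and nothing else is dropped or flagged. That is roughly the SQL analogue of the Pig behavior discussed above, not a statement about how Pig itself is implemented.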
