2011/7/19 勇胡 <[email protected]>

> How can I understand that 'A.score' is a bag? I mean that if I issue a
> 'describe B' command, I can get B: {group:int, A: {name:chararray,
> no:int,score:int}}. From here, I can't get any information that 'A.score'
> is a bag, but I can see that A.score is an element of bag.
>

Because A is a bag and A.score is a projection of A onto the score field,
which is of course still a bag.
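
A small sketch to see this for yourself (the alias S is just a name I made
up here; the path is the one from your script): project the field in a plain
FOREACH and describe the result.

```pig
A = LOAD '/home/huyong/test/student.txt' AS (name:chararray, no:int, score:int);
B = GROUP A BY no;
-- A.score projects the bag A onto its score field; the result is still a bag
S = FOREACH B GENERATE group, A.score;
DESCRIBE S;
```

The schema printed for S should show the second field in braces, i.e. as a
bag of single-field tuples. The exact formatting differs between Pig versions.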


> And why if I delete the quantifier 'A.', it works?
>
>
Because that is the correct way to do it.
The correct syntax is "FILTER relation BY field": inside the FILTER, score
already refers to the score field of each tuple in A.


> I just changed my pig code as
>
> A = LOAD '/home/huyong/test/student.txt' AS (name:chararray, no:int, score:int);
> B = GROUP A BY no;
> C =  FOREACH B {
>     D = FILTER A BY score > 80;
>     GENERATE D.name, D.score;}
> DUMP C;
>
> I got an empty bag!
>
> The input is as:
> henrietta       1       25
> sally   1       82
> fred    3       120
> elsie   4       45
>
> The output is as:
> ({(sally)},{(82)})
> ({(fred)},{(120)})
> ({},{})
>
> As you see, I got an empty tuple? why?
>
>
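Because you are performing the filter inside a FOREACH over the result of
GROUP A BY no, and no has 3 distinct values (1, 3, 4).
For one of those values (namely 4) no tuple passes the filter (45 is not
greater than 80), so D is an empty bag and both D.name and D.score come out
as empty bags. What you see, ({},{}), is a tuple of two empty bags rather
than an empty tuple.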
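
If you do not want such rows in the output, one option is to filter before
grouping, so that groups with no matching tuples are never created. This is
only a sketch that I have not run against your data, and the aliases A2, B2
and C2 are names I made up for it:

```pig
A  = LOAD '/home/huyong/test/student.txt' AS (name:chararray, no:int, score:int);
-- filter first: only tuples with score > 80 reach the GROUP
A2 = FILTER A BY score > 80;
B2 = GROUP A2 BY no;
-- inside B2 the bag is named A2, after the relation that was grouped
C2 = FOREACH B2 GENERATE A2.name, A2.score;
DUMP C2;
```

With your four input rows, only the groups for no = 1 and no = 3 should
remain. Alternatively you could keep the nested FILTER and drop the empty
results afterwards with the built-in IsEmpty function.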


> Yong
>
>
Cheers,
--
Gianmarco De Francisci Morales



> 2011/7/19 Jacob Perkins <[email protected]>
>
> > I think it's because 'A.score' is a bag but Pig needs a reference to a
> > field in the tuples. This worked for me:
> >
> > A = LOAD 'foo.tsv' AS (name:chararray, no:int, score: int);
> > B = GROUP A BY no;
> > C = FOREACH B {
> >       D = FILTER A BY score > 80;
> >      GENERATE FLATTEN(D.(name, score));
> >    };
> > DUMP C;
> >
> > on the following data:
> >
> > $: cat foo.tsv
> > henrietta       1       25
> > sally   1       82
> > fred    3       120
> > elsie   4       45
> >
> > yields:
> >
> >
> > Does that work for you?
> >
> > --jacob
> > @thedatachef
> >
> > On Tue, 2011-07-19 at 15:00 +0200, 勇胡 wrote:
> > > A = LOAD '/home/test/student.txt' AS (name:chararray, no:int, score:int);
> > > B = GROUP A BY no;
> > > C =  FOREACH B {
> > >     D = FILTER A BY A.score > 80;
> > >     GENERATE D.name, D.score;}
> > > DUMP C;
> >
> >
>
