How can I tell that 'A.score' is a bag? If I issue a
'describe B' command, I get B: {group:int, A: {name:chararray,
no:int, score:int}}. From this I can't see that 'A.score' is
a bag; I can only see that score is a field of the tuples in bag A.
And why does it work if I delete the qualifier 'A.'?
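One way to see the difference is to model Pig's nested data in a few lines of Python. This is a minimal sketch that assumes a bag can be represented as a list of tuples; `FIELDS`, `project` and `filter_bag` are hypothetical names, not Pig functions. Projecting `score` out of the bag gives another bag, while the FILTER predicate is evaluated on one tuple at a time, so a bare `score` there is a single int.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative model only (not Pig's API): a Pig tuple is a Python tuple and a
# Pig bag is a Python list of tuples. FIELDS mirrors the schema in the LOAD.
FIELDS = ("name", "no", "score")
Bag = List[Tuple]


def project(bag: Bag, field: str) -> Bag:
    """Model of 'A.score': projecting a field out of a bag gives a bag of 1-tuples."""
    i = FIELDS.index(field)
    return [(t[i],) for t in bag]


def filter_bag(bag: Bag, pred: Callable[[Dict[str, object]], bool]) -> Bag:
    """Model of a nested FILTER: the predicate sees one tuple at a time."""
    return [t for t in bag if pred(dict(zip(FIELDS, t)))]


# The inner bag A for the group no == 1 in the sample input.
group_1 = [("henrietta", 1, 25), ("sally", 1, 82)]

scores = project(group_1, "score")
print(scores)  # [(25,), (82,)]

# Inside the predicate, row["score"] is the single int of the current tuple.
print(filter_bag(group_1, lambda row: row["score"] > 80))  # [('sally', 1, 82)]

# The projected bag as a whole has no single value to compare with 80;
# in this Python model the comparison simply raises TypeError.
try:
    _ = scores > 80
except TypeError:
    print("a bag of scores is not a number")
```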
I just changed my Pig code to:
A = LOAD '/home/huyong/test/student.txt' AS (name:chararray, no:int, score:int);
B = GROUP A BY no;
C = FOREACH B {
D = FILTER A BY score > 80;
GENERATE D.name, D.score;}
DUMP C;
I got an empty bag!
The input is:
henrietta 1 25
sally 1 82
fred 3 120
elsie 4 45
The output is:
({(sally)},{(82)})
({(fred)},{(120)})
({},{})
As you can see, the last record is a tuple of two empty bags. Why?
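The empty record follows from how the nested FOREACH works: FOREACH B emits one record for every group in B, and for group 4 the only tuple (elsie, 45) fails `score > 80`, so D is empty and both D.name and D.score are empty bags. Below is a minimal Python model of that pipeline on the sample input, assuming bags are lists of tuples; `group_by_no` and `filter_then_project` are hypothetical helper names, not Pig functions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Sample input from the message above: (name, no, score).
STUDENTS = [
    ("henrietta", 1, 25),
    ("sally", 1, 82),
    ("fred", 3, 120),
    ("elsie", 4, 45),
]


def group_by_no(rows: List[Tuple]) -> List[Tuple[int, List[Tuple]]]:
    """Model of 'B = GROUP A BY no': one (group, bag) record per distinct 'no'."""
    groups: Dict[int, List[Tuple]] = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return sorted(groups.items())


def filter_then_project(grouped):
    """Model of the nested FOREACH: one output record per group, even if D is empty."""
    out = []
    for _no, bag in grouped:
        d = [t for t in bag if t[2] > 80]  # D = FILTER A BY score > 80
        names = [(t[0],) for t in d]       # D.name
        scores = [(t[2],) for t in d]      # D.score
        out.append((names, scores))
    return out


result = filter_then_project(group_by_no(STUDENTS))
for record in result:
    print(record)
# ([('sally',)], [(82,)])
# ([('fred',)], [(120,)])
# ([], [])   group 4: elsie's 45 fails the filter, so both bags are empty

# Filtering before grouping leaves no group with an empty bag.
prefiltered = filter_then_project(group_by_no([r for r in STUDENTS if r[2] > 80]))
print(len(prefiltered))  # 2
```

If the empty groups are unwanted, one option in Pig is to apply the FILTER to A before the GROUP, which the last two lines mimic; Pig's built-in IsEmpty function is another way to test whether a bag has any tuples.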
Yong
2011/7/19 Jacob Perkins
> I think it's because 'A.score' is a bag, but Pig needs a reference to a
> field in the tuples. This worked for me:
>
> A = LOAD 'foo.tsv' AS (name:chararray, no:int, score: int);
> B = GROUP A BY no;
> C = FOREACH B {
> D = FILTER A BY score > 80;
> GENERATE FLATTEN(D.(name, score));
> };
> DUMP C;
>
> on the following data:
>
> $: cat foo.tsv
> henrietta 1 25
> sally 1 82
> fred 3 120
> elsie 4 45
>
> yields:
>
>
> Does that work for you?
>
> --jacob
> @thedatachef
>
> On Tue, 2011-07-19 at 15:00 +0200, 勇胡 wrote:
> > A = LOAD '/home/test/student.txt' AS (name:chararray, no:int, score:int);
> > B = GROUP A BY no;
> > C = FOREACH B {
> > D = FILTER A BY A.score > 80;
> > GENERATE D.name, D.score;}
> > DUMP C;
>
>
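Jacob's FLATTEN(D.(name, score)) version treats the empty group differently: as the Pig documentation describes, flattening an empty bag produces no output row, so group 4 disappears instead of showing up as ({},{}). A rough Python model, assuming bags are lists of tuples; `FILTERED` and `flatten_name_score` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

# Inner bag D for each group after 'D = FILTER A BY score > 80' on the sample data.
FILTERED: Dict[int, List[Tuple[str, int, int]]] = {
    1: [("sally", 1, 82)],
    3: [("fred", 3, 120)],
    4: [],  # elsie (45) fails the filter
}


def flatten_name_score(filtered: Dict[int, List[Tuple[str, int, int]]]) -> List[Tuple[str, int]]:
    """Model of 'GENERATE FLATTEN(D.(name, score))': one output row per tuple in D."""
    rows: List[Tuple[str, int]] = []
    for _no, d in sorted(filtered.items()):
        projected = [(t[0], t[2]) for t in d]  # D.(name, score): a bag of 2-tuples
        rows.extend(projected)                 # FLATTEN un-nests; an empty bag adds no rows
    return rows


for row in flatten_name_score(FILTERED):
    print(row)
# ('sally', 82)
# ('fred', 120)
```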