Hi Jacob,

Thanks a lot!
-Marco

On Tue, Feb 28, 2012 at 4:59 PM, Jacob Perkins <[email protected]> wrote:

> Marco,
>
> What you want is a combination of COGROUP and FILTER, see:
>
> $: cat foo.tsv
> 1       rich
> 1       happy
> 2       rich
> 3       happy
> 4       rich
>
>
> ----
>
> A = LOAD 'foo.tsv' AS (user_id:int, user_type:chararray);
>
> split A into happy if user_type=='happy', rich if user_type=='rich';
>
> B = COGROUP happy by user_id, rich by user_id;
>
> rich_and_not_happy = foreach (filter B by IsEmpty(happy) and NOT
> IsEmpty(rich)) generate group as user_id;
>
> DUMP rich_and_not_happy;
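>
> A rough sketch of what the intermediate relation holds for the sample
> file above. The tuples in the comments are traced by hand, not copied
> from a run, and row order may differ. The extra DUMP B is an addition
> for inspection only, and the sketch assumes foo.tsv is tab-delimited
> (the PigStorage default):
>
> ```pig
> A = LOAD 'foo.tsv' AS (user_id:int, user_type:chararray);
> SPLIT A INTO happy IF user_type == 'happy', rich IF user_type == 'rich';
> B = COGROUP happy BY user_id, rich BY user_id;
>
> -- One row per user_id, with one bag per input relation (happy, then rich):
> -- (1,{(1,happy)},{(1,rich)})
> -- (2,{},{(2,rich)})
> -- (3,{(3,happy)},{})
> -- (4,{},{(4,rich)})
> DUMP B;
>
> -- Keep the ids whose happy bag is empty and whose rich bag is not.
> rich_and_not_happy = FOREACH (FILTER B BY IsEmpty(happy) AND NOT IsEmpty(rich))
>                      GENERATE group AS user_id;
>
> -- For the sample data this should leave (2) and (4).
> DUMP rich_and_not_happy;
> ```
>
> User 1 drops out because both bags are non-empty, and user 3 because
> the rich bag is empty.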
>
> --jacob
> @thedatachef
>
> On Tue, 2012-02-28 at 16:49 +0100, Marco Cadetg wrote:
> > Hi there,
> >
> > I am trying to retrieve the group of 'rich' user ids which are not
> > 'happy', i.e. all ids in one bag which do not appear among the ids of
> > the other bag.
> >
> > Is there a better way to exclude some rows from a group?
> >
> >
> > Example code:
> >
> > A: {userid: chararray,user_type: chararray}
> >
> > A:
> > (1,rich)
> > (1,happy)
> > (2,rich)
> > (3,happy)
> > (4,rich)
> >
> > RICH = FILTER A BY user_type == 'rich';
> > HAPPY = FILTER A BY user_type == 'happy';
> >
> > dump RICH;
> > (1,rich)
> > (2,rich)
> > (4,rich)
> >
> > BOTH = JOIN RICH BY $0, HAPPY BY $0;
> > BOTH = FOREACH (GROUP BOTH ALL) {GENERATE COUNT(BOTH) AS counter;}
> >
> > RICH_AND_NOT_HAPPY = FOREACH (GROUP RICH ALL) {GENERATE
> > COUNT(RICH)-BOTH.counter AS total;}
> > dump RICH_AND_NOT_HAPPY;
> > (2)
> >
> > Thanks for your help!
> > -Marco
>
>
>
