Hi Jacob,

Thanks a lot!

-Marco

On Tue, Feb 28, 2012 at 4:59 PM, Jacob Perkins <[email protected]> wrote:

> Marco,
>
> What you want is a combination of COGROUP and FILTER, see:
>
> $: cat foo.tsv
> 1 rich
> 1 happy
> 2 rich
> 3 happy
> 4 rich
>
>
> ----
>
> A = LOAD 'foo.tsv' AS (user_id:int, user_type:chararray);
>
> split A into happy if user_type=='happy', rich if user_type=='rich';
>
> B = COGROUP happy by user_id, rich by user_id;
>
> rich_and_not_happy = foreach (filter B by IsEmpty(happy) and NOT
> IsEmpty(rich)) generate group as user_id;
>
> DUMP rich_and_not_happy;
>
> --jacob
> @thedatachef
>
> On Tue, 2012-02-28 at 16:49 +0100, Marco Cadetg wrote:
> > Hi there,
> >
> > I am trying to retrieve the group of 'rich' userids which are not 'happy'.
> > Something like: retrieve all ids which are not in the other bag's ids.
> >
> > Is there a better way to exclude some rows from a group?
> >
> >
> > Example code:
> >
> > A: {userid: chararray,user_type: chararray}
> >
> > A:
> > (1,rich)
> > (1,happy)
> > (2,rich)
> > (3,happy)
> > (4,rich)
> >
> > RICH = FILTER A BY user_type == 'rich';
> > HAPPY = FILTER A BY user_type == 'happy';
> >
> > dump RICH
> > (1,rich)
> > (2,rich)
> > (4,rich)
> >
> > BOTH = JOIN RICH BY $0, HAPPY BY $0;
> > BOTH = FOREACH (GROUP BOTH ALL) {GENERATE COUNT(BOTH) AS counter;}
> >
> > RICH_AND_NOT_HAPPY = FOREACH (GROUP RICH ALL) {GENERATE
> > COUNT(RICH)-BOTH.counter AS total;}
> > dump RICH_AND_NOT_HAPPY
> > (2)
> >
> > Thanks for your help!
> > -Marco
> >
>
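
For comparison with the COGROUP/IsEmpty approach quoted above, the same "rich but not happy" exclusion can probably also be written as an anti-join: a LEFT OUTER JOIN followed by a null check. This is only a sketch and was not proposed by anyone in the thread. It assumes the same foo.tsv layout and schema as Jacob's script; the alias names J, not_happy and counted are made up for illustration. The final GROUP ... ALL step is there because the original question was after a total (2 for the sample data) rather than the ids themselves.

```pig
-- Sketch only: anti-join variant, assuming foo.tsv as shown in the thread.
A = LOAD 'foo.tsv' AS (user_id:int, user_type:chararray);

rich  = FILTER A BY user_type == 'rich';
happy = FILTER A BY user_type == 'happy';

-- Keep every rich row; happy columns are null where no match exists.
J = JOIN rich BY user_id LEFT OUTER, happy BY user_id;

-- Rows with no happy match are the rich-and-not-happy users.
not_happy = FOREACH (FILTER J BY happy::user_id IS NULL)
            GENERATE rich::user_id AS user_id;

-- DISTINCT guards against a user_id that appears as 'rich' more than once.
rich_and_not_happy = DISTINCT not_happy;

-- Optional: collapse to a single total, as in the original question.
counted = FOREACH (GROUP rich_and_not_happy ALL)
          GENERATE COUNT(rich_and_not_happy) AS total;

DUMP rich_and_not_happy;
DUMP counted;
```

With the five sample rows, user ids 2 and 4 are rich without a matching happy row, so the total should come out as 2, matching the figure in the original question. Whether this is faster than COGROUP depends on the data; both need a shuffle on user_id.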
