Marco,

What you want is a combination of COGROUP and FILTER. For example, given:

$: cat foo.tsv 
1       rich
1       happy
2       rich
3       happy
4       rich


----

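-- Load the tab-separated file with an explicit schema.
A = LOAD 'foo.tsv' AS (user_id:int, user_type:chararray);

-- Route each row into its own relation by user_type.
split A into happy if user_type=='happy', rich if user_type=='rich';

-- One tuple per user_id: (group, bag of happy rows, bag of rich rows).
B = COGROUP happy by user_id, rich by user_id;

-- Keep the ids whose happy bag is empty and whose rich bag is not.
rich_and_not_happy = foreach (filter B by IsEmpty(happy) and NOT IsEmpty(rich))
                     generate group as user_id;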

DUMP rich_and_not_happy;
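
To see why the filter works, it can help to look at the cogrouped relation before filtering. The sketch below reuses the aliases from the script above; the output in the comments is what I'd expect for the sample foo.tsv, and tuple order may differ between runs.

```pig
A = LOAD 'foo.tsv' AS (user_id:int, user_type:chararray);
split A into happy if user_type=='happy', rich if user_type=='rich';
B = COGROUP happy by user_id, rich by user_id;

-- Each tuple is (group, happy bag, rich bag); an id missing from one
-- side gets an empty bag on that side.
DUMP B;
-- (1,{(1,happy)},{(1,rich)})
-- (2,{},{(2,rich)})
-- (3,{(3,happy)},{})
-- (4,{},{(4,rich)})
```

Only ids 2 and 4 have an empty happy bag and a non-empty rich bag, so those are the rows the filter keeps.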

--jacob
@thedatachef

On Tue, 2012-02-28 at 16:49 +0100, Marco Cadetg wrote:
> Hi there,
> 
> I'm trying to retrieve the group of 'rich' userids which are not 'happy'.
> Something like retrieving all ids which are not in the other bag's ids.
> 
> Is there a better way to exclude some rows from a group?
> 
> 
> Example code:
> 
> A: {userid: chararray,user_type: chararray}
> 
> A:
> (1,rich)
> (1,happy)
> (2,rich)
> (3,happy)
> (4,rich)
> 
> RICH = FILTER A BY user_type == 'rich';
> HAPPY = FILTER A BY user_type == 'happy';
> 
> dump RICH
> (1,rich)
> (2,rich)
> (4,rich)
> 
> BOTH = JOIN RICH BY $0, HAPPY BY $0;
> BOTH = FOREACH (GROUP BOTH ALL) {GENERATE COUNT(BOTH) AS counter;}
> 
> RICH_AND_NOT_HAPPY = FOREACH (GROUP RICH ALL) {GENERATE
> COUNT(RICH)-BOTH.counter AS total;}
> dump RICH_AND_NOT_HAPPY
> (2)
> 
> Thanks for your help!
> -Marco

