Assume A: {id, foo} and B: {id, bar}
To get all rows that have ids in both A and B:
C = join A by id, B by id;
To get all rows that have ids in A but not in B:
C = filter (join A by id left outer, B by id) by B::id is null;
To get all rows that have ids in B but not in A:
C = filter (join A by id right outer, B by id) by A::id is null;
To get all rows that don't have a matching row in another relation:
C = filter (join A by id outer, B by id) by A::id is null OR B::id is null;
2012/3/18 rakesh sharma <[email protected]>:
> Dmitriy,
> I tried it. However, I don't seem to be getting a handle on it. Some pseudo
> code will be highly appreciated.
>
> Thanks,
> Rakesh
>
>> Date: Sun, 18 Mar 2012 14:27:25 -0700
>> Subject: Re: Selective removal of data from a relation
>> From: [email protected]
>> To: [email protected]
>>
>> Rakesh,
>> Just like in SQL, this is achieved by doing an outer join and
>> filtering for nulls (a null join key indicates absence of a matching
>> row).
>>
>> D
>>
>> 2012/3/18 rakesh sharma <[email protected]>:
>> >
>> > Thanks to Dan for suggesting to post it on gist. Here is the link to the
>> > post:
>> > https://raw.github.com/gist/2079527/bf68dd2f0a7ee3864ef066f126c34880b20b6b04/SelectiveDataRemoval
>> > Please take a look and I am sure many of you have solution to this problem.
>> > Thanks,Rakesh
>> >> Date: Sun, 18 Mar 2012 12:35:33 -0600
>> >> Subject: RE: Selective removal of data from a relation
>> >> From: [email protected]
>> >> To: [email protected]
>> >>
>> >> Post it on https://gist.github.com/ and email out the gist.
>> >>
>> >> Regards,
>> >>
>> >> Dan
>> >> On Mar 18, 2012 12:33 PM, "rakesh sharma" <[email protected]>
>> >> wrote:
>> >>
>> >> >
>> >> > All indentations get removed when message comes back from
>> >> > [email protected]. Any idea how I can make it work.
>> >> >
>> >> > > From: [email protected]
>> >> > > To: [email protected]
>> >> > > Subject: RE: Selective removal of data from a relation
>> >> > > Date: Sun, 18 Mar 2012 18:26:01 +0000
>> >> > >
>> >> > >
>> >> > > I am sorry for so many re-sends. Resending in Rich text format...
>> >> > > Hi All,
>> >> > > I have two relations "mix" and "child_parent". Relation "mix" contains
>> >> > rows of ids. Each Id can be a parent or a child. Another relation
>> >> > "child-parent" has rows of children and associated parents. It may not
>> >> > have
>> >> > data for every child existing in relation "mix". Also, it can have some
>> >> > data for which there is no matching data in relation "mix". I need to
>> >> > remove all children from relation "mix" whose parent exists in the
>> >> > relation. Here is an example to show what I am trying to achieve:mix =
>> >> > load
>> >> > "all_data" as (id:chararray);dump mix;
>> >> > > 13469
>> >> > > child_parent = load "mapping" as (childId:chararray,
>> >> > parentId:chararray);dump child_parent;
>> >> > > (3 1)(6 1)(9 15)
>> >> > > Children "3" and "6" has matching parent "1". Hence, 3 and 6 need to
>> >> > > be
>> >> > removed from "all_data". However, child "9" will stay as its parent "15"
>> >> > does not exist in "all_data". The outcome will be:149I am having hard
>> >> > time
>> >> > in solving it due to lack of experience with pig. Any help/suggestion
>> >> > will
>> >> > be highly appreciated.
>> >> > > Thanks,Rakesh
>> >> >
>> >
>