Rakesh,

Just like in SQL, this is achieved by doing an outer join and filtering for nulls (a null join key indicates the absence of a matching row).
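
A minimal sketch of that approach in Pig Latin, assuming the relation and field names from the original question. The aliases parent_present, removable, joined, kept and result are made up for illustration, and the script has not been run against the real data:

```pig
-- Load both relations (file names and schemas taken from the question).
mix = LOAD 'all_data' AS (id:chararray);
child_parent = LOAD 'mapping' AS (childId:chararray, parentId:chararray);

-- Step 1: keep only mappings whose parent is present in mix.
-- An inner join drops (9, 15) because 15 is not an id in mix.
parent_present = JOIN child_parent BY parentId, mix BY id;
removable = FOREACH parent_present GENERATE child_parent::childId AS childId;

-- Step 2: left outer join mix against the removable children.
-- Rows of mix with no match get a null childId.
joined = JOIN mix BY id LEFT OUTER, removable BY childId;

-- Step 3: a null join key means the id is not a removable child, so keep it.
kept = FILTER joined BY removable::childId IS NULL;
result = FOREACH kept GENERATE mix::id AS id;

DUMP result;
```

For the sample data in the question this should leave ids 1, 4 and 9 (row order is not guaranteed). The inner join in step 1 is what restricts removal to children whose parent actually appears in mix; the outer join plus null filter in steps 2 and 3 is the anti-join described above.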
D

2012/3/18 rakesh sharma <[email protected]>:
>
> Thanks to Dan for suggesting to post it on gist. Here is the link to the post:
> https://raw.github.com/gist/2079527/bf68dd2f0a7ee3864ef066f126c34880b20b6b04/SelectiveDataRemoval
> Please take a look and I am sure many of you have solution to this problem.
> Thanks,
> Rakesh
>> Date: Sun, 18 Mar 2012 12:35:33 -0600
>> Subject: RE: Selective removal of data from a relation
>> From: [email protected]
>> To: [email protected]
>>
>> Post it on https://gist.github.com/ and email out the gist.
>>
>> Regards,
>>
>> Dan
>> On Mar 18, 2012 12:33 PM, "rakesh sharma" <[email protected]>
>> wrote:
>>
>> >
>> > All indentations get removed when message comes back from
>> > [email protected]. Any idea how I can make it work.
>> >
>> > > From: [email protected]
>> > > To: [email protected]
>> > > Subject: RE: Selective removal of data from a relation
>> > > Date: Sun, 18 Mar 2012 18:26:01 +0000
>> > >
>> > >
>> > > I am sorry for so many re-sends. Resending in Rich text format...
>> > > Hi All,
>> > > I have two relations "mix" and "child_parent". Relation "mix" contains
>> > > rows of ids. Each Id can be a parent or a child. Another relation
>> > > "child-parent" has rows of children and associated parents. It may not have
>> > > data for every child existing in relation "mix". Also, it can have some
>> > > data for which there is no matching data in relation "mix". I need to
>> > > remove all children from relation "mix" whose parent exists in the
>> > > relation. Here is an example to show what I am trying to achieve:
>> > >
>> > > mix = load "all_data" as (id:chararray);
>> > > dump mix;
>> > > 1
>> > > 3
>> > > 4
>> > > 6
>> > > 9
>> > > child_parent = load "mapping" as (childId:chararray, parentId:chararray);
>> > > dump child_parent;
>> > > (3 1)
>> > > (6 1)
>> > > (9 15)
>> > >
>> > > Children "3" and "6" has matching parent "1". Hence, 3 and 6 need to be
>> > > removed from "all_data". However, child "9" will stay as its parent "15"
>> > > does not exist in "all_data". The outcome will be:
>> > > 1
>> > > 4
>> > > 9
>> > > I am having hard time in solving it due to lack of experience with pig.
>> > > Any help/suggestion will be highly appreciated.
>> > > Thanks,
>> > > Rakesh
>> > >
