Hi Dmitriy,
Thanks for sending it out. My problem is slightly different. However, you
provide me some ideas and I am going to try them and provide an update.
Thanks,Rakesh
> Date: Mon, 19 Mar 2012 09:51:09 -0700
> Subject: Re: Selective removal of data from a relation
> From: [email protected]
> To: [email protected]
>
> Assume A: {id, foo} and B: {id, bar}
>
> To get all rows that have ids in both A and B:
>
> C = join A by id, B by id;
>
> To get all rows that have ids in A but not in B:
>
> C = filter (join A by id left outer, B by id) by B::id is null;
>
> To get all rows that have ids in B but not in A:
> C = filter (join A by id right outer, B by id) by A::id is null;
>
> To get all rows that don't have a matching row in another relation:
> C = filter (join A by id outer, B by id) by A::id is null OR B::id is null;
>
> 2012/3/18 rakesh sharma <[email protected]>:
> > Dmitriy,
> > I tried it. However, I don't seem to be getting a handle on it. Some pseudo
> > code will be highly appreciated.
> >
> > Thanks,
> > Rakesh
> >
> >> Date: Sun, 18 Mar 2012 14:27:25 -0700
> >> Subject: Re: Selective removal of data from a relation
> >> From: [email protected]
> >> To: [email protected]
> >>
> >> Rakesh,
> >> Just like in SQL, this is achieved by doing an outer join and
> >> filtering for nulls (a null join key indicates absence of a matching
> >> row).
> >>
> >> D
> >>
> >> 2012/3/18 rakesh sharma <[email protected]>:
> >> >
> >> > Thanks to Dan for suggesting to post it on gist. Here is the link to the
> >> > post:
> >> > https://raw.github.com/gist/2079527/bf68dd2f0a7ee3864ef066f126c34880b20b6b04/SelectiveDataRemoval‏
> >> > Please take a look and I am sure many of you have solution to this
> >> > problem.
> >> > Thanks,Rakesh
> >> >> Date: Sun, 18 Mar 2012 12:35:33 -0600
> >> >> Subject: RE: Selective removal of data from a relation
> >> >> From: [email protected]
> >> >> To: [email protected]
> >> >>
> >> >> Post it on https://gist.github.com/ and email out the gist.
> >> >>
> >> >> Regards,
> >> >>
> >> >> Dan
> >> >> On Mar 18, 2012 12:33 PM, "rakesh sharma" <[email protected]>
> >> >> wrote:
> >> >>
> >> >> >
> >> >> > All indentations get removed when message comes back from
> >> >> > [email protected]. Any idea how I can make it work.
> >> >> >
> >> >> > > From: [email protected]
> >> >> > > To: [email protected]
> >> >> > > Subject: RE: Selective removal of data from a relation
> >> >> > > Date: Sun, 18 Mar 2012 18:26:01 +0000
> >> >> > >
> >> >> > >
> >> >> > > I am sorry for so many re-sends. Resending in Rich text format...
> >> >> > > Hi All,
> >> >> > > I have two relations "mix" and "child_parent". Relation "mix"
> >> >> > > contains
> >> >> > rows of ids. Each Id can be a parent or a child. Another relation
> >> >> > "child-parent" has rows of children and associated parents. It may
> >> >> > not have
> >> >> > data for every child existing in relation "mix". Also, it can have
> >> >> > some
> >> >> > data for which there is no matching data in relation "mix". I need to
> >> >> > remove all children from relation "mix" whose parent exists in the
> >> >> > relation. Here is an example to show what I am trying to achieve:mix
> >> >> > = load
> >> >> > "all_data" as (id:chararray);dump mix;
> >> >> > > 13469
> >> >> > > child_parent = load "mapping" as (childId:chararray,
> >> >> > parentId:chararray);dump child_parent;
> >> >> > > (3 1)(6 1)(9 15)
> >> >> > > Children "3" and "6" has matching parent "1". Hence, 3 and 6 need
> >> >> > > to be
> >> >> > removed from "all_data". However, child "9" will stay as its parent
> >> >> > "15"
> >> >> > does not exist in "all_data". The outcome will be:149I am having hard
> >> >> > time
> >> >> > in solving it due to lack of experience with pig. Any help/suggestion
> >> >> > will
> >> >> > be highly appreciated.
> >> >> > > Thanks,Rakesh
> >> >> >
> >> >
> >