Actually, I wanted to do a left outer join, so I'm not sure whether a filter
will work in this case.
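
For what it's worth, here is a rough sketch of what I think the outer-join
version of that suggestion would look like. The LOAD paths and the field types
are placeholders made up for illustration. The null check keeps table_a rows
that have no key match in table_b. A table_a row that does match on the keys
but never passes the timestamp comparison is still dropped, which is where
this differs from putting the condition in a SQL ON clause.

```pig
-- Hypothetical inputs: paths and types are assumptions, not from the thread.
table_a = LOAD 'table_a' AS (user_id:long, title_id:long, timestamp:long);
table_b = LOAD 'table_b' AS (user_id:long, title_id:long, timestamp:long);

-- Left outer equi-join: every table_a row survives; rows without a key
-- match carry nulls in the table_b fields.
C = JOIN table_a BY (user_id, title_id) LEFT OUTER,
         table_b BY (user_id, title_id);

-- Keep unmatched rows (null table_b key) and matched rows that pass the
-- timestamp comparison.
D = FILTER C BY (table_b::user_id IS NULL)
             OR (table_a::timestamp > table_b::timestamp);
```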


On Thu, Jul 5, 2012 at 12:43 PM, Alan Gates <[email protected]> wrote:

> Pig can only do equi-joins.  Theta joins are hard in MapReduce.  So the
> way to do this is to do the equi-join and then filter afterwards.  This will
> not create significant additional cost since the join results will be
> filtered before being materialized to disk.
>
> C = JOIN table_a BY (user_id, title_id), table_b BY (user_id, title_id);
> D = FILTER C BY table_a::timestamp > table_b::timestamp;
>
> Alan.
>
> On Jul 5, 2012, at 12:21 PM, sonia gehlot wrote:
>
> > Hi Guys,
> >
> > I want to join 2 tables in Hive on a couple of columns, and one of the
> > conditions is that the timestamp in one table is greater than the
> > timestamp in the other. In SQL I could have written it this way:
> >
> > table_a a Join table_b b
> > on a.user_id = b.user_id
> > and a.title_id = b.title_id
> > and a.timestamp > b.timestamp
> >
> > How do I write the last condition in Pig? *a.timestamp > b.timestamp*
> >
> > Thanks,
> > Sonia
>
>
