Replace the filter with a foreach / generate (replace the right side with nulls of > condition is not satisfied)
On Thu, Jul 5, 2012 at 2:28 PM, sonia gehlot <[email protected]> wrote: > Actually I wanted to do left outer join, so not sure if filter will work in > this case. > > > On Thu, Jul 5, 2012 at 12:43 PM, Alan Gates <[email protected]> wrote: > >> Pig can only do equi-joins. Theta joins are hard in MapReduce. So the >> way to do this is do the equi-join and then filter afterwards. This will >> not create significant additional cost since the join results will be >> filtered before being materialized to disk. >> >> C = Join table_a on user_id, title_id, table_b on user_id, title_id; >> D = filter C by table_a::timestamp > table_b::timestamp; >> >> Alan. >> >> On Jul 5, 2012, at 12:21 PM, sonia gehlot wrote: >> >> > Hi Guys, >> > >> > I want to join 2 tables in hive on couple of columns and out them one >> > condition is timestamp of one column is greater then the other one. In >> SQL >> > I could have written in this way: >> > >> > table_a a Join table_b b >> > on a.user_id = b.user_id >> > and a.title_id = b.title_id >> > and a.timestamp > b.timestamp >> > >> > How to write last condition in Pig? *a.timestamp > b.timestamp* >> > >> > Thanks, >> > Sonia >> >>
