Pig can only do equi-joins, and theta joins are hard in MapReduce. So the way to do this is to do the equi-join and then filter afterwards. This will not add significant cost, since the join results are filtered before being materialized to disk.
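To make the cost argument concrete, here is a small Python simulation of a reduce-side equi-join followed by a filter. It is not Pig's implementation. It only illustrates the idea under stated assumptions: rows are plain dicts with the hypothetical keys user_id, title_id and timestamp; the "map" step groups rows by the equality key; and the inequality is checked inside each key group before anything is emitted. The equality columns can be hashed to route matching rows to the same group. A predicate like ">" gives no such routing key, which is why it is applied as a filter afterwards.

```python
from collections import defaultdict


def reduce_side_join_with_filter(table_a, table_b):
    """Hypothetical sketch: equi-join on (user_id, title_id), then keep only
    pairs where a.timestamp > b.timestamp. Yields (a_row, b_row) tuples."""
    # "Map" phase: route every row to a group keyed by the equality columns.
    # Each group holds (rows from table_a, rows from table_b).
    groups = defaultdict(lambda: ([], []))
    for row in table_a:
        groups[(row["user_id"], row["title_id"])][0].append(row)
    for row in table_b:
        groups[(row["user_id"], row["title_id"])][1].append(row)

    # "Reduce" phase: form the cross product within each key group only,
    # and apply the inequality before emitting, so non-matching pairs are
    # never written out.
    for a_rows, b_rows in groups.values():
        for a in a_rows:
            for b in b_rows:
                if a["timestamp"] > b["timestamp"]:
                    yield a, b


if __name__ == "__main__":
    table_a = [
        {"user_id": 1, "title_id": 10, "timestamp": 5},
        {"user_id": 1, "title_id": 10, "timestamp": 2},
    ]
    table_b = [
        {"user_id": 1, "title_id": 10, "timestamp": 3},
        {"user_id": 2, "title_id": 10, "timestamp": 1},
    ]
    for a, b in reduce_side_join_with_filter(table_a, table_b):
        print(a["timestamp"], b["timestamp"])  # prints "5 3"
```

Only the pair whose table_a timestamp (5) exceeds the table_b timestamp (3) for the same (user_id, title_id) survives. The row with user_id 2 has no partner, so it never reaches the comparison.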
C = JOIN table_a BY (user_id, title_id), table_b BY (user_id, title_id);
D = FILTER C BY table_a::timestamp > table_b::timestamp;

Alan.

On Jul 5, 2012, at 12:21 PM, sonia gehlot wrote:

> Hi Guys,
>
> I want to join 2 tables in Hive on a couple of columns, and one of the
> conditions is that the timestamp of one column is greater than the other one.
> In SQL I could have written it this way:
>
> table_a a JOIN table_b b
> on a.user_id = b.user_id
> and a.title_id = b.title_id
> and a.timestamp > b.timestamp
>
> How do I write the last condition in Pig? *a.timestamp > b.timestamp*
>
> Thanks,
> Sonia
