No Michael essentially wants a cross-product. It's a terrible thing and should be avoided :).
T1: a 1 b 2 c 3 d 4 T2: a x b y c z joined this way on the first column becomes: a 1 b y a 1 c z b 2 a x b 2 c z c 3 a x c 3 b y d 4 a x d 4 b y d 4 c z Note the cardinality explosion. Now assume that you are doing this in Pig / Hadoop because one of the relations is TB-sized, or at least multi-gigabyte. And this is why Pig doesn't support it. But if you really want to, join on a constant (so all rows in T1 will match all rows in T2) and filter out those for which T1.loc == T2.loc And don't say I didn't warn you :). D On Tue, Jan 3, 2012 at 5:34 AM, Jacob Perkins <[email protected]>wrote: > If I understand correctly, this is nothing more than an anti-join which > can be done with pig using a cogroup. > > So your SQL below: > > > select * from yee a left join yer b on a.loc != b.loc; > > becomes something like: > > a = load 'yee' as (loc:chararray, stuff:int); > b = load 'yer' as (loc:chararray, stuff:int); > > c = cogroup a by loc, b by loc; > d = foreach (filter c by IsEmpty(b)) generate FLATTEN(a); > > which will result in d containing only the records from a where the > 'loc' field doesn't match with the 'loc' field in b. > > --jacob > @thedatachef > >
