Isn't it the same problem mentioned in http://mail-archives.apache.org/mod_mbox/pig-user/201101.mbox/%3c90570b63-0991-4127-8e3e-69e2b6e4b...@few.vu.nl%3E
if so, there is already a fix. Shawn On Wed, Jan 26, 2011 at 7:36 AM, Jonathan Coveney <jcove...@gmail.com> wrote: > So, bizarrely, I am either not understanding how pig does joins or there is > a bug... it has been quite frustrating to troubleshoot. > > The issue is this: after doing a join to get set5, I do a foreach generate > to make set6. Depending on the order in the join statement, one value gets > erased by another. Here is the specific part I am talking about: > > set1 = JOIN Z2 by demo,small_table by demo; > set2 = foreach set1 generate Z2::uid as uid,Z2::c2 as c2,Z2::ss2k as > ss2k,Z2::time_id as time_id ,Z2::countryCode as countryCode,Z2::segment as > segment,small_table::value as alsodemo; > set3 = filter set2 BY segment == 1; > set4 = filter set2 BY segment == 2; > set4_a = foreach set4 generate uid, c2, ss2k, time_id, countryCode, alsodemo > as gender; > set5 = join set4_a by (uid,c2,ss2k,time_id,countryCode) full, set3 by > (uid,c2,ss2k,time_id,countryCode); > set6 = foreach set5 generate ((set3::uid IS NULL) ? set4_a::uid : set3::uid) > as uid, > ((set3::c2 IS NULL) ? set4_a::c2 : set3::c2) as c2, > ((set3::ss2k IS NULL) ? set4_a::ss2k : set3::ss2k) as ss2k, > ((set3::time_id IS NULL) ? set4_a::time_id : set3::time_id) as > time_id, > ((set3::countryCode IS NULL) ? set4_a::countryCode : > set3::countryCode) as countryCode, > gender, alsodemo as min_age; > > If set5 joins set4_a on the left and set3 on the right, then while set5 will > output properly, set6 will not. the "gender" column will erase the alsodemo > column. > > If set5 joins set3 on the left and set4_a on the right, then set5 still > works, but set6 will not: in this case, alsodemo will be fine, but it will > erase gender. Basically, the last two columns are always the same. Making > the references set4_a::gender and set3::alsodemo don't change anything (in > fact, originally that's how this was done, but in troubleshooting we have > been changing things to try and fix it). > > This is a super bizarre example of the script behaving in a pretty awful > fashion. Not sure what is causing it, would love to know if anyone has any > ideas, and if this is a bug? > > I can post the full script, but a lot of it isn't really germane to this. >