Isn't it the same problem mentioned in

http://mail-archives.apache.org/mod_mbox/pig-user/201101.mbox/%3c90570b63-0991-4127-8e3e-69e2b6e4b...@few.vu.nl%3E

If so, there is already a fix.

Shawn

On Wed, Jan 26, 2011 at 7:36 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
> So, bizarrely, I am either not understanding how pig does joins or there is
> a bug... it has been quite frustrating to troubleshoot.
>
> The issue is this: after doing a join to get set5, I do a foreach generate
> to make set6. Depending on the order in the join statement, one value gets
> erased by another. Here is the specific part I am talking about:
>
> set1 = JOIN Z2 by demo,small_table by demo;
> set2 = foreach set1 generate Z2::uid as uid,Z2::c2 as c2,Z2::ss2k as
> ss2k,Z2::time_id as time_id ,Z2::countryCode as countryCode,Z2::segment as
> segment,small_table::value as alsodemo;
> set3 = filter set2 BY segment == 1;
> set4 = filter set2 BY segment == 2;
> set4_a = foreach set4 generate uid, c2, ss2k, time_id, countryCode, alsodemo
> as gender;
> set5 = join set4_a by (uid,c2,ss2k,time_id,countryCode) full, set3 by
> (uid,c2,ss2k,time_id,countryCode);
> set6 = foreach set5 generate ((set3::uid IS NULL) ? set4_a::uid : set3::uid)
> as uid,
>       ((set3::c2 IS NULL) ? set4_a::c2 : set3::c2) as c2,
>       ((set3::ss2k IS NULL) ? set4_a::ss2k : set3::ss2k) as ss2k,
>       ((set3::time_id IS NULL) ? set4_a::time_id : set3::time_id) as
> time_id,
>       ((set3::countryCode IS NULL) ? set4_a::countryCode :
> set3::countryCode) as countryCode,
>       gender, alsodemo as min_age;
>
> If set5 joins set4_a on the left and set3 on the right, then while set5 will
> output properly, set6 will not. The "gender" column will erase the alsodemo
> column.
>
> If set5 joins set3 on the left and set4_a on the right, then set5 still
> works, but set6 does not: in this case, alsodemo will be fine, but it will
> erase gender. Basically, the last two columns are always the same. Writing
> the references as set4_a::gender and set3::alsodemo doesn't change anything
> (in fact, originally that's how this was done, but in troubleshooting we
> have been changing things to try and fix it).
>
> This is a super bizarre example of the script behaving in a pretty awful
> fashion. I'm not sure what is causing it, and would love to know whether
> anyone has any ideas and whether this is a bug.
>
> I can post the full script, but a lot of it isn't really germane to this.
>
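
For reference, the result the set5/set6 step is meant to produce can be sketched outside Pig. This is only a minimal Python illustration of the intended semantics, not how Pig implements joins. The sample rows and the helper names full_outer_join and project are made up for illustration, and the null check is simplified to "no matching set3 row" on the assumption that the key fields are never null in the input.

```python
# Key fields shared by set4_a and set3 in the quoted script.
KEY = ("uid", "c2", "ss2k", "time_id", "countryCode")


def full_outer_join(left, right, key=KEY):
    """Pair rows by key, keeping unmatched rows from both sides (None for the gap)."""
    def index(rows):
        by_key = {}
        for row in rows:
            by_key.setdefault(tuple(row[k] for k in key), []).append(row)
        return by_key

    li, ri = index(left), index(right)
    pairs = []
    for k in sorted(set(li) | set(ri)):  # sorted only to make the output order stable
        for l_row in li.get(k, [None]):
            for r_row in ri.get(k, [None]):
                pairs.append((l_row, r_row))
    return pairs


def project(pairs, key=KEY):
    """Intent of set6: coalesce the key fields, keep gender and min_age independent."""
    rows = []
    for s4, s3 in pairs:
        # Stands in for: (set3::x IS NULL) ? set4_a::x : set3::x
        src = s3 if s3 is not None else s4
        row = {k: src[k] for k in key}
        row["gender"] = s4["gender"] if s4 is not None else None     # from set4_a only
        row["min_age"] = s3["alsodemo"] if s3 is not None else None  # from set3 only
        rows.append(row)
    return rows


# Hypothetical sample data: uid 1 appears in both relations, uid 2 only in set3.
set4_a = [
    {"uid": 1, "c2": "a", "ss2k": "x", "time_id": 10, "countryCode": "US", "gender": "F"},
]
set3 = [
    {"uid": 1, "c2": "a", "ss2k": "x", "time_id": 10, "countryCode": "US", "alsodemo": "25"},
    {"uid": 2, "c2": "a", "ss2k": "x", "time_id": 10, "countryCode": "US", "alsodemo": "35"},
]

rows = project(full_outer_join(set4_a, set3))
for row in rows:
    print(row["uid"], row["gender"], row["min_age"])
```

With these sample rows, gender and min_age differ for uid 1 and gender is empty for uid 2. The symptom described above is that the last two columns come out identical, which this sketch suggests is not what the script asks for.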
