I am not sure about what you meant by "null match".

Would this work?

F1 = load 'largefile' as (field1,..);
F2 = load 'smallfile' as (field1, ..);

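-- As the small file fits in memory, use a replicated join.
J = join F1 by field1 LEFT, F2 by field1 using 'replicated';
-- After a join, fields are referenced as alias::field, and each bincond
-- needs parentheses. F2::field1 is null when the small file has no row
-- for the key; in that case keep the large file's own value.
FE = foreach J generate F1::field1,
    (F2::field1 is null ? F1::field2 : F2::field2),
    (F2::field1 is null ? F1::field3 : F2::field3)
    ;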
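
If "null match" instead means that field2 and field3 should be filled in
only where they are null in the large file, the generate step might look
like the sketch below. It is untested, and it assumes that both load
statements name field1, field2 and field3 in their schemas; FE2 is just an
illustrative alias.

-- Keep the large file's value unless it is null; otherwise fall back to
-- the small file's value (which is itself null when no row matched).
FE2 = foreach J generate F1::field1,
    (F1::field2 is null ? F2::field2 : F1::field2),
    (F1::field3 is null ? F2::field3 : F1::field3)
    ;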





On 8/2/10 7:13 AM, "Kochis, Allan" <allan.koc...@schwab.com> wrote:

Hi,


I have a Pig question.
I have two HDFS files: a smaller file
that has
|field1|field2|field3|


and a larger file that has

|..|.. |...|field2|....|field3|.....|field1|...| ..|

I would like to replace field2 and field3 in my larger file when they
are null match on field1.

I am currently doing this by caching my smaller file and using a Perl
hash lookup to populate the larger records in a UDF.

Can this be done in a Pig join?


Thanks,

Allan

