Hi All,

I need to generate a unique key for each grouped tuple and then store it along with each tuple. For this I have created a UDF that generates a key (the current time in milliseconds with a static, incrementing sequence number appended).
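To make the key format concrete, here is a minimal sketch of that logic in plain Java. It assumes a Java UDF and leaves out the Pig EvalFunc wrapper; the class name, the nextKey method and the use of AtomicLong are illustrative assumptions, not the actual UDF source.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the key logic inside myudf.KeyGenerator.
// The real UDF would wrap this in a Pig EvalFunc; that part is omitted here.
final class KeyGenerator {

    // Static counter shared by every call made within the same JVM.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    private KeyGenerator() {
    }

    // Returns the current time in milliseconds with the next sequence number appended.
    static String nextKey() {
        return Long.toString(System.currentTimeMillis()) + SEQUENCE.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two consecutive calls print two different keys.
        System.out.println(nextKey());
        System.out.println(nextKey());
    }
}
```

In this sketch the input tuple plays no part in the key, so the value depends only on when the function is called and how many times it has been called before in that JVM.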
I used it in the script below:

1.  a = load '1.txt' using PigStorage(',') as (id: chararray, name: chararray, age: int);
2.  b = load '2.txt' using PigStorage(',') as (id: chararray, name: chararray, desg: chararray);
3.  c = cogroup a by (id, name), b by (id, name);
4.  d = filter c by not IsEmpty(a) and not IsEmpty(b);
5.  e = foreach d generate myudf.KeyGenerator(*), *;
6.  dump e;
7.  f = foreach e generate $0, flatten(a);
8.  dump f;
9.  g = foreach e generate $0, flatten(b);
10. dump g;

At step 6, I can see the unique key generated and printed.
But at steps 8 and 10, the key printed is different from the one generated at step 6, even though I am carrying the same key ($0 of e) through to these steps in the script.

What is going wrong? How can I achieve this requirement?

Regards,
Sarath.
