Thanks Mridul for your quick answer! According to the documentation, PARALLEL sets the number of reduce tasks. So how can I make it take a UDF instead? Is there any example of such functions in the SVN/pig0.8 package?

Best Regards
Vincent

On Tue, May 10, 2011 at 2:02 PM, Mridul Muralidharan <[email protected]> wrote:

> Easy option would be to write your own udf which can catch corner cases,
> etc ..
> But assuming your data strictly follows what you mentioned, something like
> this might help (illustrative only !) :
>
> pets = load 'pets.txt' USING PigStorage(';') AS (pet_id:chararray,
>     pet_type:chararray, pet_name:chararray);
>
> people = load 'peoples.txt' USING PigStorage(';') AS (user:chararray,
>     ids:chararray);
> people_t = FOREACH people GENERATE user, STRSPLIT(ids, ',');
> -- STRSPLIT returns a tuple, not a bag : so convert to bag and flatten it.
> people_reqd = FOREACH people_t GENERATE user, FLATTEN(TOBAG($1)) as
>     (user_pet_id);
>
> reqd_op = JOIN people_reqd BY user_pet_id, pets BY pet_id PARALLEL
>     $MY_PARALLEL;
>
> reqd_op should contain what you need ...
>
> Regards,
> Mridul
>
> On Tuesday 10 May 2011 03:00 PM, Vincent wrote:
>
>> Hello dear Pig users,
>>
>> I am loading a file with the following format:
>>
>> $ cat peoples.txt
>> tom;1234,4567,6
>> anna;27894
>>
>> First field is a name, second field is a concatenation of an unknown
>> number of pets ids.
>>
>> I would like to JOIN this file with another one:
>>
>> $ cat pets.txt
>> 1234;dog;cocker
>> 4567;mouse;usa
>> 6;cat;persian
>> 27894;cat;manx
>>
>> Fields are pet's id, pet's type, pet's race.
>>
>> to get the following result:
>>
>> 1234;dog;cocker;tom
>> 4567;mouse;usa;tom
>> 6;cat;persian;tom
>> 27894;cat;manx;anna
>>
>> Problem is that I don't know how to convert a tuple of fields to lines,
>> i.e. to put the file peoples.txt into the following intermediate format:
>>
>> tom,1234
>> tom,4567
>> tom,6
>> anna,27894
>>
>> Thanks in advance for your help!
>>
>> Vincent Hervieux
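
For reference, the transformation discussed in this thread (split the comma-separated ids, emit one row per user and pet id, then join on the pet id) can be sketched outside Pig in plain Python. This is only an illustration of the intended data flow on the sample data from the original question, not Pig code and not a replacement for the script quoted above. The function names explode_people and join_pets are hypothetical and chosen for this example.

```python
def explode_people(lines):
    """Turn 'user;id1,id2,...' lines into (user, pet_id) pairs, one per pet id."""
    pairs = []
    for line in lines:
        user, ids = line.strip().split(";", 1)
        for pet_id in ids.split(","):
            pairs.append((user, pet_id))
    return pairs


def join_pets(pairs, pet_lines):
    """Inner-join (user, pet_id) pairs with 'pet_id;type;race' lines on pet_id."""
    pets = {}
    for line in pet_lines:
        pet_id, pet_type, pet_race = line.strip().split(";")
        pets[pet_id] = (pet_type, pet_race)
    # Keep only pairs whose pet id exists in the pets table (inner join).
    return [
        ";".join((pet_id, *pets[pet_id], user))
        for user, pet_id in pairs
        if pet_id in pets
    ]


# Sample data copied from the original question.
people = ["tom;1234,4567,6", "anna;27894"]
pets = ["1234;dog;cocker", "4567;mouse;usa", "6;cat;persian", "27894;cat;manx"]

pairs = explode_people(people)   # the intermediate "tom,1234" style rows
result = join_pets(pairs, pets)  # the final joined rows
for row in result:
    print(row)
```

The intermediate list pairs corresponds to the "one line per user and pet id" format asked about in the question, and result corresponds to the desired joined output.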
