I am not sure I follow your question about PARALLEL.
The value for PARALLEL is static: it is a constant number, not
something computed by a UDF.
I was using $MY_PARALLEL as a placeholder for whatever degree of
parallelism you need.
Typically you would set a default value in the script:
%default MY_PARALLEL '10'
and override it, when required, from the command line:
pig -param MY_PARALLEL=50 ...
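As a rough sketch of how the default and the override fit together (the file name join_pets.pig and the output path 'joined_pets' are assumed names for illustration only; the relations are the ones from the earlier script in this thread):

```pig
-- join_pets.pig (hypothetical file name)
-- Default used when no -param is given on the command line.
%default MY_PARALLEL '10'

pets = LOAD 'pets.txt' USING PigStorage(';')
    AS (pet_id:chararray, pet_type:chararray, pet_name:chararray);
people = LOAD 'peoples.txt' USING PigStorage(';')
    AS (user:chararray, ids:chararray);

-- Split the comma-separated ids, then turn them into one row per id.
people_t = FOREACH people GENERATE user, STRSPLIT(ids, ',');
people_reqd = FOREACH people_t GENERATE user, FLATTEN(TOBAG($1)) AS (user_pet_id);

-- The parameter is substituted as plain text before the script is parsed,
-- so PARALLEL still receives a constant number.
reqd_op = JOIN people_reqd BY user_pet_id, pets BY pet_id PARALLEL $MY_PARALLEL;

-- 'joined_pets' is an assumed output location.
STORE reqd_op INTO 'joined_pets' USING PigStorage(';');

-- Run with the default (requests 10 reduce tasks for the join):
--   pig join_pets.pig
-- Run with an override (requests 50 reduce tasks for the join):
--   pig -param MY_PARALLEL=50 join_pets.pig
```

Keeping the value in a parameter means the script itself does not change between small test runs and full-size runs; only the command line does.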
Regards,
Mridul
On Tuesday 10 May 2011 04:26 PM, Vincent wrote:
Thanks Mridul for your quick answer!
According to the documentation, PARALLEL sets the number of reduce
tasks. So how can I make it take a UDF instead? Is there any example
of such functions in the SVN/pig0.8 package?
Best Regards
Vincent
On Tue, May 10, 2011 at 2:02 PM, Mridul Muralidharan wrote:
The easy option would be to write your own UDF, which can handle
corner cases, etc.
But assuming your data strictly follows the format you described,
something like this might help (illustrative only!):
pets = load 'pets.txt' USING PigStorage(';')
    AS (pet_id:chararray, pet_type:chararray, pet_name:chararray);
people = load 'peoples.txt' USING PigStorage(';')
    AS (user:chararray, ids:chararray);
people_t = FOREACH people GENERATE user, STRSPLIT(ids, ',');
-- STRSPLIT returns a tuple, not a bag: so convert to bag and flatten it.
people_reqd = FOREACH people_t GENERATE user, FLATTEN(TOBAG($1)) as (user_pet_id);
reqd_op = JOIN people_reqd BY user_pet_id, pets BY pet_id PARALLEL $MY_PARALLEL;
reqd_op should contain what you need ...
Regards,
Mridul
On Tuesday 10 May 2011 03:00 PM, Vincent wrote:
Hello dear Pig users,
I am loading a file with the following format:
$ cat peoples.txt
tom;1234,4567,6
anna;27894
The first field is a name; the second field is a comma-separated list
of an unknown number of pet ids.
I would like to JOIN this file with another one:
$ cat pets.txt
1234;dog;cocker
4567;mouse;usa
6;cat;persian
27894;cat;manx
The fields are the pet's id, the pet's type, and the pet's breed.
I would like to get the following result:
1234;dog;cocker;tom
4567;mouse;usa;tom
6;cat;persian;tom
27894;cat;manx;anna
The problem is that I don't know how to convert a tuple of fields
into separate lines, i.e. how to put the file peoples.txt into the
following intermediate format:
tom,1234
tom,4567
tom,6
anna,27894
Thanks in advance for your help!
Vincent Hervieux