I am not sure I follow your question about PARALLEL.
The value given to PARALLEL is a static value (a fixed number of reducers), not the result of a UDF.

I was using $MY_PARALLEL as a placeholder for whatever degree of parallelism you need.

Typically you would set a default value in the script:

%default MY_PARALLEL '10'

And override it, when required, from the command line: pig -param MY_PARALLEL=50 ...
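
A minimal sketch of how the two pieces fit together. The relation names by_type and counts and the script name count_pets.pig are hypothetical, chosen only for illustration; pets.txt is the file from your first mail.

```pig
-- Default reducer count, used when no -param is given on the command line.
%default MY_PARALLEL '10'

pets = LOAD 'pets.txt' USING PigStorage(';')
       AS (pet_id:chararray, pet_type:chararray, pet_name:chararray);

-- $MY_PARALLEL is replaced textually before the script runs,
-- so PARALLEL still sees a plain constant.
by_type = GROUP pets BY pet_type PARALLEL $MY_PARALLEL;
counts = FOREACH by_type GENERATE group AS pet_type, COUNT(pets) AS n;

DUMP counts;
```

Running pig count_pets.pig would use the default of 10, while pig -param MY_PARALLEL=50 count_pets.pig would substitute 50 instead.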



Regards,
Mridul

On Tuesday 10 May 2011 04:26 PM, Vincent wrote:
Thanks Mridul for your quick answer!

According to the documentation, PARALLEL sets the number of reduce
tasks. So how can I make it take a UDF instead? Is there any example
of such functions in the SVN/pig0.8 package?

Best Regards

Vincent

On Tue, May 10, 2011 at 2:02 PM, Mridul Muralidharan wrote:


    The easy option would be to write your own UDF, which can also
    handle corner cases, etc.
    But assuming your data strictly follows the format you described,
    something like this might help (illustrative only!):

    pets = load 'pets.txt'  USING PigStorage(';') AS (pet_id:chararray,
    pet_type:chararray, pet_name:chararray);

    people = load 'peoples.txt'  USING PigStorage(';') AS
    (user:chararray, ids:chararray);
    people_t = FOREACH people GENERATE user, STRSPLIT(ids, ',');
    -- STRSPLIT returns a tuple, not a bag, so convert it to a bag and
    -- flatten it.
    people_reqd = FOREACH people_t GENERATE user, FLATTEN(TOBAG($1)) as
    (user_pet_id);


    reqd_op = JOIN people_reqd BY user_pet_id, pets BY pet_id PARALLEL
    $MY_PARALLEL;


    reqd_op should contain what you need ...
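
    A possible alternative sketch, in case the STRSPLIT/TOBAG route does
    not give one row per id on your Pig version: the built-in TOKENIZE
    returns a bag directly and, as far as I know, treats the comma as one
    of its default word separators. It also splits on spaces, double
    quotes, parentheses and asterisks, so this assumes the ids contain
    none of those.

    ```pig
    people = LOAD 'peoples.txt' USING PigStorage(';')
             AS (user:chararray, ids:chararray);

    -- TOKENIZE yields a bag of single-field tuples;
    -- FLATTEN then emits one (user, id) row per element of that bag.
    people_reqd = FOREACH people GENERATE user,
                  FLATTEN(TOKENIZE(ids)) AS user_pet_id;
    ```

    The resulting people_reqd can be used in the JOIN above unchanged.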



    Regards,
    Mridul





    On Tuesday 10 May 2011 03:00 PM, Vincent wrote:

        Hello dear Pig users,

        I am loading a file with the following format:

        $ cat peoples.txt
        tom;1234,4567,6
        anna;27894

        The first field is a name; the second field is a comma-separated
        list of an unknown number of pet ids.

        I would like to JOIN this file with another one:

        $ cat pets.txt
        1234;dog;cocker
        4567;mouse;usa
        6;cat;persian
        27894;cat;manx

        The fields are the pet's id, the pet's type, and the pet's breed.

        The goal is to get the following result:

        1234;dog;cocker;tom
        4567;mouse;usa;tom
        6;cat;persian;tom
        27894;cat;manx;anna

        The problem is that I don't know how to convert a tuple of fields
        into lines, i.e. how to put the file peoples.txt into the
        following intermediate format:

        tom,1234
        tom,4567
        tom,6
        anna,27894

        Thanks in advance for your help!


             Vincent Hervieux



