Since I am a Pig developer, I will say "do everything Pig" :).

To be frank, if these 9 functions are all you want, you can easily convert them into Pig, but you will not gain much if none of the 9 functions can make use of existing UDFs. Here is one way you can do it:

* Write a UDF LineProcess:
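import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.DefaultDataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class LineProcess extends EvalFunc<DataBag> {
   //operators, declared the same way as in your Map class
   Operator1 op1 = new Operator1();
   Operator2 op2 = new Operator2();
   //and so on with all operators

   @Override
   public DataBag exec(Tuple in) throws IOException {
       String line = (String)in.get(0);
       //initialize all the operators if they are not initialized
       if( !op1.isInitialized() )
           op1.initialize();
       if( !op2.isInitialized() )
           op2.initialize();
       //and so on with all operators

       //process each operator
       op1.process(line);
       String[] resultOP1 = op1.getResults();
       op2.process(resultOP1);
       String[][] resultOP2 = op2.getResults();
       //and so on with all the operators

       //one tuple per row of resultOP9 (assumed String[][], as in your map function)
       DataBag db = new DefaultDataBag();
       for (int i=0;i<resultOP9.length;i++) {
           Tuple t = TupleFactory.getInstance().newTuple();
           for (int j=0;j<resultOP9[i].length;j++)
               t.append(resultOP9[i][j]);
           db.add(t);
       }
       return db;
   }

   @Override
   public Schema outputSchema(Schema input) {
       //declare that this UDF returns a bag
       return new Schema(new Schema.FieldSchema(
               getSchemaName(this.getClass().getName().toLowerCase(), input),
               DataType.BAG));
   }
}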

* Drive it using a Pig script:
a = load '1.txt' as (a0:chararray);
b = foreach a generate flatten(LineProcess(a0));
store b into 'out';

Going forward, if you want to use Filter/Join and other native Pig functionality, or if you want to break these 9 functions apart and combine them in a different way, Pig will definitely help.
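
As a rough sketch of what "breaking the functions apart" could look like on the Java side: if every operator sits behind a common interface, the chain becomes a list that you can reorder, and each element could later be wrapped in its own UDF. This is plain Java with no Pig or Hadoop dependency. The Operator interface, the two toy operators, and OperatorChain are all hypothetical names made up for illustration, and the List<String> in/out types are a simplification (your real operators pass String[] and String[][] between them).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Operator1..Operator9; the real classes are not shown in this thread.
interface Operator {
    boolean isInitialized();
    void initialize();
    List<String> process(List<String> input);
}

// Toy operator: splits every input string on commas.
class SplitOperator implements Operator {
    private boolean initialized = false;
    public boolean isInitialized() { return initialized; }
    public void initialize() { initialized = true; }
    public List<String> process(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String s : input) {
            out.addAll(Arrays.asList(s.split(",")));
        }
        return out;
    }
}

// Toy operator: upper-cases every input string.
class UpperCaseOperator implements Operator {
    private boolean initialized = false;
    public boolean isInitialized() { return initialized; }
    public void initialize() { initialized = true; }
    public List<String> process(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String s : input) {
            out.add(s.toUpperCase());
        }
        return out;
    }
}

// Runs the operators one after another, initializing each on first use,
// which mirrors the isInitialized()/initialize() checks in the map function.
class OperatorChain {
    private final List<Operator> operators;

    OperatorChain(List<Operator> operators) {
        this.operators = operators;
    }

    List<String> run(String line) {
        List<String> current = new ArrayList<>();
        current.add(line);
        for (Operator op : operators) {
            if (!op.isInitialized()) {
                op.initialize();
            }
            current = op.process(current);
        }
        return current;
    }
}

class ChainDemo {
    public static void main(String[] args) {
        OperatorChain chain = new OperatorChain(
                Arrays.asList(new SplitOperator(), new UpperCaseOperator()));
        System.out.println(chain.run("a,b,c")); // prints [A, B, C]
    }
}
```

With this shape, both map() and a UDF's exec() could delegate to the same run() call, and recombining the functions is a matter of building the list in a different order.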

Daniel

Cornelio Iñigo wrote:
Hi

I'm just starting with Hadoop and Pig. I have to port a Hadoop MapReduce
program that I wrote to Pig. In the Hadoop program I have just a Map function,
and in it I perform all the processing,
which consists of analyzing some text. To do this, 9 functions (operators) are
called. These functions run sequentially (when the first is done, the
second is started, and so on). Here is how map looks:


        static class Map extends Mapper<LongWritable, Text, Text, IntWritable>{

            //declaration of operators or functions
            Operator1 op1 = new Operator1();
            Operator2 op2 = new Operator2();
            Operator3 op3 = new Operator3();
            ...
            ...

            /*map function

            */
            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{

                //get a row from csv
                String line = value.toString();

                //some code to parse the line
                ...
                ...

                //initialize all the operators if they are not initialized
                if( !op1.isInitialized() )
                    op1.initialize();

                if( !op2.isInitialized() )
                    op2.initialize();

                ...
                ...//and so on with all operators

                //process each operator
                op1.process(line);
                String[] resultOP1 = op1.getResults();

                op2.process(resultOP1);
                String[][] resultOP2 = op2.getResults();
                ...//and so on with all the operators
                ...

                //finally collect results
                String put = "";
                for( int k = 0 ; k < resultOP9.length ; k++ ){
                    for( int j = 0; j < resultOP9[k].length; j++ ){

                        context.write...
                    }
                }
            }
        }
    }



My question is: is it a good idea to port this type of program to Pig, and
is there a way to do it?

Thanks

