I have a script as follows:

 

register lookup.jar;
a = load 'lookupfile.dat' as (emp_id: chararray);
b = foreach a generate flatten(com.mycompany.pig.lookup());

 

The file lookupfile.dat has the following data:

10603103
10603115
10603106

 

The jar file lookup.jar has my UDF:

 

public class lookup extends EvalFunc<DataBag>
{
  ...
  ...
  public DataBag exec(Tuple input)
      throws IOException
  {
    ...
    ...
  }
}

 

My UDF works as expected in versions 0.5.0, 0.6.0 and 0.7.0. In version
0.8.0, I notice that the input tuple "input" contains one field whose
value is of type DataByteArray, whereas in earlier versions the value is
of type String (as expected from the declared chararray schema). Why is
this different? I am assuming this is an intentional change in 0.8.0. Is
there some way to force conversion from the raw data before the UDF is
invoked, i.e., to get the old behaviour? What is the recommended approach
in 0.8.0 for EvalFunc UDFs?
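
For reference, the kind of workaround I would like to avoid is normalising the field by hand inside the UDF. A minimal sketch of that idea, assuming the value arrives either as a String or as raw UTF-8 bytes: FieldNormalizer and asString are hypothetical names, and byte[] stands in for Pig's DataByteArray so that the snippet compiles without Pig on the classpath.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Pig or of lookup.jar.
class FieldNormalizer {

    // Returns the field as a String whether it arrives already converted
    // or still as raw bytes. byte[] is an assumption standing in for the
    // bytes wrapped by Pig's DataByteArray.
    public static String asString(Object field) {
        if (field == null) {
            return null;                       // preserve nulls
        }
        if (field instanceof String) {
            return (String) field;             // already a String (behaviour seen up to 0.7.0)
        }
        if (field instanceof byte[]) {
            // raw, unconverted data: decode as UTF-8
            return new String((byte[]) field, StandardCharsets.UTF_8);
        }
        return field.toString();               // any other type: fall back to toString()
    }

    public static void main(String[] args) {
        System.out.println(asString("10603103"));
        System.out.println(asString("10603115".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Inside exec(), the corresponding check would presumably be an instanceof test against DataByteArray followed by toString(), which decodes the bytes as UTF-8. I would rather not scatter that through every UDF if there is a supported way to get the conversion done up front.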

 

 

Sanjay Kaluskar
Sr. Architect
Platform
Ph: +91-80-4020-3083
VOIP: +1-650-385-6659
Mobile: +91-96322-24246

 
