I have a script as follows:
register lookup.jar;
a = load 'lookupfile.dat' as(emp_id: chararray);
b = foreach a generate flatten(com.mycompany.pig.lookup());
The file lookupfile.dat has the following data:
10603103
10603115
10603106
The jar file lookup.jar has my UDF:
public class lookup extends EvalFunc<DataBag>
{
    ...
    ...
    public DataBag exec(Tuple input)
        throws IOException
    {
        ...
        ...
    }
}
My UDF works as expected in Pig versions 0.5.0, 0.6.0 and 0.7.0. In
version 0.8.0, I notice that the input tuple "input" has one field whose
value is of type DataByteArray, whereas in earlier versions the value is
of type String (as expected from the chararray declaration in the load
statement). Why is this different? I am assuming this is an intentional
change in 0.8.0. Is there some way to force the conversion from the raw
data before the UDF is invoked, i.e., to get the old behaviour? What is
the recommended approach in 0.8.0 for EvalFunc UDFs?
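
In the meantime, one defensive workaround would be to have the UDF accept either type and normalise the field itself. The following is only a minimal sketch under assumptions, not a recommended Pig idiom: the class name FieldCoercion and the method asString are hypothetical, and the sketch deliberately avoids Pig classes so it compiles on its own. It relies on my understanding that DataByteArray.toString() decodes the wrapped bytes as UTF-8, which is why the final toString() fallback should cover that case.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Pig): normalises a tuple field to a String
// whether it arrives already converted or still as raw bytes.
final class FieldCoercion {

    private FieldCoercion() {
        // static helper only, no instances
    }

    static String asString(Object field) {
        if (field == null) {
            return null;                      // preserve nulls from the input
        }
        if (field instanceof String) {
            return (String) field;            // behaviour seen in 0.5.0 to 0.7.0
        }
        if (field instanceof byte[]) {
            // raw bytes, decoded as UTF-8 (an assumption about the data encoding)
            return new String((byte[]) field, StandardCharsets.UTF_8);
        }
        // Assumption: for Pig's DataByteArray, toString() yields the decoded text.
        return field.toString();
    }
}
```

Inside exec, the field could then be read as FieldCoercion.asString(input.get(0)), so the same jar would behave the same way regardless of which type the runtime passes in.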
Sanjay Kaluskar
Sr. Architect
Platform
Ph: +91-80-4020-3083
VOIP: +1-650-385-6659
Mobile: +91-96322-24246