Hi Kelvin,

The input parameters to the UDF are wrapped inside a tuple, and that tuple is passed as the input to the exec function of the UDF. In the case of

>> grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);

the exec function gets a Tuple with one column, which is a tuple(long,long,long). That is, in exec(Tuple input), input.get(0) will return a tuple(long,long,long).
On the other hand, if you called the UDF this way

>> grunt> C = FOREACH A GENERATE UDF.SumAll((long)a1,(chararray)a2);

then in exec(Tuple input), input.get(0) will return a long and input.get(1) will return a chararray.

I hope this answers your question.

Thanks,
Thejas

On 11/5/09 9:15 PM, "Kelvin Moss" <[email protected]> wrote:

> Thanks for the reply. I understand that Tuple can have more than one field.
> That is why I was expecting Tuple.getAll to return me all the fields in the
> Tuple. But as it turns out it returns a Tuple. That made me think that maybe
> Tuple.getAll returns all the Tuples in the Tuple, but a Tuple like this is not
> valid, right?
>
> ((1,2,3),(4,5,6))
>
> It should be enclosed in a bag like {(1,2,3),(4,5,6)}. Or maybe I am
> confusing things?
>
> Thanks!
>
> --- On Thu, 11/5/09, Jeff Zhang <[email protected]> wrote:
>
> From: Jeff Zhang <[email protected]>
> Subject: Re: Accessing fields in Tuple
> To: [email protected]
> Date: Thursday, November 5, 2009, 7:44 PM
>
> The input is the arguments you provide to your UDF. It is tuple type. A Tuple
> can have more than one element. That means your UDF can have more
> than one argument. Here you provide one argument, which is tuple type, to
> your UDF.
> So that means the first element of input is a tuple.
>
> Jeff Zhang
>
> On Thu, Nov 5, 2009 at 2:23 AM, Kelvin Moss <[email protected]> wrote:
>
>> Hi all,
>>
>> I have the following data file
>>
>> (1L,2L,3L)
>> (4L,2L,1L)
>> (8L,3L,4L)
>>
>> I am trying to write a UDF (like sum) that would add the fields in Tuple.
>> This works --
>>
>> public class SumAll extends EvalFunc<Long> {
>>     public Long exec(Tuple input) {
>>         try {
>>             return sum(input);
>>         } catch (NumberFormatException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         } catch (ExecException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         }
>>         return 0L;
>>     }
>>
>>     static protected Long sum(Tuple input) throws ExecException,
>>             NumberFormatException {
>>         long sum = 0;
>>
>>         List<Object> values = input.getAll();
>>         for (Iterator<Object> it = values.iterator(); it.hasNext();) {
>>             Tuple t = (Tuple)it.next();
>>             sum += (Long)t.get(0);
>>             sum += (Long)t.get(1);
>>             sum += (Long)t.get(2);
>>         }
>>         return sum;
>>     }
>>
>> }
>>
>> grunt> A = LOAD 'data2' as aa:bytearray;
>> grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
>> grunt> dump C;
>> 2009-11-05 10:07:09,266 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully
>> stored result in: "file:/tmp/temp1206478472/tmp-577036369"
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records
>> written : 3
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes
>> written : 0
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100%
>> complete!
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (6L)
>> (7L)
>> (15L)
>> grunt>
>>
>> Initially I thought that such a loop would work
>>
>>     static protected Long sum(Tuple input) throws ExecException,
>>             NumberFormatException {
>>         long sum = 0;
>>
>>         List<Object> values = input.getAll(); // Would give all fields in Tuple??
>>         for (Iterator<Object> it = values.iterator(); it.hasNext();) {
>>             sum += (Long)t;
>>         }
>>         return sum;
>>     }
>>
>> But I get an error that Tuple can't be cast back to Long.
>> So my question is: what is input.getAll() returning? What is the structure of
>> the data that gets passed to the exec function?
>>
>> Thanks!
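
To make the argument wrapping discussed in this thread concrete, here is a minimal sketch in plain Java. It does not use the Pig API: a java.util.List<Object> stands in for Pig's Tuple (an assumption, made only so the example runs without the Pig jar), and the names TupleShapeDemo, sumNested and sumFlat are hypothetical. sumNested corresponds to the call SumAll((tuple(long,long,long))aa), where the argument tuple holds a single field that is itself a tuple. sumFlat corresponds to a hypothetical call such as SumAll(a1, a2, a3) with three long arguments, where each argument is its own field.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: List<Object> plays the role of Pig's Tuple (assumption),
// and List.get(i) plays the role of Tuple.get(i).
final class TupleShapeDemo {

    // One argument of type tuple(long,long,long): the input has ONE field,
    // and that field is the inner tuple holding the three longs.
    static long sumNested(List<Object> input) {
        @SuppressWarnings("unchecked")
        List<Object> inner = (List<Object>) input.get(0);
        long sum = 0;
        for (Object field : inner) {
            sum += (Long) field;
        }
        return sum;
    }

    // Three separate long arguments (hypothetical call shape): the input has
    // three fields, each of which can be cast to Long directly.
    static long sumFlat(List<Object> input) {
        long sum = 0;
        for (Object field : input) {
            sum += (Long) field;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList(1L, 2L, 3L);
        List<Object> wrapped = Arrays.<Object>asList(row); // shaped like ((1,2,3))

        System.out.println(sumNested(wrapped));
        System.out.println(sumFlat(row));

        try {
            // Treating the single nested field as a Long fails at runtime,
            // which is the cast error described in the original question.
            sumFlat(wrapped);
        } catch (ClassCastException e) {
            System.out.println("cast failed: the only field is the inner tuple");
        }
    }
}
```

In a real UDF the same distinction applies to exec(Tuple input): iterate over the fields of input.get(0) when a single tuple argument is passed, or over the fields of input itself when the values are passed as separate arguments.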
