Sounds like you want an EvalFunc that returns a Bag of Tuples, with each tuple having 2 fields (period_value, period_significance). Pretty straightforward. You don't have to implement the Algebraic interface (or the Accumulator interface) -- those are optimizations for working with large datasets, and you only need them for scalability.
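Here is a rough sketch of what such a UDF could look like; treat it as a starting point under stated assumptions, not a tested implementation. The class name `PeriodSearch` and the `findPeriods` helper are hypothetical stand-ins for your own algorithm, and the sketch assumes the UDF is handed a single bag of (time, mag, err) tuples. It needs the Pig jar on the classpath to compile.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical UDF: takes one bag of (time, mag, err) tuples and
// returns a bag of (period_value, period_significance) tuples.
public class PeriodSearch extends EvalFunc<DataBag> {
    private static final TupleFactory TUPLES = TupleFactory.getInstance();
    private static final BagFactory BAGS = BagFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        // Assumption: field 0 of the input is the bag holding the series.
        DataBag series = (DataBag) input.get(0);
        int n = (int) series.size();
        double[] time = new double[n];
        double[] mag = new double[n];
        double[] err = new double[n];

        // Bags are unordered, so sort by time inside findPeriods
        // if the algorithm depends on ordering.
        int i = 0;
        for (Tuple row : series) {
            time[i] = ((Number) row.get(0)).doubleValue();
            mag[i] = ((Number) row.get(1)).doubleValue();
            err[i] = ((Number) row.get(2)).doubleValue();
            i++;
        }

        // Wrap each (period, significance) pair in a 2-field tuple.
        DataBag out = BAGS.newDefaultBag();
        for (double[] p : findPeriods(time, mag, err)) {
            Tuple t = TUPLES.newTuple(2);
            t.set(0, p[0]); // period_value
            t.set(1, p[1]); // period_significance
            out.add(t);
        }
        return out;
    }

    // Hypothetical placeholder: plug the real period-search algorithm in here.
    // Each returned element is {period_value, period_significance}.
    static List<double[]> findPeriods(double[] time, double[] mag, double[] err) {
        return new ArrayList<double[]>();
    }
}
```

From a script, after a REGISTER of the jar, something along the lines of `B = FOREACH (GROUP A ALL) GENERATE FLATTEN(PeriodSearch(A.(time, magA, errA)));` should give one (period, significance) row per period found. That line is untested, and the relation and field names are assumed from the sample data.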
(hc -- chickens won't come out because Pig won't know how to serialize the thing. You have to turn your chicken into a bytearray).

-D

On Fri, May 28, 2010 at 5:29 PM, hc busy <[email protected]> wrote:

> Couldn't you give EvalFunc<any return type> any return type? So you can just
> return a Bag that contains tuples of tuples, right? And it's easy because
> Tuple is an unparameterized type (and so is Bag), so you'd declare
>
> class myUdf extends EvalFunc<Bag>{...}
>
> I haven't tried this, but sometimes I'm tempted to return something weird
> like
>
> EvalFunc<Chicken>
>
> and see chickens come out of pig. ;-) heheheheeee
>
> Anyways, in all seriousness, there is a UDF that converts data to a bag
> (well, currently a contrib UDF, but it may make it into the builtins) that
> I wrote called ToBag. Here's the initial declaration for it:
>
> public class ToBag extends EvalFunc<DataBag>
>
> Your class would be declared similarly.
>
> On Fri, May 28, 2010 at 7:50 AM, Asif Jan <[email protected]> wrote:
>
> > Hello
> >
> > I need some help to get started with using Pig UDF.
> >
> > I have time series data (time, magA, errA, magB, errB) e.g.
> >
> > (2345.59777,19.875,0.481,20.225,0.482)
> > (2347.59568,19.371,0.3,20.227,0.743)
> > (2351.6075,19.063,0.193,20.768,1.085)
> > (2354.59702,20.689,3.047,20.873,1.758)
> > (2356.63223,21.23,3.341,20.562,1.242)
> >
> > and I need to apply an algorithm that searches for periods in the data.
> > The input to the algorithm is the (time, magX, errX) arrays. The algo
> > returns a List of all periods found. Each entry in the List is a
> > (period_value, period_significance) pair.
> >
> > How can I wrap that algo as a UDF? Do I have to use algebraic functions
> > (but I saw that they could only return scalar values)? What I need to
> > return from the function is something like
> >
> > (1000.0,0.57)
> > (234, .45)
> > (100, 0.023)
> > (6, 0.003)
> >
> > thanks a lot
