Thanks,

I was confused about the input to the exec method, i.e. the Tuple. Now I understand that each object in the tuple can be of a simple or a complex type.

I have one more question though. The only way I was able to make my function work was:

grunt> ds = LOAD 'data/timeseries' using PigStorage('\t') as (times:double, mag1:double, err1:double, mag2:double, err2:double);
grunt> A = group ds all;
grunt> B = foreach A {result = PeriodSearchFunc(ds); generate flatten(result);};

i.e. I was forced to wrap it in a Bag and then use foreach. Is it possible to use it as follows:


grunt> ds = LOAD 'data/timeseries' using PigStorage('\t') as (times:double, mag1:double, err1:double, mag2:double, err2:double);
grunt> B = PeriodSearchFunc(ds);

(in the same manner as the DISTINCT or COUNT built-ins)

thanks again


On May 29, 2010, at 3:01 AM, Dmitriy Ryaboy wrote:

Sounds like you want an EvalFunc that returns a Bag of Tuples, with each
tuple having 2 fields. Pretty straightforward.
You don't have to implement the algebraic interface (or the accumulator interface) -- those are optimizations for working with large datasets, and
not required for anything other than scalability.
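
To make the shape of that concrete, here is a minimal sketch in plain Java.
It deliberately does not use Pig's classes: a List<double[]> stands in for
the input bag of tuples and for the returned bag of 2-field tuples, and the
names PeriodSearchSketch, PeriodSearch and run are hypothetical, made up
for illustration. In a real UDF the same unpacking would sit inside exec()
and read from the bag that the group produces.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the body of a bag-in, bag-out UDF (no Pig classes).
class PeriodSearchSketch {

    // Hypothetical interface for the period-search algorithm from the thread.
    interface PeriodSearch {
        // One {period_value, period_significance} pair per period found.
        List<double[]> find(double[] times, double[] mags, double[] errs);
    }

    // rows:   one record per element, laid out as {time, mag1, err1, mag2, err2}
    // magCol: index of the magnitude column (1 for mag1, 3 for mag2); the
    //         matching error column is assumed to sit at magCol + 1
    static List<double[]> run(List<double[]> rows, int magCol, PeriodSearch algo) {
        int n = rows.size();
        double[] times = new double[n];
        double[] mags = new double[n];
        double[] errs = new double[n];

        // Turn the row-oriented records into the column arrays the algorithm expects.
        for (int i = 0; i < n; i++) {
            double[] row = rows.get(i);
            times[i] = row[0];
            mags[i] = row[magCol];
            errs[i] = row[magCol + 1];
        }

        // Copy each result into a fresh 2-field record, mirroring a bag of 2-field tuples.
        List<double[]> out = new ArrayList<>();
        for (double[] p : algo.find(times, mags, errs)) {
            out.add(new double[] {p[0], p[1]});
        }
        return out;
    }
}
```

The algorithm is passed in as a parameter only so that the sketch stays
self-contained; a UDF would more likely call it directly.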

(hc -- chickens won't come out cause pig won't know how to serialize the
thing. You have to turn your chicken into a bytearray).

-D


On Fri, May 28, 2010 at 5:29 PM, hc busy <[email protected]> wrote:

Couldn't you give EvalFunc<any return type> any return type? So you can
just return a Bag that contains tuples of tuples, right? And it's easy
because Tuple is an unparameterized type (and so is Bag), so you'd declare


class myUdf extends EvalFunc<Bag>{...}

I haven't tried this, but sometimes I'm tempted to return something weird
like

EvalFunc<Chicken>

and see chickens come out of pig. ;-) heheheheeee


Anyway, in all seriousness, there is a UDF that converts data to a bag
(well, currently a contrib UDF, but it may make it into the builtins) that
I wrote called ToBag. Here's the initial declaration for it:

public class ToBag extends EvalFunc<DataBag>


Your class would be declared similarly.

On Fri, May 28, 2010 at 7:50 AM, Asif Jan <[email protected]> wrote:

Hello

I need some help to get started with using Pig UDF.

I have time series data (time, magA, errA, magB, errB) e.g.

(2345.59777,19.875,0.481,20.225,0.482)
(2347.59568,19.371,0.3,20.227,0.743)
(2351.6075,19.063,0.193,20.768,1.085)
(2354.59702,20.689,3.047,20.873,1.758)
(2356.63223,21.23,3.341,20.562,1.242)


and I need to apply an algorithm that searches for periods in the data.
The input to the algorithm is the (time, magX, errX) arrays. The algorithm
returns a List of all periods found. Each entry in the List is a
(period_value, period_significance) pair.


How can I wrap that algorithm as a UDF? Do I have to use algebraic
functions (I saw that they can only return scalar values)? What I need to
return from the function is something like

(1000.0,0.57)
(234, .45)
(100, 0.023)
(6, 0.003)


thanks a lot








