Oh, that's a good point: I can't just return arbitrary types, even if I
derive from the base class. Interesting.

Well, the combination of toTuple and toBag will accomplish many tasks. One
thing that I had to do was collapse three columns into one row. (You won't
believe how many companies have legacy databases like this, or how much money
flows through these kinds of systems out there. ;-)

So I do

FOREACH input_table GENERATE k0, f0,
    toBag(toTuple(k1, column_1), toTuple(k1, column_2), toTuple(f1, column3));

And this gets me where I needed to be. It's similar to what Asif was asking,
except he wants to do a more complicated combination inside his UDF. But
if it's a time series, wouldn't we get where we need to be with a group and
order by?
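
For what it's worth, here is a rough sketch of what the group and order by
route might look like. The jar name, the class com.example.FindPeriods, the
input file, and its comma-separated layout are all made up for illustration;
the only thing taken from the thread is that the UDF returns a bag of
(period, significance) tuples. I haven't run this against real data.

```pig
-- Hypothetical jar and UDF class; substitute your own.
REGISTER myudfs.jar;
DEFINE FindPeriods com.example.FindPeriods();

-- Assumes the observations are stored as plain comma-separated rows.
obs = LOAD 'timeseries.csv' USING PigStorage(',')
      AS (time:double, magA:double, errA:double, magB:double, errB:double);

-- GROUP ALL gathers the whole series into a single bag.
all_obs = GROUP obs ALL;

periods = FOREACH all_obs {
    -- Keep only the three fields the algorithm needs, then sort by time.
    series = obs.(time, magA, errA);
    sorted = ORDER series BY time;
    -- The UDF is assumed to return a bag; FLATTEN turns it into rows.
    GENERATE FLATTEN(FindPeriods(sorted)) AS (period:double, significance:double);
};

DUMP periods;
```

With several series in one file, you would group by whatever column
identifies the series instead of using GROUP ALL, so that each series gets
its own bag.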

I'd like to mention again, I'd really like to see nested foreach, group,
cross, and union be allowed into the set of nested_op inside foreach.


On Fri, May 28, 2010 at 5:29 PM, hc busy <[email protected]> wrote:

>
> Couldn't you give EvalFunc<any return type> any return type? So you can
> just return a Bag that contains tuples of tuples, right? And it's easy
> because Tuple is an unparameterized type (and so is Bag), so you'd declare
>
>
> class myUdf extends EvalFunc<Bag>{...}
>
> I haven't tried this, but sometimes I'm tempted to return something weird
> like
>
> EvalFunc<Chicken>
>
> and see chickens come out of pig. ;-) heheheheeee
>
>
> Anyway, in all seriousness, there is a UDF that converts data to a bag
> (well, currently a contrib UDF, but it may make it into builtin) that I wrote
> called ToBag. Here's the initial declaration for it:
>
> public class ToBag extends EvalFunc<DataBag>
>
>
> Your class would be declared similarly.
>
> On Fri, May 28, 2010 at 7:50 AM, Asif Jan <[email protected]> wrote:
>
>> Hello
>>
>> I need some help to get started with using Pig UDF.
>>
>> I have time series data (time, magA, errA, magB, errB) e.g.
>>
>> (2345.59777,19.875,0.481,20.225,0.482)
>> (2347.59568,19.371,0.3,20.227,0.743)
>> (2351.6075,19.063,0.193,20.768,1.085)
>> (2354.59702,20.689,3.047,20.873,1.758)
>> (2356.63223,21.23,3.341,20.562,1.242)
>>
>>
>> and I need to apply an algorithm that searches for periods in the data.
>> The input to the algorithm is the (time, magX, errX) arrays. The algorithm
>> returns a List of all periods found. Each entry in the List is a
>> (period_value, period_significance) pair.
>>
>>
>> How can I wrap that algorithm as a UDF? Do I have to use algebraic functions
>> (but I saw that they could only return scalar values)? What I need to
>> return from the function is something like
>>
>> (1000.0,0.57)
>> (234, .45)
>> (100, 0.023)
>> (6, 0.003)
>>
>>
>> thanks a lot
>>
>
