I don't think you need multiple invocations of the UDF to do this, though I'm not certain. If you did something like the following:

a = load 'thing' as (x:int);
b = foreach (group a all) generate flatten(GenIdNumber(a.x));

then your output would be a bunch of ID numbers, all produced by a single invocation. The caveat is that it wouldn't be terribly efficient, since grouping with "all" sends every record to a single reducer. For efficiency you'd want to make the UDF Algebraic, but then you're going to have multiple invocations. Generally, you can't make any assumptions about how many invocations there will be: this is M/R, so you don't know how the input is going to be split up.

So once again, it depends on what specifically you're trying to do. My guess is that there is an efficient way to achieve it that doesn't explicitly need one global invocation (and you can see why that would go against distributed processing as a paradigm).

2012/2/22 Shibu Thomas <[email protected]>

> Hi Jonathan,
>
> The PIG UDF will generate a block of sequence numbers.
> The parent PIG script will call this UDF in a foreach statement and the
> UDF has to return the next number from the sequence.
>
> Thanks
>
> Shibu Thomas
> MSCIS-IS
> Office : +91 (40) 669 32660
> Mobile: +91 95811 51116
>
>
> -----Original Message-----
> From: Jonathan Coveney [mailto:[email protected]]
> Sent: Thursday, February 23, 2012 10:46 AM
> To: [email protected]
> Subject: Re: Retaining state in PIG UDF
>
> You need to be clearer about what you hope to achieve
>
> 2012/2/22 Shibu Thomas <[email protected]>
>
> > Hi,
> >
> > Is there any mechanism of retaining state between PIG UDF invocations?
> >
> > Thanks
> >
> > Shibu Thomas
> > MSCIS-IS
> > Office : +91 (40) 669 32660
> > Mobile: +91 95811 51116
> >
> >
>
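To make the point about not needing one global invocation a bit more concrete, here is a rough sketch of one possible approach. It is plain Python that only simulates the idea; it is not Pig code and not a real UDF. Each task keeps nothing but a local counter and interleaves its IDs with those of the other tasks, so the IDs come out unique without any state shared between invocations. The names TaskLocalIdGenerator, task_index, num_tasks and simulate are hypothetical, and the sketch assumes each task can learn its own index and the total number of tasks; how a real UDF would obtain those is not covered here.

```python
class TaskLocalIdGenerator:
    """Hands out IDs that are unique across tasks using only per-task state.

    Hypothetical helper for illustration; not part of Pig.
    """

    def __init__(self, task_index, num_tasks):
        # Assumption: the task knows its own index and the total task count.
        if not 0 <= task_index < num_tasks:
            raise ValueError("task_index must be in [0, num_tasks)")
        self.task_index = task_index
        self.num_tasks = num_tasks
        self.calls = 0  # local counter, never shared with other tasks

    def next_id(self):
        # Task k produces k, k + num_tasks, k + 2 * num_tasks, ...
        new_id = self.task_index + self.calls * self.num_tasks
        self.calls += 1
        return new_id


def simulate(records_per_task):
    """Run one generator per simulated task; return the IDs grouped by task."""
    num_tasks = len(records_per_task)
    ids_by_task = []
    for task_index, record_count in enumerate(records_per_task):
        gen = TaskLocalIdGenerator(task_index, num_tasks)
        ids_by_task.append([gen.next_id() for _ in range(record_count)])
    return ids_by_task
```

Note that the IDs are unique but not contiguous when tasks see different numbers of records. If you really need a gap-free sequence, that is the part that forces some kind of global step.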
