Jonathan, can't you just pass the bag A in?

On Mon, Jan 10, 2011 at 9:56 AM, Jonathan Coveney <[email protected]> wrote:
> So I have a udf, let's call it myudf.bag2bag, which takes a bag which
> contains "prop," and creates a new bag of tuples based on that.
>
> I have data in the form of
>
> id prop other1 other2
>
> If all I care about is running the udf, obviously I can do
>
> A = LOAD 'file' AS (id, prop, other1, other2);
> B = GROUP A BY id;
> C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A.prop));
>
> And all is fine
>
> But what do I do if I want to hold on to the other data, especially if you
> don't know how much there will be (from a bag2bag perspective)?
>
> My thought is that in bag2bag, you can pass in a tuple of "extras," which
> you then pass back, i.e.
>
> C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A.prop, (A.other1,
> A.other2)));
>
> I'm just not sure how I would specify the schema for this, in such a way
> that any number of entries could be in the tuple, and then you could just
> sort of reference them later.
>
> Is this possible?
>
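
For what it's worth, a rough sketch of what passing the whole bag might look like. This assumes myudf.bag2bag is changed to pull prop out of each input tuple itself and to copy the remaining fields into the tuples it emits; that change to the UDF is an assumption on my part, not something shown in this thread.

```pig
A = LOAD 'file' AS (id, prop, other1, other2);
B = GROUP A BY id;
-- Pass the full bag A rather than the projection A.prop, so every tuple
-- the UDF sees still carries id, prop, other1 and other2.
-- Assumption: bag2bag has been rewritten to read prop from each tuple
-- and carry the other fields through to its output.
C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A));
```

The script then no longer has to list the extra columns one by one, though the UDF would still need to declare an output schema if you want to refer to those fields by name later.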
