Jonathan, can't you just pass the bag A in?
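
Something like the following, as an untested sketch. It assumes bag2bag is changed to accept the full tuples, read prop out of each one, and copy whatever other fields it finds onto the tuples it emits; that change to the UDF is an assumption on my part, not something it does today.

```pig
A = LOAD 'file' AS (id, prop, other1, other2);
B = GROUP A BY id;
-- Pass the whole bag A rather than the projection A.prop, so every
-- tuple still carries id, other1, other2 (and any later columns).
C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A));
```

The UDF would still need to report an output schema if you want to refer to the carried-over fields by name afterwards.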

On Mon, Jan 10, 2011 at 9:56 AM, Jonathan Coveney <[email protected]> wrote:

> So I have a udf, let's call it myudf.bag2bag, which takes a bag which
> contains "prop," and creates a new bag of tuples based on that.
>
> I have data in the form of
>
> id    prop    other1    other2
>
> If all I care about is running the udf, obviously I can do
>
> A = LOAD 'file' AS (id, prop, other1, other2);
> B = GROUP A BY id;
> C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A.prop));
>
> And all is fine
>
> But what do I do if I want to hold on to the other data, especially if you
> don't know how much there will be (from a bag2bag perspective)
>
> My thought is that in bag2bag, you can pass in a tuple of "extras," which
> you then pass back, i.e.
>
> C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A.prop, (A.other1,
> A.other2)));
>
> I'm just not sure how I would specify the schema for this, in such a way
> that any number of entries could be in the tuple, and then you could just
> sort of reference them later.
>
> Is this possible?
>
