All arguments to UDFs are automatically wrapped in a tuple anyway.

So let's say we want to write a BagContains filter:

foo = group stuff by key;
foreach foo generate ( BagContains(stuff, 'magic') ? 1 : 0);

Then you'd write BagContains to take a Tuple of 2 fields: the first field is
the bag, the second is the value you're testing for.

Similarly, you can filter with IsEmpty(myBag), etc.
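Because the arguments arrive as a single tuple, the body of such a filter is just a scan of field 0 looking for field 1. Below is a minimal sketch of that logic. Plain java.util lists stand in for Pig's Tuple and DataBag so it runs without Pig on the classpath; in a real UDF the class would extend org.apache.pig.FilterFunc and exec would receive an org.apache.pig.data.Tuple. The "any field equals the value" matching rule is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a Pig FilterFunc called BagContains.
// Pig packs all UDF arguments into one input tuple, so
// BagContains(stuff, 'magic') arrives as a 2-field input:
//   field 0 = the bag, field 1 = the value to look for.
// List<Object> models a Pig Tuple; List<List<Object>> models a DataBag.
public class BagContains {

    public static boolean exec(List<Object> input) {
        // Treat missing arguments as "no match" (a choice made for this sketch).
        if (input == null || input.size() < 2 || input.get(0) == null) {
            return false;
        }
        @SuppressWarnings("unchecked")
        List<List<Object>> bag = (List<List<Object>>) input.get(0);
        Object target = input.get(1);

        // Assumption: a tuple matches if any of its fields equals the target.
        for (List<Object> tuple : bag) {
            for (Object field : tuple) {
                if (Objects.equals(field, target)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<Object>> stuff = Arrays.asList(
                Arrays.<Object>asList("a", 1),
                Arrays.<Object>asList("magic", 2));
        System.out.println(exec(Arrays.<Object>asList(stuff, "magic"))); // true
        System.out.println(exec(Arrays.<Object>asList(stuff, "other"))); // false
    }
}
```

The design point is that no flattening or extra wrapping is needed: the bag is simply the first field of the input tuple Pig already builds.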

D

On Fri, Mar 11, 2011 at 10:11 AM, Lai Will <[email protected]> wrote:

> I could, but then I would not be able to use a FilterFunc on the bag
>
> (e.g. get all the people who have read "xyz").
>
> I would either have to flatten the bag and then filter, or wrap the bag
> in another tuple.
> Both seem like unnecessary overhead.
>
> Is my thinking correct?
>
> Best,
> Will
> -----Original Message-----
> From: Mridul Muralidharan [mailto:[email protected]]
> Sent: Thursday, March 10, 2011 2:08 AM
> To: [email protected]
> Cc: Lai Will
> Subject: Re: Schema
>
>
> In which case, can't you model that as a bag?
> I imagine something like Tuple with fields person:chararray,
> books_read:bag{ (name:chararray, isbn:chararray) }, etc ?
>
> Of course, it will only work as a bag if the tuple contained within it has a
> fixed schema :-) (unless you repeat this process N times, as
> required!)
>
> Regards,
> Mridul
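Mridul's suggestion amounts to returning a bag of one-field tuples instead of one variable-width tuple, so the declared schema stays fixed however many books a person has read. The sketch below shows that reshaping for Will's booksRead output. The class name BooksReadAsBag and the field name `name` are hypothetical, and java.util lists stand in for Pig's DataBag and Tuple; a real UDF would build those with Pig's BagFactory and TupleFactory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: reshape a variable-length list of book names into a
// bag of fixed-width tuples. List<Object> models a Pig Tuple and
// List<List<Object>> models a DataBag.
public class BooksReadAsBag {

    // Every inner tuple has exactly one chararray field, so this schema holds
    // no matter how many books a person has read.
    static final String OUTPUT_SCHEMA = "books_read:bag{t:tuple(name:chararray)}";

    static List<List<Object>> toBag(List<String> books) {
        List<List<Object>> bag = new ArrayList<>();
        for (String name : books) {
            // One single-field tuple per book.
            bag.add(Collections.<Object>singletonList(name));
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(toBag(Arrays.asList("xyz", "abc")));
        System.out.println(toBag(Collections.<String>emptyList()));
        System.out.println(OUTPUT_SCHEMA);
    }
}
```

With the result shaped this way, a filter such as BagContains can be applied to books_read directly, without flattening.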
>
> On Wednesday 09 March 2011 10:46 PM, Lai Will wrote:
> > It's the latter.
> >
> > You can imagine my EvalFunc as
> > ArrayList<String> booksRead(Person p) {}
> >
> > So for a list of people I get a List of ArrayList<String> of different
> > lengths.
> >
> > -----Original Message-----
> > From: Jonathan Coveney [mailto:[email protected]]
> > Sent: Wednesday, March 09, 2011 6:12 PM
> > To: [email protected]
> > Subject: Re: Schema
> >
> > Is the size of the tuple fixed for any given instance of the UDF, or does
> > it change on a row-by-row basis? If it's the former, you can give the UDF
> > a constructor that takes the number of fields, and outputSchema can use
> > that.
> >
> > Barring that, declaring the schema is "good practice", but it's not
> > necessary. Your script will work without it, but DESCRIBE output will be
> > thrown off.
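When the width is fixed per instance, the constructor approach Jonathan describes might look roughly like this. FixedWidthSplit and the f0, f1, ... field names are hypothetical. Pig passes DEFINE arguments to a UDF constructor as strings (e.g. DEFINE Split3 FixedWidthSplit('3');), and a real outputSchema would return a Pig Schema object rather than the schema string this sketch builds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical UDF whose output tuple width is fixed per instance.
// The width comes in through the constructor as a string, the way Pig
// hands over DEFINE arguments, and the schema is derived from it.
public class FixedWidthSplit {
    private final int numFields;

    public FixedWidthSplit(String numFields) {
        // Integer.parseInt throws NumberFormatException (an
        // IllegalArgumentException) for non-numeric input.
        this.numFields = Integer.parseInt(numFields);
        if (this.numFields < 1) {
            throw new IllegalArgumentException("need at least one field");
        }
    }

    // Builds "t:tuple(f0:chararray,...,fN-1:chararray)" for numFields fields.
    public String outputSchemaString() {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < numFields; i++) {
            fields.add("f" + i + ":chararray");
        }
        return "t:tuple(" + String.join(",", fields) + ")";
    }

    public static void main(String[] args) {
        // prints t:tuple(f0:chararray,f1:chararray,f2:chararray)
        System.out.println(new FixedWidthSplit("3").outputSchemaString());
    }
}
```

This only helps when every row produced by one instance has the same width; for widths that vary row by row, the bag approach is the better fit.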
> >
> > 2011/3/9 Lai Will<[email protected]>
> >
> >> Hello,
> >>
> >> I read that it is good practice to declare the schema in the Pig script
> >> as well as in the UDF (by implementing outputSchema), for
> >> performance reasons.
> >>
> >> Now, in my case I have an EvalFunc that takes a chararray as input and
> >> produces a tuple with a dynamic number of chararrays (it creates its
> >> result with .newTuple(List list)).
> >> How can I specify a schema for an unknown number of elements?
> >>
> >> Best,
> >> Will
> >>
>
>
