All arguments to funcs are automatically wrapped in a tuple anyway. So let's say we want to write a BagContains filter:

foo = group stuff by key;
foreach foo generate (BagContains(stuff, 'magic') ? 1 : 0);

Then you'd write BagContains to take a Tuple of 2 args -- the first field is
a bag, the second is your predicate. Similarly you can filter by
IsEmpty(myBag), etc.

D

On Fri, Mar 11, 2011 at 10:11 AM, Lai Will <[email protected]> wrote:

> I could, but then I would not be able to use a FilterFunc on the Bag..
>
> (e.g. get all the people that have read "xyz")
>
> I would either have to flatten the bag and then filter or wrap the bag
> using another tuple.
> Both seem to be unnecessary overhead.
>
> Is my thinking correct?
>
> Best,
> Will
>
> -----Original Message-----
> From: Mridul Muralidharan [mailto:[email protected]]
> Sent: Thursday, March 10, 2011 2:08 AM
> To: [email protected]
> Cc: Lai Will
> Subject: Re: Schema
>
>
> In which case, can't you model that as a Bag?
> I imagine something like a Tuple with fields person:chararray,
> books_read:bag{ (name:chararray, isbn:chararray) }, etc.?
>
> Of course, it will work as a bag only if the tuple contained within it has a
> fixed schema :-) (unless you repeat this process N number of times as
> required!)
>
> Regards,
> Mridul
>
> On Wednesday 09 March 2011 10:46 PM, Lai Will wrote:
> > It's the latter..
> >
> > You can imagine my EvalFunc as
> > ArrayList<String> booksRead(Person p) {}
> >
> > So for a list of people I get a List of ArrayList<String> of different
> > lengths..
> >
> > -----Original Message-----
> > From: Jonathan Coveney [mailto:[email protected]]
> > Sent: Wednesday, March 09, 2011 6:12 PM
> > To: [email protected]
> > Subject: Re: Schema
> >
> > In any given instance will the size of the tuple change, or will it
> > change on a row by row basis? If it's the former, you can have a constructor
> > that indicates how many arguments, and the outputSchema can use that.
> >
> > Barring that, it is "good practice" to do so, but it's not necessary.
> > Your script will work without it, but DESCRIBE will get thrown off.
> >
> > 2011/3/9 Lai Will <[email protected]>
> >
> >> Hello,
> >>
> >> I read that it is good practice to declare the schema in Pig Script
> >> as well as in the UDF (by implementing outputSchema), because of
> >> performance reasons.
> >>
> >> Now in my case I have an EvalFunc that takes a chararray as input and
> >> produces a tuple with a dynamic number of chararrays (it creates its
> >> result by .newTuple(List list)).
> >> How can I specify a schema for an unknown number of elements?
> >>
> >> Best,
> >> Will
> >>
> >
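
For reference, here is a rough sketch of the BagContains logic described at the top of this message. It is not a real Pig UDF: to keep it runnable without the Pig jar, a plain List<Object> stands in for Pig's Tuple and a List of such lists stands in for a DataBag. The class name BagContainsSketch and the choice to compare only the first field of each tuple in the bag are assumptions made for illustration. In an actual UDF you would extend org.apache.pig.FilterFunc and implement Boolean exec(Tuple input) with the same shape: field 0 of the input tuple is the bag, field 1 is the value to look for.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: List<Object> plays the role of a Pig Tuple and a List of
// such lists plays the role of a DataBag, so this compiles without Pig.
final class BagContainsSketch {

    // Mirrors the shape of a FilterFunc's exec(Tuple): a single argument tuple
    // whose field 0 is the bag and whose field 1 is the value to look for.
    static boolean exec(List<Object> input) {
        if (input == null || input.size() < 2) {
            return false; // not enough arguments
        }
        Object bag = input.get(0);
        Object wanted = input.get(1);
        if (!(bag instanceof List) || wanted == null) {
            return false; // null or malformed arguments never match
        }
        for (Object row : (List<?>) bag) {
            if (row instanceof List) {
                List<?> tuple = (List<?>) row;
                // Assumption: only the first field of each tuple is compared.
                if (!tuple.isEmpty() && wanted.equals(tuple.get(0))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Object> bag = Arrays.<Object>asList(
                Arrays.<Object>asList("magic"),
                Arrays.<Object>asList("mundane"));
        System.out.println(exec(Arrays.<Object>asList(bag, "magic")));
        System.out.println(exec(Arrays.<Object>asList(bag, "missing")));
    }
}
```

The point of the sketch is only that the function receives one tuple holding all of its arguments, so a bag can be passed to a filter directly without flattening it or wrapping it yourself.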
