In which case, can't you model that as a bag?
I imagine something like a tuple with fields person:chararray, books_read:bag{(name:chararray, isbn:chararray)}, etc.?

Of course, it will work as a bag only if the tuple contained within it has a fixed schema :-) (unless you repeat this process N times, as required!)
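
A minimal plain-Java sketch of that reshaping, with no Pig dependency: each book becomes one fixed-width (name, isbn) tuple, and only the number of tuples varies per person. The `Book` class, the `toBag` helper and the sample values are hypothetical stand-ins; in a real UDF the outer list would presumably be a Pig DataBag and each inner list a Tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagSketch {
    /** Hypothetical stand-in for one book; not part of Pig. */
    static final class Book {
        final String name;
        final String isbn;

        Book(String name, String isbn) {
            this.name = name;
            this.isbn = isbn;
        }
    }

    /**
     * Turns a variable-length list of books into a "bag": a list of
     * tuples that all have exactly two fields, (name, isbn).
     */
    static List<List<String>> toBag(List<Book> books) {
        List<List<String>> bag = new ArrayList<>();
        for (Book b : books) {
            bag.add(Arrays.asList(b.name, b.isbn));
        }
        return bag;
    }

    public static void main(String[] args) {
        List<Book> read = Arrays.asList(
            new Book("Book A", "isbn-1"),
            new Book("Book B", "isbn-2"),
            new Book("Book C", "isbn-3"));
        // Three tuples here, but every tuple has the same two fields.
        System.out.println(toBag(read));
    }
}
```

The point of the design is that the declared schema describes a single tuple shape, so it stays valid however many books a person has read.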

Regards,
Mridul

On Wednesday 09 March 2011 10:46 PM, Lai Will wrote:
It's the latter.

You can imagine my EvalFunc as
ArrayList<String> booksRead(Person p) {}

So for a list of people I get a List of ArrayList<String> of different
lengths.

-----Original Message-----
From: Jonathan Coveney
Sent: Wednesday, March 09, 2011 6:12 PM
To: [email protected]
Subject: Re: Schema

Is the size of the tuple fixed for any given instance of the UDF, or will it
change on a row-by-row basis? If it's the former, you can have a constructor
that takes the number of fields, and outputSchema can use that.
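
A rough plain-Java sketch of the fixed-size case, with no Pig dependency: the field count is handed to the constructor once, and the schema is derived from it before any row is processed. `FixedArityBooks`, `schemaString` and the `book_N` field names are hypothetical; the count is taken as a String on the assumption that constructor arguments arrive as text, and in a real UDF the same count would presumably drive what outputSchema returns.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedArityBooks {
    private final int fieldCount;

    // Assumption: the count is supplied as text, so it is parsed here.
    public FixedArityBooks(String fieldCount) {
        this.fieldCount = Integer.parseInt(fieldCount);
    }

    /** Stand-in for outputSchema: one chararray field per declared slot. */
    public String schemaString() {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            fields.add("book_" + i + ":chararray");
        }
        return "(" + String.join(", ", fields) + ")";
    }

    public static void main(String[] args) {
        System.out.println(new FixedArityBooks("3").schemaString());
    }
}
```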

Barring that, declaring the schema is "good practice", but it's not necessary.
Your script will work without it, but DESCRIBE statements will get thrown off.

2011/3/9 Lai Will

Hello,

I read that it is good practice to declare the schema in the Pig script as
well as in the UDF (by implementing outputSchema), for
performance reasons.

Now in my case I have an EvalFunc that takes a chararray as input and
produces a tuple with a dynamic number of chararrays (it creates its
result with .newTuple(List list)).
How can I specify a schema for an unknown number of elements?

Best,
Will

