I could, but then I would not be able to use a FilterFunc on the bag
(e.g. to get all the people who have read "xyz").

I would either have to flatten the bag and then filter, or wrap the bag in 
another tuple.
Both seem to be unnecessary overhead.

Is my thinking correct?

Best,
Will
-----Original Message-----
From: Mridul Muralidharan [mailto:[email protected]] 
Sent: Thursday, March 10, 2011 2:08 AM
To: [email protected]
Cc: Lai Will
Subject: Re: Schema


In which case, can't you model that as a bag?
I imagine something like a tuple with fields person:chararray, books_read:bag{ 
(name:chararray, isbn:chararray) }, etc.?

Of course, it will work as a bag if the tuple contained within it has a fixed 
schema :-) (unless you repeat this process N times as required!)

Regards,
Mridul
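
A rough sketch of the layout suggested above, in plain Java rather than the 
Pig API: each person keeps a "bag" of fixed-shape book rows, so the number of 
books can vary while the shape of each row stays constant. All names here 
(BagModelSketch, Book, toBag, readersOf) are made up for illustration, and the 
ISBN is left null because the original booksRead() only returns titles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the suggested layout; it does not use the Pig API.
public class BagModelSketch {

    // One fixed-shape "tuple" inside the bag: always exactly two fields,
    // mirroring (name:chararray, isbn:chararray).
    static final class Book {
        final String name;
        final String isbn;

        Book(String name, String isbn) {
            this.name = name;
            this.isbn = isbn;
        }
    }

    // Turns a variable-length list of titles into a "bag" of fixed-shape rows.
    // Assumption: only titles are known, so isbn is null.
    static List<Book> toBag(List<String> titles) {
        List<Book> bag = new ArrayList<>();
        for (String title : titles) {
            bag.add(new Book(title, null));
        }
        return bag;
    }

    // Returns the people whose bag contains the given title. The bags are
    // inspected in place; nothing is flattened into one row per book.
    static List<String> readersOf(Map<String, List<Book>> people, String title) {
        List<String> readers = new ArrayList<>();
        for (Map.Entry<String, List<Book>> entry : people.entrySet()) {
            for (Book book : entry.getValue()) {
                if (title.equals(book.name)) {
                    readers.add(entry.getKey());
                    break; // one match per person is enough
                }
            }
        }
        return readers;
    }

    public static void main(String[] args) {
        Map<String, List<Book>> people = new LinkedHashMap<>();
        people.put("alice", toBag(Arrays.asList("xyz", "abc")));
        people.put("bob", toBag(Arrays.asList("abc")));
        people.put("carol", toBag(Arrays.asList("def", "ghi", "xyz")));

        System.out.println(readersOf(people, "xyz")); // prints [alice, carol]
    }
}
```

This only models the data shape. In an actual UDF the same idea would mean 
returning a bag whose inner tuple has a fixed schema, instead of a tuple whose 
width changes from row to row.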

On Wednesday 09 March 2011 10:46 PM, Lai Will wrote:
> It's the latter..
>
> You can imagine my EvalFunc as
> ArrayList<String>  booksRead(Person p) {}
>
> So for a list of people I get a List of ArrayList<String>  of different 
> lengths..
>
> -----Original Message-----
> From: Jonathan Coveney [mailto:[email protected]]
> Sent: Wednesday, March 09, 2011 6:12 PM
> To: [email protected]
> Subject: Re: Schema
>
> For any given instance, is the size of the tuple fixed, or will it change on 
> a row-by-row basis? If it's the former, you can have a constructor that 
> indicates how many arguments there are, and outputSchema can use that.
>
> Barring that, it is "good practice" to do so, but it's not necessary. Your 
> script will work without it, but DESCRIBE will get thrown off.
>
> 2011/3/9 Lai Will<[email protected]>
>
>> Hello,
>>
>> I read that it is good practice to declare the schema in the Pig script 
>> as well as in the UDF (by implementing outputSchema), for 
>> performance reasons.
>>
>> Now in my case I have an EvalFunc that takes a chararray as input and 
>> produces a tuple with a dynamic number of chararrays (it creates its 
>> result by .newTuple(List list)).
>> How can I specify a schema for an unknown number of elements?
>>
>> Best,
>> Will
>>
