Awesome Alan, let me try that out and see if it works.

John

On Thu, Oct 28, 2010 at 11:49 AM, Alan Gates <[email protected]> wrote:

>
> On Oct 28, 2010, at 8:36 AM, John Hui wrote:
>
>  I looked into returning a data bag as an option.  The problem is that the
>> Loader interface requires me to return a Tuple object.
>>
>>  public Tuple getNext() throws IOException {
>>
>> but the DataBag interface is not a derived class of Tuple, so this means I
>> will need to change Pig's internal code for my loader to return a bag
>> of tuples.  Right?
>>
>
> No.  If at the end of your getNext() you have a List<Tuple> tuples, then
> return:
>
> return TupleFactory.getInstance().newTuple(BagFactory.getInstance().newDefaultBag(tuples));
>
> This will give you a tuple, which has a single field, which is a bag.
>  Within that bag will be all your tuples.  If your next Pig Latin statement
> is
>
> B = foreach A generate flatten($0);
>
> then B will contain each of your records as individual records.
>
> Alan.
>
>
>
>> John
>>
>> On Wed, Oct 27, 2010 at 6:00 PM, John Hui <[email protected]> wrote:
>>
>>  Hi Pig Users,
>>>
>>> I am currently writing a UDF loader.  In one of my use cases, one line in
>>> the input stream results in multiple tuples.  Has anyone encountered or
>>> solved this issue on their end?
>>>
>>> The getNext method as currently structured returns only a single Tuple,
>>> but I want it to return a List<Tuple>.  Let me know if there are use cases
>>> out there like mine.  I am coding it up to return List<Tuple>, which is
>>> more flexible than returning only one tuple.
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>
>
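
For readers following the thread, here is a minimal sketch of the wrap-then-flatten idea Alan describes. It is not Pig code: plain java.util lists stand in for Pig's Tuple and DataBag so that it runs without Pig on the classpath. The class name BagWrapSketch, the helpers parseLine, wrapInSingleFieldTuple and flattenFirstField, and the ';' and ',' delimiters are all hypothetical and chosen only for illustration. In a real loader, the wrapping step would be the TupleFactory/BagFactory call quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model only: List<Object> plays the role of a Pig Tuple,
// and List<List<Object>> plays the role of a Pig DataBag.
class BagWrapSketch {

    // Hypothetical parser: one input line yields several records.
    // Records are separated by ';' and fields by ',' (assumed format).
    static List<List<Object>> parseLine(String line) {
        List<List<Object>> records = new ArrayList<>();
        for (String rec : line.split(";")) {
            List<Object> fields = new ArrayList<>();
            for (String field : rec.split(",")) {
                fields.add(field);
            }
            records.add(fields);
        }
        return records;
    }

    // Models the end of getNext(): all records go into one bag, and the
    // bag becomes the single field of the tuple that is returned.
    static List<Object> wrapInSingleFieldTuple(List<List<Object>> records) {
        List<Object> outer = new ArrayList<>();
        outer.add(records);
        return outer;
    }

    // Models "foreach A generate flatten($0)": every tuple inside the bag
    // held in field 0 becomes its own top-level record.
    @SuppressWarnings("unchecked")
    static List<List<Object>> flattenFirstField(List<List<Object>> relation) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : relation) {
            out.addAll((List<List<Object>>) tuple.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        // Relation A: one outer tuple per input line.
        List<List<Object>> a = new ArrayList<>();
        a.add(wrapInSingleFieldTuple(parseLine("a,1;b,2;c,3")));

        // Relation B: the individual records, one per row.
        for (List<Object> record : flattenFirstField(a)) {
            System.out.println(record);
        }
    }
}
```

The point of the design is that getNext() keeps its one-Tuple return type, so nothing inside Pig has to change; the fan-out from one line to many records happens in the following flatten step.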
