Awesome Alan, let me try that out and see if it works.

John

On Thu, Oct 28, 2010 at 11:49 AM, Alan Gates <[email protected]> wrote:

> On Oct 28, 2010, at 8:36 AM, John Hui wrote:
>
>> I looked into returning a data bag as an option. The problem is that the
>> Loader interface requires me to return a Tuple object:
>>
>> public Tuple getNext() throws IOException {
>>
>> but the DataBag interface is not a derived class of Tuple, so this means I
>> would need to change Pig's internal code for my loader to return a bag
>> of tuples. Right?
>
> No. If at the end of your getNext() you have a List<Tuple> tuples, then
> return:
>
> return TupleFactory.getInstance().newTuple(BagFactory.getInstance().newDefaultBag(tuples));
>
> This will give you a tuple, which has a single field, which is a bag.
> Within that bag will be all your tuples. If your next Pig Latin statement
> is
>
> B = foreach A generate flatten($0);
>
> then B will contain each of your records as individual records.
>
> Alan.
>
>> John
>>
>> On Wed, Oct 27, 2010 at 6:00 PM, John Hui <[email protected]> wrote:
>>
>>> Hi Pig Users,
>>>
>>> I am currently writing a UDF loader. In one of my use cases, one line in
>>> the input stream results in multiple tuples. Has anyone encountered or
>>> solved this issue on their end?
>>>
>>> The getNext method as currently structured only returns a Tuple, but I
>>> want it to return a List<Tuple>. Let me know if there are use cases out
>>> there like mine. I am coding it up to return List<Tuple>, which is more
>>> flexible than returning only one tuple.
>>>
>>> Thanks,
>>>
>>> John
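
For anyone following the thread, here is a rough sketch of the data shape Alan describes: one outer tuple with a single field that holds a bag of records, which is then flattened back into individual records. It deliberately uses plain Java collections as stand-ins, so it runs without Pig on the classpath. The class name BagInTupleSketch, the helper names, and the "1,alice;2,bob" line format are all assumptions made up for illustration; none of them are part of Pig's API. In a real loader the wrapping step would be the TupleFactory/BagFactory call quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a "tuple" is modelled as List<Object> and a "bag" as a
// list of tuples. These are stand-ins, NOT Pig's Tuple and DataBag classes.
class BagInTupleSketch {

    // Hypothetical input format: records separated by ';', fields by ','.
    // One input line therefore yields several records.
    static List<List<Object>> parseLine(String line) {
        List<List<Object>> records = new ArrayList<>();
        for (String rec : line.split(";")) {
            records.add(new ArrayList<Object>(Arrays.asList(rec.split(","))));
        }
        return records;
    }

    // Same shape as newTuple(newDefaultBag(tuples)) in the quoted reply:
    // one outer tuple whose only field is the bag of records.
    static List<Object> wrapInSingleFieldTuple(List<List<Object>> records) {
        List<Object> bag = new ArrayList<Object>(records);
        List<Object> outer = new ArrayList<>();
        outer.add(bag);
        return outer;
    }

    // Rough analogue of "foreach A generate flatten($0)": take field 0 of the
    // outer tuple (the bag) and emit each record in it as its own row.
    @SuppressWarnings("unchecked")
    static List<List<Object>> flattenFirstField(List<Object> outer) {
        List<List<Object>> bag = (List<List<Object>>) outer.get(0);
        return new ArrayList<>(bag);
    }

    public static void main(String[] args) {
        List<Object> outer = wrapInSingleFieldTuple(parseLine("1,alice;2,bob"));
        System.out.println("fields in outer tuple: " + outer.size());
        for (List<Object> row : flattenFirstField(outer)) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is only that the loader still returns exactly one object per call, so the getNext() signature does not need to change; the fan-out into individual records happens later, in the flatten step.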
