Alan means: return a tuple that contains a single bag of many tuples. Don't try to make Pig work with a loader that returns a bag instead of a tuple; you'll be up to your neck in the visitor pattern in no time if you start heading in that direction.
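To make the shape concrete, here is a rough, dependency-free sketch. It is not the Pig API: `List<Object>` stands in for a Pig `Tuple`, `List<List<Object>>` stands in for a `DataBag`, and the class name, the `getNext(String)` signature, and the comma-splitting rule are all made up for illustration. In a real loader you would presumably build the same structure with Pig's `TupleFactory` and `BagFactory`, and then `FLATTEN` the bag in the script to get one row per inner tuple.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch only: List<Object> plays the role of a Pig Tuple and
// List<List<Object>> plays the role of a DataBag. None of this is Pig code.
class BagInTupleSketch {

    // Hypothetical parser: one input line yields many inner tuples,
    // here one single-field tuple per comma-separated token.
    static List<Object> getNext(String line) {
        List<List<Object>> bag = new ArrayList<>();
        for (String token : line.split(",")) {
            List<Object> inner = new ArrayList<>();
            inner.add(token.trim());
            bag.add(inner);
        }
        // The loader still returns exactly one tuple; its only field is the bag.
        List<Object> outer = new ArrayList<>();
        outer.add(bag);
        return outer;
    }

    public static void main(String[] args) {
        System.out.println(getNext("a, b, c"));
    }
}
```

The point of the design is that the loader's contract (one tuple per call) stays untouched; the one-to-many expansion lives inside the bag and is undone later in the script.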
The alternative is to change what constitutes a record for your loader: use a different InputFormat/RecordReader to produce the records as needed, instead of feeding you lines.

-D

On Thu, Oct 28, 2010 at 8:36 AM, John Hui <[email protected]> wrote:
> I looked into returning a data bag as an option. The problem is that the
> Loader interface requires me to return a Tuple object:
>
>     public Tuple getNext() throws IOException {
>
> but the DataBag interface is not derived from Tuple, so this means I
> would need to change Pig's internal code for my loader to return a bag
> of tuples. Right?
>
> John
>
> On Wed, Oct 27, 2010 at 6:00 PM, John Hui <[email protected]> wrote:
>
>> Hi Pig Users,
>>
>> I am currently writing a UDF loader. In one of my use cases, one line in
>> the input stream results in multiple tuples. Has anyone encountered or
>> solved this issue on their end?
>>
>> With the current structure of the code, the getNext method only returns
>> a tuple, but I want it to return a List<tuple>. Let me know if there is a
>> use case out there like mine. I am coding it up to return List<tuple>,
>> which is more flexible than returning only one tuple.
>>
>> Thanks,
>>
>> John
>>
>
