There is a built-in TOTUPLE UDF that will do the trick. But I suspect you will find manipulating fields inside a tuple even more cumbersome. Most of our production scripts do the opposite: loaders generate complex structures, and Pig scripts explicitly pull out the fields they want and work with flattened rows whenever possible.
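For reference, the TOTUPLE approach looks roughly like this. It is an untested sketch that assumes Pig 0.8 or later (where TOTUPLE is a built-in) and reuses the LogLoader schema from your message. The aliases tupled_logs, apache and uris are illustrative names only.

```pig
-- Sketch only: assumes Pig 0.8+ (TOTUPLE built in) and the LogLoader
-- schema copied from the original question.
register file:/home/hadoop/lib/pig/piggybank.jar
logs = load '$input' USING LogLoader as (remoteAddr, remoteLogname, user,
    time :chararray, method, uri :chararray, proto, status, bytes, referer,
    userAgent);

-- Pack all 11 fields into one tuple-valued field named apache.
tupled_logs = foreach logs generate
    TOTUPLE(remoteAddr, remoteLogname, user, time, method, uri, proto,
            status, bytes, referer, userAgent) as apache;

-- Every later use has to dereference into the tuple, which is where
-- the extra cumbersomeness comes from.
uris = foreach tupled_logs generate apache.uri, apache.status;
```

If your Pig version does not carry the field names into the tuple, positional references such as apache.$5 (uri) work instead, and `foreach tupled_logs generate flatten(apache);` turns the tuple back into 11 flat columns.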
On Jan 11, 2012, at 3:33 AM, Ranjan Bagchi wrote:

> Hi,
>
> I'm doing some log processing in pig where I extract the typical apache log
> fields, filter, do some transformations, and then write the processed data
> to a file.
>
> I'm finding maintaining the list of extracted fields to be somewhat
> cumbersome, though (and using * too sloppy for a maintainable script), and
> I'm wondering if I can package/extract them in a tuple.
>
> So where I'm doing:
>
> register file:/home/hadoop/lib/pig/piggybank.jar
> logs = load '$input' USING LogLoader as (remoteAddr, remoteLogname, user,
> time :chararray, method, uri :chararray, proto, status, bytes, referer,
> userAgent);
>
> and logs has 11 fields.
>
> could I do something like
> register file:/home/hadoop/lib/pig/piggybank.jar
> logs = load '$input' USING LogLoader as (remoteAddr, remoteLogname, user,
> time :chararray, method, uri :chararray, proto, status, bytes, referer,
> userAgent);
> tupled_logs = remoteAddr.. as apache:tuple;
>
> in which tupled_logs only has one field.
> I've tried this, but I haven't found the magic words yet.
>
> Thanks,
>
> Ranjan
