There is a built-in UDF, TOTUPLE, that will do the trick. But I suspect you will
find manipulating fields inside a tuple even more cumbersome. Most of our
production scripts do the opposite: loaders generate complex structures, and
Pig scripts explicitly pull out the fields they want and deal with flattened
rows whenever possible.
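
A rough sketch of what that looks like, assuming the logs relation and the
field names from the LOAD statement in your message. The relation names hits
and flat_logs are made up for illustration, and I have not run this against
your data:

```pig
-- Sketch only: pack the 11 fields loaded into "logs" into a single tuple.
-- TOTUPLE is a Pig built-in; "apache" is the alias used in the question.
tupled_logs = FOREACH logs GENERATE
    TOTUPLE(remoteAddr, remoteLogname, user, time, method, uri,
            proto, status, bytes, referer, userAgent) AS apache;

-- Later steps now have to reach into the tuple, here by position
-- ($0 = remoteAddr, $5 = uri, $7 = status). "hits" is a hypothetical name.
hits = FOREACH tupled_logs GENERATE apache.$0, apache.$5, apache.$7;

-- The opposite direction: FLATTEN turns the tuple back into 11 top-level
-- fields. "flat_logs" is a hypothetical name.
flat_logs = FOREACH tupled_logs GENERATE FLATTEN(apache);
```

Note that every downstream statement has to dereference apache to get at a
field, which is why we usually prefer to keep rows flat.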

On Jan 11, 2012, at 3:33 AM, Ranjan Bagchi <[email protected]> wrote:

> Hi,
> 
> I'm doing some log processing in pig where I extract the typical apache log
> fields, filter, do some transformations, and then write the processed data
> to a file.
> 
> I'm finding maintaining the list of extracted fields to be somewhat
> cumbersome, though (and using * too sloppy for a maintainable script), and
> I'm wondering if I can package/extract them in a tuple.
> 
> So where I'm doing:
> 
> register file:/home/hadoop/lib/pig/piggybank.jar
> logs = load '$input' USING LogLoader as (remoteAddr, remoteLogname, user,
> time :chararray, method, uri :chararray, proto, status, bytes, referer,
> userAgent);
> 
> and logs has 11 fields.
> 
> could I do something like
> register file:/home/hadoop/lib/pig/piggybank.jar
> logs = load '$input' USING LogLoader as (remoteAddr, remoteLogname, user,
> time :chararray, method, uri :chararray, proto, status, bytes, referer,
> userAgent);
> tupled_logs = remoteAddr.. as apache:tuple;
> 
> in which tupled_logs only has one field.
> I've tried this, but I haven't found the magic words yet.
> 
> Thanks,
> 
> Ranjan
