Hi,

I'm doing some log processing in Pig where I extract the typical Apache log
fields, filter, do some transformations, and then write the processed data
to a file.

Maintaining the list of extracted fields is getting somewhat cumbersome,
though (and using * is too sloppy for a maintainable script), so I'm
wondering whether I can package them into a single tuple.

Currently I'm doing:

register file:/home/hadoop/lib/pig/piggybank.jar
logs = load '$input' USING LogLoader as (remoteAddr, remoteLogname, user,
    time:chararray, method, uri:chararray, proto, status, bytes, referer,
    userAgent);

which gives logs 11 fields.

Could I instead do something like:
register file:/home/hadoop/lib/pig/piggybank.jar
logs = load '$input' USING LogLoader as (remoteAddr, remoteLogname, user,
    time:chararray, method, uri:chararray, proto, status, bytes, referer,
    userAgent);
tupled_logs = remoteAddr.. as apache:tuple;

so that tupled_logs has only one field, a tuple holding all 11 values?
I've tried this, but I haven't found the magic words yet.

Thanks,

Ranjan
