Hey

I am trying to extract performance metrics from some of my logs using Pig
and have come up with the following. I feel like I might be performing one
too many steps and was wondering if there is a way to reduce the number of
FILTER/FOREACH operations I need to run. Still trying to learn the proper
syntax.

uniqLogs = FOREACH logs GENERATE host AS host:CHARARRAY, body AS body:CHARARRAY;
metricLogLine = FILTER uniqLogs BY (body MATCHES '.*gr.perf.metrics.Category.*');
metricLogData = FOREACH metricLogLine GENERATE host,
    REGEX_EXTRACT_ALL(body, '.*gr.perf.metrics.Category\\s*\\-\\s*([A-Za-z\\.\\_]+)\\s+([A-Za-z\\_\\.]+)') AS regex;
fltrdMetricLogData = FILTER metricLogData BY regex IS NOT NULL;
eventCategories = FOREACH fltrdMetricLogData GENERATE host,
    FLATTEN(regex) AS (category:CHARARRAY, event:CHARARRAY);
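
For comparison, here is an untested sketch of one possible shorter form. It rests on two assumptions: that REGEX_EXTRACT_ALL yields null for lines that do not match (the "regex is not null" filter in the script above already depends on this), which would make the separate MATCHES filter redundant, and that host and body are already chararray fields in logs, which would make the first projection unnecessary. The aliases "extracted" and "matched" are made-up names for illustration.

```pig
-- Sketch only: assumes host and body are already chararray fields in logs.
-- Run the extraction directly on logs; non-matching lines are assumed to yield null.
extracted = FOREACH logs GENERATE host,
    REGEX_EXTRACT_ALL(body, '.*gr.perf.metrics.Category\\s*\\-\\s*([A-Za-z\\.\\_]+)\\s+([A-Za-z\\_\\.]+)') AS regex;

-- A single filter drops the lines where nothing was extracted.
matched = FILTER extracted BY regex IS NOT NULL;

-- Flatten the tuple of captured groups into named columns.
eventCategories = FOREACH matched GENERATE host,
    FLATTEN(regex) AS (category:CHARARRAY, event:CHARARRAY);
```

This would cut the pipeline from five statements to three, at the cost of running the full regex against every log line instead of only the pre-filtered ones.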

Thanks
