Cool thanks

On Wed, Nov 2, 2011 at 1:06 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Just to be explicit:
>
> This:
>
> x = FILTER something by num1 > 10 AND num2 < 12;
>
> is equivalent to this:
>
> x = FILTER something by num1 > 10;
> x = FILTER x by num2 < 12;
>
> All non-blocking operators are evaluated in a streaming fashion, so you
> don't need to worry about combining them into a single operator.
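>
> If you want to check this on your own script, a minimal sketch follows. It
> assumes a relation named something with numeric fields num1 and num2 is
> already defined (the names are just the ones from the example above).
> EXPLAIN prints the plans Pig would run for an alias:
>
> ```pig
> -- Two chained, non-blocking filters on the same relation.
> x = FILTER something BY num1 > 10;
> x = FILTER x BY num2 < 12;
> -- Print the logical, physical, and map-reduce plans for x.
> EXPLAIN x;
> ```
>
> The filters should appear (possibly merged into one by the optimizer)
> inside a single map phase rather than as separate jobs.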
>
> On Wed, Nov 2, 2011 at 10:56 AM, Ashutosh Chauhan <[email protected]> wrote:
>
> > Hi Cameron,
> >
> > Your script looks all right. Each of your steps processes the data in a
> > different way. Instead of cramming them together into a single statement
> > (possibly via some custom UDF), it makes sense to keep them as a series of
> > steps, as you have done, for better readability and debuggability. Are you
> > worried about performance? You need not be. As long as your operations
> > don't introduce an unnecessary map-reduce boundary (which your script
> > doesn't), you are good.
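> >
> > For contrast, here is a sketch of a step that would add a map-reduce
> > boundary. It assumes you later want a count per category and event; the
> > aliases grouped and eventCounts and the field name occurrences are made
> > up for illustration:
> >
> > ```pig
> > -- GROUP is a blocking operator: it needs a shuffle, so it adds a reduce phase.
> > grouped = GROUP eventCategories BY (category, event);
> > -- COUNT runs over the bag of rows collected for each (category, event) key.
> > eventCounts = FOREACH grouped GENERATE FLATTEN(group) AS (category, event),
> >     COUNT(eventCategories) AS occurrences;
> > ```
> >
> > The FILTER and FOREACH steps ahead of the GROUP would still run in the
> > map phase of that same job.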
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Wed, Nov 2, 2011 at 10:17, Cameron Gandevia <[email protected]>
> > wrote:
> >
> > > Hey
> > >
> > > I am trying to extract performance metrics from some of my logs using Pig
> > > and have come up with the following. I feel like I might be performing one
> > > too many steps and was wondering if there is a way to reduce the number of
> > > FILTER/FOREACH operations I need to run. Still trying to learn the proper
> > > syntax.
> > >
> > > uniqLogs = FOREACH logs GENERATE host as host:CHARARRAY, body as body:CHARARRAY;
> > > metricLogLine = FILTER uniqLogs BY (body MATCHES '.*gr.perf.metrics.Category.*');
> > > metricLogData = FOREACH metricLogLine GENERATE host,
> > >     REGEX_EXTRACT_ALL(body,
> > >     '.*gr.perf.metrics.Category\\s*\\-\\s*([A-Za-z\\.\\_]+)\\s+([A-Za-z\\_\\.]+)')
> > >     AS regex;
> > > fltrdMetricLogData = FILTER metricLogData BY regex is not null;
> > > eventCategories = FOREACH fltrdMetricLogData GENERATE host, FLATTEN(regex)
> > >     AS (category:CHARARRAY, event:CHARARRAY);
> > >
> > > Thanks
> > >
> >
>



-- 
Thanks

Cameron Gandevia
