Re: Performing multiple reductions from a single map job

Johnny Zhang Mon, 11 Mar 2013 11:12:44 -0700

Hi, Benjamin:
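You can put all your commands in one script.pig file and run:
pig -x mapreduce -e 'explain -script script.pig'
This explains the plan for the entire script rather than a single relation,
so you can see how the STORE statements are laid out as MapReduce jobs.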


Johnny


On Mon, Mar 11, 2013 at 8:29 AM, Benjamin Smedberg <[email protected]> wrote:

> I'm working on a crash processing system and trying to group large amounts
> of data on multiple facets. Loading the data can be expensive, so I'd
> really like to use a single map job. I understand that multi-query
> execution in theory allows multiple STORE commands to come from a
> single map execution. Is there a way to EXPLAIN the plan of an entire Pig
> script that has multiple STORE commands, to see how it will be run as
> MapReduce jobs? I can only see a way to run EXPLAIN on a single relation,
> which shows a single MapReduce job but doesn't tell how the jobs might be
> combined by multi-query execution. I'm trying to figure out whether Pig will
> use a single map for the following script, or whether there is a way to
> make it use a single map.
>
> raw = LOAD ...;
> processed = FOREACH raw GENERATE uuid, signature, AdapterVendorID,
> ExtensionsInstalled, ModulesLoaded; /* UDFs process the raw data into these
> fields */
> filtered = FILTER processed BY some conditions here;
>
> bygraphicsvendor = GROUP filtered BY (signature, AdapterVendorID);
> byvendortotals = FOREACH bygraphicsvendor GENERATE group.signature,
> group.AdapterVendorID, COUNT(filtered) AS c;
>
> STORE byvendortotals INTO ....;
>
> withextensions = FOREACH filtered GENERATE signature,
> flatten(ExtensionsInstalled);
> byextension = GROUP withextensions BY (signature, extensionID);
> byextensiontotals = FOREACH byextension GENERATE group.signature,
> group.extensionID, COUNT(withextensions) AS c;
>
> STORE byextensiontotals INTO ...;
>
> --BDS
>
>
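To illustrate what multi-query execution is meant to buy here, the sketch below scans the crash records once and feeds both group-by counts (vendor totals and extension totals) from that single pass, mirroring the FILTER, GROUP/COUNT and FLATTEN steps of the script above. It is plain Python, not Pig's implementation; the record layout (a dict per crash, with ExtensionsInstalled as a list of extension IDs) and the names single_pass_counts and crashes are hypothetical assumptions for illustration only.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical record layout: one dict per crash, using the field names
# from the Pig script; ExtensionsInstalled is assumed to be a list of IDs.
Record = dict


def single_pass_counts(records: Iterable[Record], keep: Callable[[Record], bool]):
    """Scan the input once and feed two independent group-by-count
    aggregations, conceptually like one shared map phase serving two STOREs."""
    by_vendor = Counter()     # (signature, AdapterVendorID) -> count
    by_extension = Counter()  # (signature, extensionID) -> count
    for rec in records:       # the single pass over the loaded data
        if not keep(rec):     # FILTER
            continue
        # GROUP BY (signature, AdapterVendorID) + COUNT
        by_vendor[(rec["signature"], rec["AdapterVendorID"])] += 1
        # FLATTEN: one row per installed extension; an empty list yields no rows
        for ext_id in rec["ExtensionsInstalled"]:
            # GROUP BY (signature, extensionID) + COUNT
            by_extension[(rec["signature"], ext_id)] += 1
    return dict(by_vendor), dict(by_extension)


# Hypothetical sample data for a quick check.
crashes = [
    {"signature": "A", "AdapterVendorID": "0x10de", "ExtensionsInstalled": ["e1", "e2"]},
    {"signature": "A", "AdapterVendorID": "0x10de", "ExtensionsInstalled": ["e1"]},
    {"signature": "B", "AdapterVendorID": "0x8086", "ExtensionsInstalled": []},
]

if __name__ == "__main__":
    vendor_totals, extension_totals = single_pass_counts(crashes, keep=lambda r: True)
    print(vendor_totals)
    print(extension_totals)
```

The point of the design is that the expensive part (reading and UDF-processing each record) happens once, while each output only adds a cheap per-record update; whether Pig actually merges the two STOREs this way for a given script is what `explain -script` will show.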