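Hi Benjamin,

You can put all of your commands in one script.pig file and run:

    pig -x mapreduce -e 'explain -script script.pig'

This explains the plan for the entire script, covering every STORE, so you can see how multi-query execution combines the statements into MapReduce jobs.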
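To check the result mechanically, here is a minimal sketch: save the plan that the command above prints and count the MapReduce jobs in it. The helper name `count_mr_jobs` is hypothetical. It assumes that each job in the MapReduce plan is introduced by a line containing "MapReduce node"; verify that against your own EXPLAIN output, since the wording may differ between Pig versions and execution modes.

```shell
#!/bin/sh
# Hypothetical helper: count the MapReduce jobs in a saved EXPLAIN plan.
# Assumption: every job in the plan is introduced by a line containing
# "MapReduce node" (check this against your own Pig version's output).
count_mr_jobs() {
    # grep -c prints the number of matching lines (0 if none match)
    grep -c 'MapReduce node' "$1"
}

# Usage (requires a Pig installation, not run here):
#   pig -x mapreduce -e 'explain -script script.pig' > plan.txt
#   count_mr_jobs plan.txt
```

If it prints 1, both STOREs are served by a single job, and therefore by a single map phase over your input. A larger count means Pig needs more than one job, and the plan text shows which of them read the original LOAD.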
Johnny

On Mon, Mar 11, 2013 at 8:29 AM, Benjamin Smedberg <[email protected]> wrote:
> I'm working on a crash processing system and trying to group large amounts
> of data on multiple facets. Loading the data can be expensive, so I'd
> really like to use a single map job. I understand that multi-query
> execution in theory allows for multiple STORE commands to come from a
> single map execution. Is there a way to EXPLAIN the plan of an entire pig
> script that has multiple STORE commands, to tell how it's going to run
> mapreduce? I can only see a way to run EXPLAIN on a single relation, which
> shows a single mapreduce but doesn't really tell how they might be combined
> with multiquery execution. I'm trying to figure out whether pig will use a
> single map for the following pig statement, or whether there is a way to
> make it use a single map.
>
> raw = LOAD ...;
> processed = FOREACH raw GENERATE uuid, signature, AdapterVendorID,
> ExtensionsInstalled, ModulesLoaded; /* UDFs process the raw data into these
> fields */
> filtered = FILTER processed BY some conditions here;
>
> bygraphicsvendor = GROUP filtered BY (signature, AdapterVendorID);
> byvendortotals = FOREACH bygraphicsvendor GENERATE group.signature,
> group.AdapterVendorID, COUNT(filtered) AS c;
>
> STORE byvendortotals INTO ....;
>
> withextensions = FOREACH filtered GENERATE signature,
> flatten(ExtensionsInstalled);
> byextension = GROUP withextensions BY (signature, extensionID);
> byextensiontotals = FOREACH byextension GENERATE group.signature,
> group.extensionID, COUNT(withextensions) AS c;
>
> STORE byextensiontotals INTO ...;
>
> --BDS
