John,

DAG stands for directed acyclic graph:
http://en.wikipedia.org/wiki/Directed_acyclic_graph

Anyway, what I meant was the list of the generated MR jobs. When you launch a Pig script from the command line, you get a line like this every time an MR job is launched:

INFO... job url... http://yourcluster:...jobid
Then, when the job is finished, you get the full list of job IDs, something like:

Job DAG:
job_201304081613_0032 -> job_201304081613_0033,
job_201304081613_0033 -> job_201304081613_0034,
job_201304081613_0034 -> ...

Let me know if you have further questions.

On Wed, Jun 5, 2013 at 2:29 PM, John Meek <[email protected]> wrote:

> Hi Ruslan,
> Not sure how to do this. Can you be specific? What's DAG? Thanks.
>
> -----Original Message-----
> From: Ruslan Al-Fakikh <[email protected]>
> To: user <[email protected]>
> Sent: Wed, Jun 5, 2013 4:04 am
> Subject: Re: Tracking parts of a job taking the most time
>
> Hi!
>
> You can look at the Pig script stats after the script is finished. There is
> a DAG of MR jobs there. You can look at the individual MR jobs' stats to
> see how much time each MR job takes.
>
> Ruslan
>
> On Wed, Jun 5, 2013 at 10:15 AM, Johnny Zhang <[email protected]> wrote:
>
> > How about disabling multi-query execution and using the UDF CurrentTime
> > to print the time between each script block?
> >
> > Johnny
> >
> > On Tue, Jun 4, 2013 at 7:11 PM, John Meek <[email protected]> wrote:
> >
> > > All,
> > >
> > > I have a 400-line Pig script which performs the calculations I need it
> > > to perform; however, I need to figure out the amount of time that
> > > specific parts of the script take.
> > >
> > > For example, the initial load from an HBase table: I'd like to know how
> > > much time the load takes before moving on to the next step.
> > >
> > > What's the easiest way to break this down?
> > >
> > > thanks,
> > > JM
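P.S. A footnote on the Job DAG listing near the top of this message: if you save the console output, a small script can pull the job-to-job edges out of it, so you know which job IDs to look up in the JobTracker for timings. The following is only a rough Python sketch, not anything shipped with Pig. The function name `parse_job_dag` is made up for illustration, and it assumes each `parent -> child` pair sits on its own line after the `Job DAG:` marker and that job IDs look like `job_<digits>_<digits>`.

```python
import re

# Assumption: job IDs follow the pattern seen in this thread,
# e.g. job_201304081613_0032.
JOB_ID = re.compile(r"job_\d+_\d+")


def parse_job_dag(text):
    """Return (parent, child) pairs found after a 'Job DAG:' marker.

    Hypothetical helper: assumes one 'parent -> child[, child...]' entry
    per line. Lines without '->' are ignored.
    """
    _, marker, dag_text = text.partition("Job DAG:")
    if not marker:
        return []  # no DAG section in this output

    edges = []
    for line in dag_text.splitlines():
        if "->" not in line:
            continue
        left, right = line.split("->", 1)
        parent = JOB_ID.search(left)
        if parent is None:
            continue
        for child in JOB_ID.findall(right):
            edges.append((parent.group(), child))
    return edges


# Sample built only from the job IDs quoted in this thread.
sample = """\
Job DAG:
job_201304081613_0032 -> job_201304081613_0033,
job_201304081613_0033 -> job_201304081613_0034,
"""

for parent, child in parse_job_dag(sample):
    print(parent, "->", child)
```

Each pair tells you which MR job feeds which, so you can open the individual job pages in order and compare how long each one ran.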
