Hello list,

In order to learn about Hadoop performance tuning, I am currently investigating how specific Hadoop configuration parameters affect the values of specific Hadoop counters. I would like to do something like the following (from the command line):

for some_config_parameter in set_of_config_values

  Step 1) run the Hadoop job with 'hadoop jar ....'

  Step 2) once the job has finished, get the values of one or more Hadoop counters of this job
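Step 1 of that loop might be sketched in bash roughly as follows. This is a minimal sketch under assumptions: the jar name, driver class and value list are hypothetical placeholders, 'mapreduce.task.io.sort.mb' merely stands in for some_config_parameter, and passing '-D name=value' only has an effect if the driver parses Hadoop's generic options (for example via ToolRunner). Each run's client output is saved to its own log file so that it is still available for Step 2.

```shell
#!/usr/bin/env bash
# Step 1 of the sweep: one 'hadoop jar' run per configuration value.
# JAR, DRIVER_CLASS, PARAM and VALUES are hypothetical placeholders.
JAR="my-job.jar"
DRIVER_CLASS="MyDriver"
PARAM="mapreduce.task.io.sort.mb"
VALUES=(50 100 200)

run_sweep() {
  local value log
  for value in "${VALUES[@]}"; do
    log="run_${PARAM}_${value}.log"
    # Assumes the driver honors generic options such as -D (e.g. via ToolRunner).
    # Both stdout and stderr of the client are written to the per-run log.
    if ! hadoop jar "$JAR" "$DRIVER_CLASS" -D "${PARAM}=${value}" "$@" >"$log" 2>&1; then
      echo "run with ${PARAM}=${value} failed, see $log" >&2
    fi
    # Print the log file name so a caller can use it in Step 2.
    echo "$log"
  done
}
```

Any extra arguments given to 'run_sweep' are appended unchanged to every 'hadoop jar' call.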

I know that I can achieve step 2 with the -counter option of the mapred job command:

bart@sandy-quad-1:~$ mapred job -counter
Usage: CLI [-counter <job-id> <group-name> <counter-name>]
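Given a job-id, Step 2 for a single counter could be wrapped in a small helper like the one below. It is only a sketch: 'get_counter' is a hypothetical name, and it assumes that 'mapred job -counter' prints the counter value as the last line on stdout (a non-numeric result is treated as an error). On Hadoop 2.x, built-in counters live in groups such as 'org.apache.hadoop.mapreduce.TaskCounter' (e.g. 'MAP_OUTPUT_RECORDS').

```shell
# Print the value of one counter of a finished job, or fail with a message.
get_counter() {
  if [ "$#" -ne 3 ]; then
    echo "usage: get_counter <job-id> <group-name> <counter-name>" >&2
    return 2
  fi
  local value
  # Discard stderr so that only stdout is parsed; keep its last line.
  value=$(mapred job -counter "$1" "$2" "$3" 2>/dev/null | tail -n 1)
  case "$value" in
    ''|*[!0-9]*)
      echo "no numeric value for $2/$3 of $1: '$value'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$value"
}
```

For example, 'get_counter job_1394445600000_0007 org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_RECORDS', where the job-id is made up for illustration.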

However, this requires a job-id, and that is where I'm stuck: I don't know an easy way to get the job-id of the Hadoop job started in Step 1, nor a way to assign a job-id myself in Step 1 so that I can reuse it in Step 2.
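One possible way to recover the job-id without changing the job itself is to capture the client output of 'hadoop jar' and parse it. This sketch assumes that output contains a line such as 'INFO mapreduce.Job: Running job: job_1394445600000_0007' (the id is made up), which the job client typically logs when the driver waits for completion with verbose output, e.g. 'job.waitForCompletion(true)'. 'extract_job_id' and 'run_job' are hypothetical helper names.

```shell
# Print the first job id announced in client output read from the given
# file(s), or from stdin when no file is given.
extract_job_id() {
  sed -n 's/.*Running job: \(job_[A-Za-z0-9_]*\).*/\1/p' "$@" | head -n 1
}

# Run 'hadoop jar <args...>', save the client output to <log> and print the
# job id found in it. Returns non-zero if the run fails or no id is found.
run_job() {
  local log=$1 id
  shift
  hadoop jar "$@" >"$log" 2>&1 || return 1
  id=$(extract_job_id "$log")
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}
```

With a hypothetical jar, class and paths, 'job_id=$(run_job run.log my-job.jar MyDriver in out)' would then allow 'mapred job -counter "$job_id" <group-name> <counter-name>' in Step 2.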

I can't imagine I'm the only one who runs jobs and then queries some of their counters afterwards. How is this typically solved?

Note that I'm looking for a command-line solution, i.e. something scriptable in bash or similar.

Thanks,
Bart
