Hello list,
In order to learn about Hadoop performance tuning, I am currently
investigating the effect of certain Hadoop configuration parameters on
certain Hadoop counters. From the command line, I would like to do
something like the following:
for some_config_parameter in set_of_config_values
  Step 1) run a Hadoop job with 'hadoop jar ...'
  Step 2) once the job has finished, get the value of one or more
  Hadoop counters for that job
I know that I can achieve Step 2 with the -counter option of the
mapred job command:
bart@sandy-quad-1:~$ mapred job -counter
Usage: CLI [-counter <job-id> <group-name> <counter-name>]
However, I need to specify a job-id here, and that is where I'm having
trouble... I don't know an easy way to get the job-id of the Hadoop
job that I started in Step 1, nor do I know of a way to specify a
job-id myself in Step 1 so that I can use it later in Step 2.
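To make this concrete, here is the kind of wrapper I have in mind. It is only a sketch: my-job.jar, MyDriver, the input/output paths, and the counter are placeholders, and I am assuming the job id can be grepped out of the "Running job:" line that the MapReduce client prints to its output.

```shell
#!/usr/bin/env bash
# Sketch only: my-job.jar, MyDriver, and the counter below are
# placeholders. I am assuming the client prints a line like
#   INFO mapreduce.Job: Running job: job_1408697508700_0001
set -euo pipefail

extract_job_id() {
  # Pull the first job_<timestamp>_<sequence> token out of the client output.
  grep -oE 'job_[0-9]+_[0-9]+' | head -n 1
}

# Step 1: run the job and capture the client output (it goes to stderr):
#   job_id=$(hadoop jar my-job.jar MyDriver in out 2>&1 | extract_job_id)
# Step 2: query a counter for that job id:
#   mapred job -counter "$job_id" \
#     org.apache.hadoop.mapreduce.TaskCounter MAP_INPUT_RECORDS

# Demonstration of the extraction on a sample client log line:
extract_job_id <<<'INFO mapreduce.Job: Running job: job_1408697508700_0001'
```

The last line prints job_1408697508700_0001 for the sample line, but whether grepping the client output like this is reliable is exactly what I am unsure about.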
I cannot imagine I'm the only one trying to run jobs and then request
some of their counters afterwards. How is this typically solved?
Note that I'm looking for a command-line solution, something that is
scriptable in bash or similar.
Thanks,
Bart