On 17.02.2011 at 09:07, Fritz Ferstl wrote:

> My 2 cents here (and I'm aware they will not help Eric ... and apologies in 
> advance for the rant, it's a long-term heartfelt topic  ...):
> 
> A DRM benchmark would be nice to have. Benchmarking in general is an almost
> vain endeavor: you have to be very prescriptive about the boundary conditions
> to achieve comparable results, and such narrow boundary conditions can almost
> never reflect reality. So all benchmarks are open to interpretation and
> debate.
> 
> But benchmarks are still a useful means to provide at least some
> orientation. As Chris has stated, the variability in the use case scenarios
> of workload managers is certainly even bigger than in classical performance
> benchmarks such as SPEC or Linpack. You also have to be careful about what
> you are measuring: the underlying HW, network & storage performance? The
> efficiency of the SW? The ability to tune the workload management system,
> both on its own and in combination with the HW & SW underneath? Or the
> suitability of the workload management system for a specific application
> case?
> 
> So I guess a suite of benchmarks would be needed, maybe akin to SPEC, to
> provide at least a roughly representative picture. And you'd have to either
> standardize on the HW (e.g. take 100 Amazon dedicated servers and run with
> that), or you'd have to do it like Linpack and say: "I don't care what you
> use and how much of it, but report the resulting throughput vs. time numbers
> on these use cases." I.e., how fast can you possibly get? In other words,
> something like the Top500 for workload management environments.
> 
> For many companies and institutions the workload manager has become the most
> central workhorse - the conveyor belt of a data center. If it stops, all
> stops. If you can make it run quicker, you achieve your results sooner. If it
> enables you to, you can be much more flexible in responding to changing
> demands. So it's almost ironic that large computing centers benchmark
> individual server performance, run something like Linpack to advertise their
> peak performance, and create their own site-specific application benchmark
> suites for selecting new HW - but they often do not benchmark with the
> workload management system in the picture, which later, in combination with
> tuning and the rest of the environment, will define the efficiency of the
> data center.
> 
> So a benchmark for DRMs would be a highly useful tool. I've always wondered
> how to get an initiative started that would lead to such a benchmark ...
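
Fritz's Top500-style "report your throughput vs. time numbers" idea could be
prototyped with very little machinery. Below is a minimal sketch in Python; it
assumes only that the SGE client tools (qsub, qstat) are in $PATH and that a
plain "qstat" prints one line per pending/running job of the calling user. The
burst size N_JOBS is an arbitrary knob, not a prescribed value.

#!/usr/bin/env python
# Throughput driver: submit a burst of trivial jobs and measure the
# wallclock time until the cluster has drained them all.
import subprocess
import time

N_JOBS = 1000  # size of the burst; tune to your cluster

start = time.time()
for i in range(N_JOBS):
    # -b y: treat /bin/sleep as a binary, so no job script is needed
    subprocess.check_call(
        ["qsub", "-b", "y", "-o", "/dev/null", "-e", "/dev/null",
         "/bin/sleep", "1"])
submitted = time.time()

# Poll until qstat lists no jobs any more; job lines are recognized by
# starting with a digit (the job id), which skips the header lines.
while True:
    out = subprocess.check_output(["qstat"]).decode()
    if not any(l.strip()[:1].isdigit() for l in out.splitlines()):
        break
    time.sleep(5)
drained = time.time()

print("submission rate: %.1f jobs/s" % (N_JOBS / (submitted - start)))
print("makespan:        %.1f s" % (drained - start))
print("throughput:      %.2f jobs/s" % (N_JOBS / (drained - start)))

The two interesting numbers are the submission rate (the qmaster front end)
and the makespan (scheduler plus exec side); for a Top500-like listing you
would report them together with the HW used, as Fritz suggests.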

The question is what to measure in a queuing system. This could be:

a) - The load the submitted jobs put on the qmaster machine (disk space,
memory, and CPU time consumed); see the first sketch after this list.

b) - Time wasted switching from one job to the next on an exechost (maybe by
already sending the next job there in a "ready" state beforehand and releasing
it as soon as the former job finishes). This would be interesting for
workflows where many short jobs are submitted; see the second sketch after
this list.

c) - Time wasted by resource reservation. It came up a couple of times on the
former list whether SGE could get some kind of real-time feature, so that you
know exactly when a job will start and end (assuming you know the necessary
execution times beforehand). Solving a cutting-stock-like problem would also
fit in this context (to minimize the overall wallclock time): given a bunch of
jobs, is the queuing system capable of reordering them in such a way that the
time wasted on resource reservation is at its minimum (possibly zero) and the
complete bunch finishes in the shortest wallclock time possible? The third
sketch after this list shows the offline core of this question.

d) - Can I tell the scheduler that I have varying resource requirements over
the runtime of the job, to lower the amount of wasted resources? Can the
queuing system move my job around the cluster depending on the resources
needed in certain steps of the job? All to minimize unused resources.

e) - Can c) be combined with job dependencies in a directed graph (a job that
exits with an error shouldn't release the hold on its successor) and with
decisions about which of two alternative jobs should be executed at all?
(Sadly, the project http://wildfire.bii.a-star.edu.sg/ has stopped.) The
fourth sketch after this list shows the dependency part.
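
For a), a crude but serviceable measurement is to sample the sge_qmaster
process itself while a submission burst (e.g. the driver above) is running. A
minimal sketch, assuming a Linux /proc filesystem on the qmaster host and that
pgrep can find the daemon; the one-second sample interval and one-minute
duration are arbitrary:

#!/usr/bin/env python
# Sample memory (VmRSS) and cumulative CPU time of sge_qmaster.
import subprocess
import time

pid = subprocess.check_output(["pgrep", "-o", "sge_qmaster"]).decode().strip()

def sample(pid):
    rss_kb = 0
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])
    with open("/proc/%s/stat" % pid) as f:
        fields = f.read().split()
    cpu_ticks = int(fields[13]) + int(fields[14])  # utime + stime
    return rss_kb, cpu_ticks

for _ in range(60):  # one minute, one sample per second
    rss, ticks = sample(pid)
    print("%.0f rss=%d kB cpu=%d ticks" % (time.time(), rss, ticks))
    time.sleep(1)

Disk usage of the qmaster spool directory could be watched the same way with
"du -s"; its growth during the burst is the interesting part.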
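
For b), "qsub -sync y" (which blocks until the job has finished) gives a cheap
way to measure the full submit-to-reap turnaround of a near-zero-length job;
whatever is left above the job's own runtime is exactly what a "pre-staged
next job" scheme would try to shave off. A sketch, with /bin/true as the
trivial payload and an arbitrary sample count:

#!/usr/bin/env python
# Measure the end-to-end turnaround of near-zero-length jobs.
import subprocess
import time

SAMPLES = 20
times = []
for _ in range(SAMPLES):
    t0 = time.time()
    # -sync y: qsub returns only after the job has completed
    subprocess.check_call(
        ["qsub", "-sync", "y", "-b", "y",
         "-o", "/dev/null", "-e", "/dev/null", "/bin/true"])
    times.append(time.time() - t0)

print("min/avg/max turnaround: %.1f / %.1f / %.1f s"
      % (min(times), sum(times) / len(times), max(times)))

On a stock setup the scheduler interval will likely dominate these numbers,
which is in itself something a benchmark of this kind would usefully expose.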
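
For c), the reordering question has a classic offline core. The toy below
compares plain FIFO dispatch against the longest-processing-time-first (LPT)
heuristic on a fixed number of slots; the runtimes are invented for
illustration, and it simulates only the scheduling decision, not SGE itself:

#!/usr/bin/env python
# Greedy list scheduling: each job goes to the slot that frees up first;
# the makespan is the time the last slot becomes idle.
import heapq

def makespan(runtimes, slots):
    finish = [0.0] * slots
    heapq.heapify(finish)
    for r in runtimes:
        heapq.heappush(finish, heapq.heappop(finish) + r)
    return max(finish)

jobs = [7, 3, 9, 2, 5, 8, 1, 6, 4, 10]  # hypothetical runtimes in hours

# Prints 24.0 h for FIFO order vs. 19.0 h for LPT on this input.
print("FIFO makespan: %.1f h" % makespan(jobs, 3))
print("LPT  makespan: %.1f h" % makespan(sorted(jobs, reverse=True), 3))

A benchmark could score a queuing system by how close its actual makespan
comes to such an offline bound, given jobs with declared runtimes.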
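
For e), stock SGE can already express the graph itself via job holds; what is
missing is the error semantics described above, since (apart from the special
exit code 100, if I remember the qsub man page correctly) a plain hold is
released when the predecessor finishes regardless of its exit status. A sketch
of submitting a small diamond-shaped DAG, assuming "qsub -terse" (which prints
only the new job id) is available:

#!/usr/bin/env python
# Submit a diamond-shaped dependency graph with plain SGE job holds.
#
#       a
#      / \
#     b   c
#      \ /
#       d
import subprocess

def submit(cmd, deps=()):
    args = ["qsub", "-terse", "-b", "y",
            "-o", "/dev/null", "-e", "/dev/null"]
    if deps:
        # -hold_jid takes a comma-separated list of predecessor job ids
        args += ["-hold_jid", ",".join(deps)]
    return subprocess.check_output(args + list(cmd)).decode().strip()

a = submit(["/bin/true"])
b = submit(["/bin/true"], deps=[a])
c = submit(["/bin/true"], deps=[a])
d = submit(["/bin/true"], deps=[b, c])
print("submitted DAG: %s %s %s %s" % (a, b, c, d))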

Somehow I have the impression that no real benchmark is possible in the sense
of "always faster" this year and "faster again" the next. That would end up
with the criticism you state about a plain Linpack test.

It's more like a conformance test according to a certain set of rules. Imagine
you have queuing system A, which can reorder jobs and minimize wasted
resources (but has a high impact on the qmaster machine), and queuing system
B, which puts nearly no load on the qmaster machine (but handles jobs with a
plain FIFO). Which one would you buy?

NB: for us in quantum chemistry, where jobs run for weeks or even months, a)
and b) aren't much of a concern. c) would be interesting, though. d) would be
hard to phrase in exact times.

-- Reuti


> Any ideas?
> 
> Cheers,
> 
> Fritz
> 
> 
> On 16.02.11 at 22:38, Chris Dagdigian wrote:
>> 
>> What exactly are you trying to benchmark? Job types and workflows are
>> far too variable to produce a usable generic reference.
>> 
>> The real benchmark is "does it do what I need?" and there are many
>> people on this list who can help you zero in on answering that question.
>> 
>> SGE is used on anything from single-node servers to the 60,000+ CPU
>> cores on the RANGER cluster over at TACC.
>> 
>> The devil is in the details of what you are trying to do of course!
>> 
>> -Chris
>> 
>> 
>> 
>> Eric Kaufmann wrote:
>>> I am fairly new to SGE. I am interested in getting some benchmark
>>> information from SGE.
>>> 
>>> Are there any tools for this etc?
>>> 
>>> Thanks,
>>> 
>>> Eric
>>> 
>>> 

