Uhhhmmm ... we have
customers regularly running 30 million jobs per month through their
clusters. So what you are describing is not a particularly challenging
workload, although it will still require careful tuning, and you'll fare
better if you use the latest and greatest ... I know it's no news to you
that it is easy to shoot yourself in the foot with any kind of
workload management tuning ;-) A lot depends on cluster size, job
profiles, policy complexity and such ...
Cheers,
Fritz
ChrisDag wrote:
Hi folks,
Thanks to a 20GB accounting file with 73 million entries that I was able
to push through the S-GAE webapp (a process worthy of a writeup/blog
post of its own ...) I've got some interesting multi-year data about a
cluster that is ready to have another round of tuning and optimization.
The most interesting data bits:
- Millions of jobs per month. The average is 1.5 million or so, but we
saw as many as 2.6 million jobs in one month
- Average job duration is incredibly short - looks like average
execution time is 40-50 seconds
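
For a rough sense of scale, the figures above can be turned into a
submission rate and an average number of concurrently running jobs. This
is only a back-of-the-envelope sketch in Python: it assumes a 30-day
month and a 45-second mean runtime (the midpoint of the 40-50 second
range), and the function names are illustrative, not part of any Grid
Engine tooling.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def jobs_per_second(jobs_per_month):
    """Mean job arrival rate, averaged over a 30-day month."""
    return jobs_per_month / SECONDS_PER_MONTH


def mean_running_jobs(jobs_per_month, mean_runtime_s):
    """Little's law: average jobs in execution = arrival rate * mean runtime."""
    return jobs_per_second(jobs_per_month) * mean_runtime_s


for label, monthly in [("average", 1_500_000), ("peak", 2_600_000)]:
    rate = jobs_per_second(monthly)
    busy = mean_running_jobs(monthly, 45)  # assumption: 45 s mean runtime
    print(f"{label}: {rate:.2f} jobs/s, ~{busy:.0f} jobs running on average")
```

These are month-wide means; real submission patterns are likely bursty,
so short-term rates can be well above these numbers.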
My gut feeling is that the first thing this cluster will need is a
reinstall so that we can put the scheduler into "scheduling on demand"
mode. However, it's been a few years since I seriously had to deal with a
system running at this job throughput rate.
Has anything changed with respect to the current state of the art? I'm
thinking as a base line:
- Reinstall so we can set scheduling on demand behavior for the
scheduler
- Force local spooling and switch to binary spooling if they are using
classic mode
- Work closely with users/developers to increase average job duration
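
To illustrate why longer jobs help, here is a minimal sketch of how a
fixed per-job cost (scheduling, dispatch, spooling, cleanup) eats into
slot time, and how bundling short tasks reduces the number of jobs the
scheduler has to handle. The 5-second overhead is a purely hypothetical
figure, not a measurement from this cluster, and the model ignores
everything except that fixed cost.

```python
def slot_efficiency(runtime_s, overhead_s):
    """Fraction of slot time spent on useful work, given a fixed per-job overhead."""
    return runtime_s / (runtime_s + overhead_s)


def jobs_after_batching(jobs_per_month, tasks_per_job):
    """Jobs the scheduler sees if short tasks are bundled (ceiling division)."""
    return -(-jobs_per_month // tasks_per_job)


OVERHEAD_S = 5.0  # hypothetical per-job overhead, for illustration only

for tasks in (1, 10, 60):
    runtime = 45 * tasks  # assumption: 45 s per original task
    eff = slot_efficiency(runtime, OVERHEAD_S)
    jobs = jobs_after_batching(1_500_000, tasks)
    print(f"{tasks:>2} tasks/job: {jobs:>9,} jobs/month, efficiency {eff:.1%}")
```

Under these assumptions, bundling ten 45-second tasks per job cuts the
monthly job count tenfold and shrinks the share of slot time lost to
overhead from about 10% to about 1%.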
-dag
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
--
Fritz Ferstl | CTO and Business Development, EMEA
Univa Corporation | The Data Center Optimization Company
E-Mail: [email protected] | Phone: +49.9471.200.195 | Mobile: +49.170.819.7390
