On 12.03.11 01:20, Dave Love wrote:
> Fritz Ferstl <ffer...@univa.com> writes:
>> It's something that definitely needs to be added to Grid Engine (as a built-in feature), and we'll do that sooner rather than later, but it would be a nice little project for community members who have not yet made code contributions (hint! ;-)).
>
> Would there be advice available for anyone wanting to do it (hint! :-))?
You bet! It should actually be quite easy. In a first implementation you'll probably want to introduce a qmaster_param "black_hole_exit_rate" and then keep track of the job exit rate of each exec host, near the code where qmaster receives job completion information. Qmaster would compare each host's exit rate against black_hole_exit_rate and disable any host whose rate is higher than black_hole_exit_rate allows.
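The per-host check could look roughly like the sketch below. This is not Grid Engine code (qmaster is written in C); it is a Python illustration that assumes black_hole_exit_rate means "maximum number of job exits allowed per host within a sliding time window". The class name BlackHoleDetector, the window_seconds parameter and the job_finished() hook are all hypothetical.

```python
from collections import defaultdict, deque


class BlackHoleDetector:
    """Hypothetical sketch: count job exits per exec host in a sliding
    time window and flag hosts whose exit count exceeds black_hole_exit_rate."""

    def __init__(self, black_hole_exit_rate, window_seconds=60.0):
        # Assumption: the rate is "exits per window_seconds", not a Grid Engine unit.
        self.black_hole_exit_rate = black_hole_exit_rate
        self.window_seconds = window_seconds
        self._exits = defaultdict(deque)  # host -> timestamps of recent exits
        self.disabled = set()             # hosts flagged as black holes

    def job_finished(self, host, timestamp):
        """Record a job completion on `host` at `timestamp` (seconds).

        Returns True only when this exit pushes the host over the
        threshold and the host is newly disabled."""
        if host in self.disabled:
            # A disabled host gets no new jobs, so nothing more to check.
            return False
        exits = self._exits[host]
        exits.append(timestamp)
        # Drop exits that have fallen out of the sliding window.
        while exits and exits[0] <= timestamp - self.window_seconds:
            exits.popleft()
        if len(exits) > self.black_hole_exit_rate:
            self.disabled.add(host)
            return True
        return False
```

For example, with `BlackHoleDetector(black_hole_exit_rate=5, window_seconds=60.0)`, six exits on the same host within a few seconds make the sixth `job_finished()` call return True, while one exit every 30 seconds never trips the check. In qmaster, the equivalent hook would sit wherever job completion is processed, and a True result would translate into disabling that host.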
A more advanced implementation would provide a black_hole_exit_rate per exec host, or even per cluster queue (i.e. per job class) plus per exec host. The checking itself wouldn't get much more complicated. The only problem with the more advanced approach is that you'd have to modify the format of the queue and host configuration, which would make that version incompatible with earlier versions, so the upgrade step would get more "involved".
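For the more advanced variant, the only new piece of logic is choosing which threshold applies to a given job. The sketch below assumes a hypothetical precedence in which the most specific setting wins (queue-on-host, then exec host, then cluster queue, then the global qmaster_param); the function and parameter names are illustrative only and do not exist in Grid Engine.

```python
from typing import Mapping, Optional, Tuple


def effective_exit_rate(
    queue: str,
    host: str,
    global_rate: Optional[float],
    per_host: Mapping[str, float],
    per_queue_host: Mapping[Tuple[str, str], float],
    per_queue: Mapping[str, float],
) -> Optional[float]:
    """Return the black_hole_exit_rate for a job of `queue` running on `host`.

    Hypothetical precedence (most specific wins):
      1. queue-on-host override
      2. per-exec-host value
      3. per-cluster-queue value
      4. global qmaster_param value (None means the check is off)
    """
    if (queue, host) in per_queue_host:
        return per_queue_host[(queue, host)]
    if host in per_host:
        return per_host[host]
    if queue in per_queue:
        return per_queue[queue]
    return global_rate
```

The resolved value would then feed the same per-host comparison as in the simple version, which is why the check itself barely changes; the real cost lies in storing the new attributes in the host and queue configuration.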
We can certainly help by pointing to the relevant code sections if there are any takers.
Cheers,
Fritz
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users