Hi,
I've never seen this, but I would start with:
1) strace qmaster during restart to see at which point it is dying (e.g.,
while loading a config file)
2) look for any reference to the name of the host you deleted in the spool
area and do some cleanup
3) clean out the jobs spool area
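
For step 2, a minimal sketch of how the search could be scripted. It assumes the spool area is a plain directory tree of files (classic spooling); the function name, the spool path, and the host name are hypothetical and not taken from the thread.

```python
import os


def find_host_references(spool_dir, hostname):
    """Return sorted paths under spool_dir whose name or content mentions hostname.

    Assumption: the spool area is an ordinary directory tree that can be
    read file by file. spool_dir and hostname are supplied by the caller.
    """
    needle = hostname.encode()
    hits = []
    for root, _dirs, files in os.walk(spool_dir):
        for name in files:
            path = os.path.join(root, name)
            # A file named after the host counts as a reference.
            if needle in name.encode():
                hits.append(path)
                continue
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return sorted(hits)
```

Something like `find_host_references("/path/to/spool", "deleted-host")` would list candidate files to inspect by hand before removing anything; both arguments here are placeholders.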
HTH,
John
Marshall2, John (SSC/SPC) wrote:
Hi,
When gridengine calculates cpu usage (based on wallclock) it uses:
cpu usage = wallclock * nslots
This does not account for the number of cpus that may be used for
each slot, which is problematic.
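
To make the gap concrete, here is a small sketch of the formula above next to one possible adjustment. The `cpus_per_slot` parameter and the adjusted formula are assumptions for illustration only; they are not a gridengine setting and not necessarily the approach taken in the article.

```python
def reported_cpu_usage(wallclock_s, nslots):
    """CPU usage as described above: wallclock * nslots."""
    return wallclock_s * nslots


def adjusted_cpu_usage(wallclock_s, nslots, cpus_per_slot):
    """Hypothetical adjustment: scale by the cpus dedicated to each slot.

    cpus_per_slot is an assumed, illustrative parameter.
    """
    return wallclock_s * nslots * cpus_per_slot


# A one-hour job on 2 slots where each slot has 4 dedicated cpus.
reported = reported_cpu_usage(3600, 2)
adjusted = adjusted_cpu_usage(3600, 2, 4)
print(reported, adjusted)
```

In this example the wallclock-based figure is 7200 cpu-seconds, while 8 cpus held for an hour amounts to 28800, so the job is undercounted by a factor equal to the cpus per slot.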
I have written up an article at:
https
container with its own
constraints, networking, etc.
Where we do the above, the cpus are dedicated. So, there is no
overallocation. The cpus are either available or not.
John
On Fri, 2018-08-31 at 12:58 +0200, Reuti wrote:
Hi John,
On 31.08.2018 at 12:27, Marshall2, John (SSC/SPC wrote:
Hi,
When gridengine calculates cpu usage (based on wallclock) it uses:
cpu usage = wallclock * nslots
This does not account for the number of cpus that may be used for
each slot, which is problematic.
I have written up an article at: