(anonymous) wrote:

>> with gridengine-master 6.2u5-7.3 (Ubuntu Trusty), our
>> /var/lib/gridengine/spool/qmaster/messages gets constantly
>> filled with:

>> | 12/07/2016 04:11:43|worker|tools-grid-master|E|got load report of unknown exec host "tools-exec-1204.eqiad.wmflabs"

>> (tools-exec-1204.eqiad.wmflabs is a host that no longer
>> exists.)

>> How can I convince the grid master to "move on",
>> i. e. "accept" that it did receive a load report from an
>> unknown host, or "delete" the load report from its inbox?

> Do you have any custom load sensors defined, either on a
> global or local level per exechost? The machine in question
> was completely removed and shut down?

I don't think we have any custom load sensors defined, but
your latter question caused me to reconsider the facts: The
host was shut down and removed from DNS, and its entry was
removed from
/var/lib/gridengine/default/common/host_aliases, /but/ the
grid master had not been restarted afterwards, i. e. it was
still working with the old host_aliases that had an entry
for that host.  After "service gridengine-master restart",
the error no longer shows up in
/var/lib/gridengine/spool/qmaster/messages.  So I assume the
outdated host_aliases confused the grid master.
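
In case it helps anyone hitting the same message, here is a
rough sketch of how one might check for this situation. It
counts the "unknown exec host" errors per host in the
messages log and reports whether each host is still listed
in host_aliases. The function names are my own, and the
parsing of host_aliases (whitespace-separated names per
line, "#" starting a comment) is an assumption on my part,
not something I verified against the Grid Engine sources.

```python
import re
from collections import Counter

# Matches the qmaster error quoted above; the host name is in double quotes.
# \s+ tolerates a log line that was wrapped between "unknown" and "exec".
UNKNOWN_HOST = re.compile(r'got load report of unknown\s+exec host "([^"]+)"')


def unknown_hosts(messages_text):
    """Count 'unknown exec host' errors per host name in a messages log."""
    return Counter(UNKNOWN_HOST.findall(messages_text))


def hosts_in_aliases(aliases_text):
    """Collect every name found in host_aliases-style text.

    Assumption: one host per line, names separated by whitespace,
    and '#' starting a comment.
    """
    names = set()
    for line in aliases_text.splitlines():
        line = line.split("#", 1)[0]
        names.update(line.split())
    return names


def stale_report(messages_text, aliases_text):
    """Map each reported host to (error count, still listed in host_aliases?)."""
    known = hosts_in_aliases(aliases_text)
    counts = unknown_hosts(messages_text)
    return {host: (count, host in known) for host, count in counts.items()}
```

Fed with the contents of
/var/lib/gridengine/spool/qmaster/messages and
/var/lib/gridengine/default/common/host_aliases, a host that
keeps producing errors but is no longer listed in the file
on disk would hint that the running master predates the
edit, which is the case where the restart helped here.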

Thanks,
Tim
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
