Hi William,
I've seen this before back in the SGE 6.2u5 days when it used to write out
core binding options it couldn't subsequently read back in.
IIRC, user files are read from disk one at a time at startup, and after
that they are only ever written to, so this sort of thing tends to be
noticed only when the qmaster is restarted. If it finds a user file that
it cannot read properly, SGE gives up reading any further user files, and
you'll appear to lose a big chunk of your user base even if those other
user files are OK.
Your instinct is right: stop the qmaster, delete or (preferably) modify
the file for the reported problem user so that the part it's complaining
about is removed, and start the qmaster again. Repeat if it complains
about another user.
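The qmaster may only report the files it tripped over before giving up, so it can help to check every user file in one pass before restarting. The following is a minimal Python sketch, not an SGE tool. It flags any line that does not begin with a known attribute name, mirroring the "line 12 should begin with an attribute name" error. The attribute list, the treatment of a trailing backslash as a line continuation, and the skipping of "#" comment lines are all assumptions about the classic spool format. Check them against a known-good user file first. The script name check_sge_users.py is hypothetical.

```python
import sys
from pathlib import Path

# ASSUMPTION: attribute names that may begin a line in a classically spooled
# SGE user file. Compare against a known-good file on your own qmaster.
KNOWN_ATTRIBUTES = {
    "name", "oticket", "fshare", "delete_time", "usage",
    "usage_time_stamp", "long_term_usage", "project",
    "default_project", "debited_job_usage",
}


def find_bad_lines(path):
    """Return (line_number, text) for lines not starting with a known attribute."""
    bad = []
    continued = False
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.rstrip("\n")
            # ASSUMPTION: a trailing backslash continues the value on the next line,
            # so a continuation line is not expected to start with an attribute.
            if continued:
                continued = line.endswith("\\")
                continue
            continued = line.endswith("\\")
            stripped = line.strip()
            # Skip blank lines and comment lines such as a "# Version:" header.
            if not stripped or stripped.startswith("#"):
                continue
            first_word = stripped.split(None, 1)[0]
            if first_word not in KNOWN_ATTRIBUTES:
                bad.append((lineno, stripped))
    return bad


def scan_user_dir(directory):
    """Map each user file name in directory to its list of suspect lines."""
    report = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            bad = find_bad_lines(path)
            if bad:
                report[path.name] = bad
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for user, lines in scan_user_dir(sys.argv[1]).items():
        for lineno, text in lines:
            print(f"{user}: line {lineno}: {text!r}")
```

Run it with the qmaster stopped, e.g. `python3 check_sge_users.py /var/opt/sge/shared/qmaster/users`. It prints one line per suspect line, so you can fix every affected user before the next restart instead of discovering them one at a time.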
Feel free to post the main bit of the user file if you want an opinion
about the edit.
If you delete the user file, you'll lose all usage for that user -
including that user's contribution to projects in any share tree you might
have. You'll also probably lose any jobs queued up by them.
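Whether you edit or delete, it is worth keeping a copy of the original first, so the usage figures are still available to fall back on. Below is a minimal Python sketch. The function name backup_user_file and the backup location are hypothetical. The backup goes to a separate directory on the assumption that the qmaster would try to read any file left in the users spool directory.

```python
import shutil
import time
from pathlib import Path


def backup_user_file(path, backup_dir):
    """Copy a user spool file into backup_dir with a timestamp suffix.

    backup_dir should live outside the qmaster's users spool directory
    (ASSUMPTION: anything left in that directory may be read as a user file).
    Returns the path of the copy.
    """
    src = Path(path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}"
    # copy2 also preserves the file's timestamps and permission bits.
    shutil.copy2(src, dest)
    return dest
```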
Mark
On Mon, 16 Apr 2018, William Hay wrote:
We had a user report that one of their array jobs wasn't scheduling. A
bit of poking around showed that qconf -suser knew nothing of the user
despite them having a queued job. However, there was a file in the spool
that should have defined the user. Several other users appear to be
affected as well.
I bounced the qmaster in the hope of getting it to reread the users'
details from disk, and got several messages like this:
04/16/2018 11:06:53| main|util01|E|error reading file: "/var/opt/sge/shared/qmaster/users/zccag81"
04/16/2018 11:06:53| main|util01|E|unrecognized characters after the attribute values in line 12: "mem"
04/16/2018 11:06:53| main|util01|E|line 12 should begin with an attribute name
I suspect that my next step should be to stop the qmaster, delete the
problem files and then restart the qmaster. Hopefully Grid Engine will
then recreate the users, or I can create them manually.
However if anyone has a better idea or has seen this before I'd be glad
to hear of it.
On our cluster, user objects are created automatically by means of
enforce_user auto:
#qconf -sconf |grep auto
enforce_user auto
auto_user_oticket 0
auto_user_fshare 1
auto_user_default_project none
auto_user_delete_time 0
William
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users