Hi William,

I've seen this before, back in the SGE 6.2u5 days, when it used to write out core binding options that it couldn't subsequently read back in.

IIRC, the user files are read from disk one after another at startup and are only written to from then on, so this sort of thing tends to be noticed only when the qmaster is restarted. If the qmaster finds a user file that it cannot read properly, it gives up reading any further user files, and you'll appear to lose a big chunk of your user base even though those other user files are ok.

Your instinct is right: stop the qmaster, delete or preferably modify the file for the reported problem user so that the bit it's complaining about is removed, and start the qmaster again. Repeat if it complains about another user.
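
If it helps to see which file and line the qmaster is objecting to after each restart, here is a minimal sketch in Python. It assumes the messages look like the ones you quoted; the function name and the regular expressions are my own invention, not anything that ships with SGE, so treat it as a starting point rather than a tested tool.

```python
import re

# Patterns modelled on the error messages quoted in this thread (an
# assumption about the log format). \s also matches newlines, so the
# line wrapping introduced by mail clients is tolerated.
FILE_RE = re.compile(r'error reading file:\s*"([^"]+)"')
LINE_RE = re.compile(r'in\s+line\s+(\d+):\s*"([^"]*)"')


def find_unreadable_files(log_text):
    """Return a list of (path, line_number, offending_text) tuples.

    line_number and offending_text are None when no line detail
    follows the "error reading file" message.
    """
    results = []
    matches = list(FILE_RE.finditer(log_text))
    for i, m in enumerate(matches):
        # Only look for the line detail between this report and the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        detail = LINE_RE.search(log_text, m.end(), end)
        if detail:
            results.append((m.group(1), int(detail.group(1)), detail.group(2)))
        else:
            results.append((m.group(1), None, None))
    return results
```

Feeding it the contents of your qmaster messages file should give you the path of each rejected user file together with the line number to look at before you edit anything.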

Feel free to post the main bit of the user file if you want an opinion about the edit.

If you delete the user file, you'll lose all usage for that user - including that user's contribution to projects in any share tree you might have. You'll also probably lose any jobs queued up by them.

Mark

On Mon, 16 Apr 2018, William Hay wrote:

We had a user report that one of their array jobs wasn't scheduling.  A
bit of poking around showed that qconf -suser knew nothing of the user
despite them having a queued job.  However there was a file in the spool
that should have defined the user.  Several other users appear to be
affected as well.

I bounced the qmaster in the hopes of getting it to reread the users'
details from disk.  And got several messages like this:

04/16/2018 11:06:53| main|util01|E|error reading file: "/var/opt/sge/shared/qmaster/users/zccag81"
04/16/2018 11:06:53| main|util01|E|unrecognized characters after the attribute values in line 12: "mem"
04/16/2018 11:06:53| main|util01|E|line 12 should begin with an attribute name

I suspect that my next step should be to stop the qmaster, delete the
problem files and then restart the qmaster.  Hopefully grid engine will
then recreate the user or I can create them manually.

However if anyone has a better idea or has seen this before I'd be glad
to hear of it.

User objects on our cluster are created automatically by means of
enforce_user auto:
# qconf -sconf | grep auto
enforce_user                 auto
auto_user_oticket            0
auto_user_fshare             1
auto_user_default_project    none
auto_user_delete_time        0


William


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
