Hi,

On 25.07.2019 at 16:44, Pat Haley wrote:
> Hi All,
>
> We have been trying to install Rocks 7 on a new frontend machine, using a
> restore roll from our old front-end (running Rocks 6.2) to bring over our
> users, groups and various customizations (more details are available in
> https://marc.info/?l=npaci-rocks-discussion&m=154514980222760&w=2 ). Our
> latest issue is that the Sun Grid Engine service does not start.
>
> systemctl status -l sgemaster.mseas
> ● sgemaster.mseas.service - LSB: start Grid Engine qmaster, shadowd
>    Loaded: loaded (/etc/rc.d/init.d/sgemaster.mseas; bad; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Fri 2019-07-19 12:26:46 EDT; 32min ago
>      Docs: man:systemd-sysv-generator(8)
>   Process: 355124 ExecStart=/etc/rc.d/init.d/sgemaster.mseas start (code=exited, status=1/FAILURE)
>
> Jul 19 12:25:44 mseas.mit.edu systemd[1]: Starting LSB: start Grid Engine qmaster, shadowd...
> Jul 19 12:25:45 mseas.mit.edu sgemaster.mseas[355124]: Starting Grid Engine qmaster
> Jul 19 12:26:46 mseas.mit.edu sgemaster.mseas[355124]: sge_qmaster start problem
> Jul 19 12:26:46 mseas.mit.edu sgemaster.mseas[355124]: sge_qmaster didn't start!
> Jul 19 12:26:46 mseas.mit.edu systemd[1]: sgemaster.mseas.service: control process exited, code=exited status=1
> Jul 19 12:26:46 mseas.mit.edu systemd[1]: Failed to start LSB: start Grid Engine qmaster, shadowd.
> Jul 19 12:26:46 mseas.mit.edu systemd[1]: Unit sgemaster.mseas.service entered failed state.
> Jul 19 12:26:46 mseas.mit.edu systemd[1]: sgemaster.mseas.service failed.
>
>
> in poking around, we see 2 entries for sge in /etc/passwd on the new system
>
> grep -in sge /etc/passwd
> 44:sge:x:990:985:GridEngine System account:/opt/gridengine:/bin/true
> 64:sge:x:400:400:GridEngine:/opt/gridengine:/bin/true

It's definitely wrong to have two entries for one and the same account. First remove the first one, which also points to an unknown group. Do you have a group with ID 985?
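Both checks can be scripted. What follows is only a minimal sketch: the function names dup_accounts and missing_gids are made up for illustration, and it assumes local /etc/passwd and /etc/group files rather than accounts served via NIS or LDAP. The first function lists account names that occur more than once, the second lists accounts whose primary GID has no entry in the group file.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Grid Engine or Rocks.

# Print every account name that appears more than once in a
# passwd-format file ($1).
dup_accounts() {
    cut -d: -f1 "$1" | sort | uniq -d
}

# Print accounts from a passwd-format file ($1) whose primary GID
# (field 4) is not defined in a group-format file ($2).
missing_gids() {
    awk -F: '
        FILENAME == ARGV[1] { known[$3] = 1; next }   # group file: remember GIDs
        !($4 in known) { print $1 " uid=" $3 " gid=" $4 }
    ' "$2" "$1"
}

dup_accounts /etc/passwd
missing_gids /etc/passwd /etc/group
```

With the two sge entries quoted above and only group 400 defined, the first call should report sge and the second should report the entry with UID 990 and GID 985.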
Then: are the files in /opt/gridengine owned by this (leftover) user? But some files inside need a root-squash:

$ find . -perm /u+s
./utilbin/lx24-amd64/testsuidroot
./utilbin/lx24-amd64/rlogin
./utilbin/lx24-amd64/rsh
./utilbin/lx24-amd64/authuser
./bin/lx24-amd64/sgepasswd

There is the script /opt/sge/util/setfileperm.sh to correct this.

> and only one on the old system
>
> grep -in sge /etc/passwd
> 37:sge:x:400:400:GridEngine:/opt/gridengine:/bin/true
>
> looking at /etc/group both systems only show the old group id
>
> grep -in sge /etc/group
> 49:sge:x:400:
>
> looking at the qmaster logs in
> /opt/gridengine/default/spool/qmaster/messages
>
> we’ve found the following message:
> error opening file "/opt/gridengine/default/spool/qmaster/./sharetree" for reading: No such file or directory

Did you transfer the old configuration, or does this pop up in a freshly installed system? Unfortunately the procedure might be changed by the ROCKS distribution compared to the original sources.

-- Reuti

> However, we do not see that file on the old frontend either.
>
> Can anyone suggest what we can do to either correct or debug this issue?
>
> Pat
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  pha...@mit.edu
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users