Hi All.

I've got myself into a mess, and I'd appreciate any pointers you could
give me to get out...

I've got a grid based on 6.2u2 (I believe). Pretty much everything lives
in /opt/sge/ . That filesystem is NFS-exported from one machine, and
mounted on all the execution nodes. The binaries live in
/opt/sge/bin/lx24-amd64/ . That works nicely enough.

I'm trying to upgrade to 6.2u5 using Dave Love's RPMs. I've moved the NFS
mount and installed the RPMs into /opt/sge on a test machine. I then
bind-mounted the /opt/sge/default and /opt/sge/Build directories back into
that tree (there appear to be spool directories down there).

Note that the binaries now live in /opt/sge/bin/lx26-amd64/ . The
environment is pretty much the same as before, with just the path updated
to cope with the change of arch.
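
A minimal sketch of that environment, for reference. The values are assumptions inferred from the paths above: the cell name "default" is a guess based on the /opt/sge/default directory, and SGE_ARCH is used here only as a convenience variable.

```shell
# Sketch only: values are assumptions based on the paths mentioned above.
export SGE_ROOT=/opt/sge
export SGE_CELL=default        # assumed from the /opt/sge/default directory
SGE_ARCH=lx26-amd64            # was lx24-amd64 under the 6.2u2 install
export PATH="$SGE_ROOT/bin/$SGE_ARCH:$PATH"
```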

sge_execd starts, but any command I then try to run hangs. strace shows
the client opening communication with my qmaster, but then it sits in a
loop polling the master for data (which never comes). I can't even run
qconf on that machine.

Can anyone tell me where to start looking for logs? I've found precious
little (a one-liner in /tmp that didn't seem to help).

Thanks!

Vic.


