Hi All. I've got myself into a mess, and I'd appreciate any pointers you could give me to get out...
I've got a grid based on 6.2u2 (I believe). Pretty much everything lives in /opt/sge/ . That filesystem is NFS-exported from one machine and mounted on all the execution nodes. The binaries live in /opt/sge/bin/lx24-amd64/ . That works nicely enough.

I'm trying to upgrade to 6.2u5 using Dave Love's RPMs. On a test machine, I've moved the NFS mount and installed the RPMs into /opt/sge . I then bind-mounted the /opt/sge/default and /opt/sge/Build directories back into that (there appear to be spool directories down there). Note that the binaries now live in /opt/sge/bin/lx26-amd64/ . The environment is pretty much the same as before, with just the path updated to cope with the change of arch.

sge_execd starts. But when I try to run any commands, they hang. strace shows the comms with my queue master starting, but then the command sits in a loop polling the master for data (which never comes). I can't even run qconf on that machine.

Can anyone tell me where to start looking for logs? I've found precious little (a one-liner in /tmp that didn't seem to help).

Thanks!

Vic.
