On 16.05.2013 at 11:10, Tina Friedrich wrote:
> Hi Reuti,
>
>>> have finally decided to look into upgrading our SGE6.2 installation -
>>> mainly to see if it helps with my job scheduling problem.
>>>
>>> I'm trying to build Son of Grid Engine - succeeded actually.
>>> Currently trying to make it run / import my old configuration.
>>> Which mostly worked. Couple of niggles.
>>>
>>> Our setup is SGE_ROOT on shared NFS file system, SGE running as a
>>> non-root user. I'd quite like to keep it that way (it worked well
>>> for us).
>>
>> The real and effective user is not root? I wonder how it changes to a
>> different user during execution then. Often this can be seen:
>>
>> $ ps -e -o user,ruser,group,rgroup,command
>> USER     RUSER GROUP    RGROUP COMMAND
>> ...
>> sgeadmin root  gridware root   /usr/sge/bin/lx24-x86/sge_execd
>
> The real and effective user is not root, and never was. Never caused us
> any problems. The NFS share is exported with root_squash.
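(A side note on root_squash, since it ties in with the 'Can't set the file
owner/group and permissions' messages further down: on the NFS server such
an export would look roughly like the following /etc/exports line - the
path and host pattern here are only placeholders, not taken from this setup:

/opt/sge        *.example.com(rw,sync,root_squash)

With root_squash in effect, root on the clients is mapped to the anonymous
user, so even a root-owned daemon could not chown or chmod files on that
share.)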
This is quite interesting. And are all jobs running under their respective
user accounts, or do you use one common user account for all jobs?

-- Reuti

>>> Managed to build & install, got the qmaster running, managed to start
>>> execds. However, at least inst_sge.sh -upd-execd simply refuses to work
>>> if you're not root, if I remember correctly (not helping!).
>>>
>>> Script(s) sometimes say 'You are not installing as user >root< -
>>> Can't set the file owner/group and permissions'. It would help if they'd
>>> tell me (without digging through them) what files they're trying to
>>> chown/chmod and what they're trying to chown/chmod it to - so I can fix
>>> that, if there is a problem. Goes for a lot of these sorts of errors (to
>>> do with running as non-root) - if it fails to do something, it would
>>> really help to know what it failed to do.
>>>
>>> The other thing is that I keep having to run it with -nobincheck, as
>>> far as I can tell simply because I didn't build qmon. Annoying - should
>>> it not just check for actually required binaries?
>>>
>>> Importing my old installation / upgrading from my old installation
>>> didn't quite work. Mostly did, it seems, which is something. No error
>>> that I'd seen during the import/upgrade, but none of my queues are
>>> there. Host groups are; exec hosts are; complexes look okay; global
>>> config looks right. PEs aren't there; trying to create the PEs from the
>>> config files I originally created them from I get 'error: required
>>> attribute "qsort_args" is missing'. Assume that's the root problem (i.e.
>>> did not manage to import PEs, thus can't import queues). Anyone else had
>>> issues with that? Should the save_config script have caught that?
>>
>> The "qsort_args" attribute is new there. Did you dump the old configuration
>> using $SGE_ROOT/util/upgrade_modules/save_sge_config.sh? Then it should
>> work to add just this line to each generated PE text file in the directory
>> the script created.
>
> I indeed dumped the config using said script. Was just wondering if the
> script was supposed to add a default qsort_args line, or at least the import
> script warn you that it's missing and will thus not work? (Or the export
> script telling you?)
>
>>> And now for the important question :). My execds currently are a mix
>>> of RHEL5 and RHEL6; SoGE got compiled on RHEL6, doesn't work on RHEL5
>>> execds.
>>
>> Do you use the old original execds or the newly compiled ones?
>>
>> If you use the new ones: maybe compiling everything on RHEL5 and running
>> those binaries on RHEL6 has a better chance of working.
>
> I shall try that; I was just wondering if anyone already knows of a way to
> make them work on both.
>
>>> Also, all nodes and the master/shadow hosts get software upgrades
>>> quite regularly
>>
>> I would fear that with updates to the nodes all the software you use
>> also needs to be revalidated, i.e. running the test suite for all of it.
>> Otherwise a change to e.g. a mathematical library may lead to different
>> results after an update.
>
> The cluster node configuration is very similar to our standard workstation(s)
> - and there is a lot of software people are using on both. A lot of it is
> compiled (and/or written) in house, and in a central location. So the risk of
> said libraries being out of sync (as it were) with the standard workstation
> setup (and hence, things that work on workstations not working on the cluster
> or vice versa) is - to us - much more of a concern. So, cluster nodes get
> upgraded along with the rest of the estate.
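Two sketches that may help with the points above; treat both as
illustrations rather than exact recipes.

First, the PE text files: a dumped PE file with the missing attribute added
would look roughly like this - the PE name and the other values are only
placeholders, the one line that matters for the import is the last one, and
NONE should be a safe value for it:

pe_name            example_pe
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE

After adding the line to each PE file in the dump directory, re-running the
load step (the load_sge_config.sh companion of save_sge_config.sh in
$SGE_ROOT/util/upgrade_modules - name from memory, please check your tree)
should then import the PEs and, with them, the queues that reference them.

Second, the RHEL5/RHEL6 binaries: one way to see why a RHEL6 build refuses
to start on RHEL5 is to list the glibc symbol versions the binaries were
linked against, e.g. (the binary path is just an example):

$ objdump -T $SGE_ROOT/bin/lx-amd64/sge_execd | grep -o 'GLIBC_[0-9.]*' | sort -u

If anything newer than RHEL5's glibc 2.5 shows up, the binary cannot run
there, which is why building on the oldest platform (RHEL5) and running
those binaries on RHEL6 as well usually has the better chance of working.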
>
> Tina
>
> --
> Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
> Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users