On 16.05.2013 at 11:10, Tina Friedrich wrote:
> Hi Reuti,
>
>>> have finally decided to look into upgrading our SGE6.2 installation -
>>> mainly to see if it helps with my job scheduling problem.
>>>
>>> I'm trying to build Son of Grid Engine - succeeded actually.
>>> Currently trying to make it run / import my old configuration.
>>> Which mostly worked. Couple of niggles.
>>>
>>> Our setup is SGE_ROOT on shared NFS file system, SGE running as a
>>> non-root user. I'd quite like to keep it that way (it worked well
>>> for us).
>>
>> The real and effective user is not root? I wonder how it changes to a
>> different user during execution then. Often this can be seen:
>>
>> $ ps -e -o user,ruser,group,rgroup,command
>> USER     RUSER GROUP    RGROUP COMMAND
>> ...
>> sgeadmin root  gridware root   /usr/sge/bin/lx24-x86/sge_execd
>
> The real and effective user is not root, and never was. Never caused us
> any problems. The NFS share is exported with root_squash.
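(A side note on root_squash, since it ties in with the 'Can't set the file
owner/group and permissions' messages further down: on the NFS server such
an export would look roughly like the following /etc/exports line - the
path and host pattern here are only placeholders, not taken from this setup:

/opt/sge        *.example.com(rw,sync,root_squash)

With root_squash in effect, root on the clients is mapped to the anonymous
user, so even a root-owned daemon could not chown or chmod files on that
share.)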
This is quite interesting. And are all jobs running under their respective
user accounts, or do you use one common user account for all jobs?

-- Reuti

>>> Managed to build & install, got the qmaster running, managed to start
>>> execds. However, at least inst_sge.sh -upd-execd simply refuses to work
>>> if you're not root, if I remember correctly (not helping!).
>>>
>>> Script(s) sometimes say 'You are not installing as user >root< -
>>> Can't set the file owner/group and permissions'. It would help if they'd
>>> tell me (without digging through them) what files they're trying to
>>> chown/chmod and what they're trying to chown/chmod it to - so I can fix
>>> that, if there is a problem. Goes for a lot of these sorts of errors (to
>>> do with running as non-root) - if it fails to do something, it would
>>> really help to know what it failed to do.
>>>
>>> The other thing is that I keep having to run it with -nobincheck, as
>>> far as I can tell simply because I didn't build qmon. Annoying - should
>>> it not just check for actually required binaries?
>>>
>>> Importing my old installation / upgrading from my old installation
>>> didn't quite work. Mostly did, it seems, which is something. No error
>>> that I'd seen during the import/upgrade, but none of my queues are
>>> there. Host groups are; exec hosts are; complexes look okay; global
>>> config looks right. PEs aren't there; trying to create the PEs from the
>>> config files I originally created them from I get 'error: required
>>> attribute "qsort_args" is missing'. Assume that's the root problem (i.e.
>>> did not manage to import PEs, thus can't import queues). Anyone else had
>>> issues with that? Should the save_config script have caught that?
>>
>> The "qsort_args" attribute is new there. Did you dump the old configuration
>> using $SGE_ROOT/util/upgrade_modules/save_sge_config.sh? Then it should
>> work to add just this line to each generated PE text file in the directory
>> the script created.
>
> I indeed dumped the config using said script. Was just wondering if the
> script was supposed to add a default qsort_args line, or at least the import
> script warn you that it's missing and will thus not work? (Or the export
> script telling you?)
>
>>> And now for the important question :). My execds currently are a mix
>>> of RHEL5 and RHEL6; SoGE got compiled on RHEL6, doesn't work on RHEL5
>>> execds.
>>
>> Do you use the old original execds or the newly compiled ones?
>>
>> If you use the new ones: maybe compiling everything on RHEL5 and running
>> those binaries on RHEL6 has a better chance of working.
>
> I shall try that; I was just wondering if anyone already knows of a way to
> make them work on both.
>
>>> Also, all nodes and the master/shadow hosts get software upgrades
>>> quite regularly
>>
>> I would fear that with updates to the nodes all the software you use
>> also needs to be revalidated, i.e. running the test suite for all of it.
>> Otherwise a change to e.g. a mathematical library may lead to different
>> results after an update.
>
> The cluster node configuration is very similar to our standard workstation(s)
> - and there is a lot of software people are using on both. A lot of it is
> compiled (and/or written) in house, and in a central location. So the risk of
> said libraries being out of sync (as it were) with the standard workstation
> setup (and hence, things that work on workstations not working on the cluster
> or vice versa) is - to us - much more of a concern. So, cluster nodes get
> upgraded along with the rest of the estate.
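Two sketches that may help with the points above; treat both as
illustrations rather than exact recipes.

First, the PE text files: a dumped PE file with the missing attribute added
would look roughly like this - the PE name and the other values are only
placeholders, the one line that matters for the import is the last one, and
NONE should be a safe value for it:

pe_name            example_pe
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE

After adding the line to each PE file in the dump directory, re-running the
load step (the load_sge_config.sh companion of save_sge_config.sh in
$SGE_ROOT/util/upgrade_modules - name from memory, please check your tree)
should then import the PEs and, with them, the queues that reference them.

Second, the RHEL5/RHEL6 binaries: one way to see why a RHEL6 build refuses
to start on RHEL5 is to list the glibc symbol versions the binaries were
linked against, e.g. (the binary path is just an example):

$ objdump -T $SGE_ROOT/bin/lx-amd64/sge_execd | grep -o 'GLIBC_[0-9.]*' | sort -u

If anything newer than RHEL5's glibc 2.5 shows up, the binary cannot run
there, which is why building on the oldest platform (RHEL5) and running
those binaries on RHEL6 as well usually has the better chance of working.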
>
> Tina
>
> --
> Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
> Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users