Thanks for this. I have investigated changing the parallel environment, but it is not clear how to tell it that each MPI process has access to n cores. At this stage, if I request enough slots for two nodes, SGE allocates them correctly, but everything still runs on the one node.
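For concreteness, the node-file construct Roland suggests below might translate to SGE roughly as follows (an untested sketch; each line of SGE's $PE_HOSTFILE has the form "hostname nslots queue processor-range", so only the first column is used):

# build a machine file that lists each host once per MPI rank to be started there
awk '{print $1}' ${PE_HOSTFILE} > ${MPD_NODEFILE}
for node in $(cat ${MPD_NODEFILE}); do
  for ((proc=0; proc<@(@PPN_USED@ / @NUM_THREADS@)@; proc=proc+1)); do
    echo ${node}
  done
done > ${MPI_NODEFILE}

mpirun -np @NUM_PROCS@ -machinefile ${MPI_NODEFILE} @EXECUTABLE@ -L 3 @PARFILE@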
I'll thus try your suggestion and let you know if I run into any other difficulties.

Thanks,
Chris

Dr Chris Stevens
Lecturer in Applied Mathematics
Rm 602, Jack Erskine building
School of Mathematics and Statistics
T: +64 3 369 0396 (Internal 90396)
University of Canterbury | Te Whare Wānanga o Waitaha
Private Bag 4800, Christchurch 8140, New Zealand
http://www.chrisdoesmaths.com

Director
SCRI Ltd
http://www.scri.co.nz

________________________________
From: Roland Haas
Sent: Saturday, October 09, 2021 11:13
To: Erik Schnetter
Cc: Chris Stevens; [email protected]
Subject: Re: [Users] Einstein toolkit with Sun Grid Engine

Hello Chris,

the "way"-ness was a TACC thing, I believe. Occasionally, in particular in old files, you will see constructs such as:

uniq ${PBS_NODEFILE} > ${MPD_NODEFILE}
for node in $(cat ${MPD_NODEFILE}); do
  for ((proc=0; $proc<@(@PPN_USED@ / @NUM_THREADS@)@; proc=$proc+1)); do
    echo ${node}
  done
done > ${MPI_NODEFILE}

mpirun -np @NUM_PROCS@ -machinefile ${MPI_NODEFILE} @EXECUTABLE@ -L 3 @PARFILE@

i.e. one constructs a custom MPI host list file that manually lists the
hostname as many times as needed to start the correct number of MPI ranks
on that host.

SGE has a similar variable, PE_HOSTFILE, and if all else fails you can
likely do the same thing, replacing PBS_NODEFILE by PE_HOSTFILE.

Yours,
Roland

> Chris
>
> I am unfamiliar with the details of SGE; I cannot tell whether this
> approach makes sense.
>
> -erik
>
> On Thu, Oct 7, 2021 at 5:19 PM Chris Stevens <[email protected]> wrote:
>
> > Hi Erik,
> >
> > Thanks for your suggestion.
> >
> > I am happy using these in the scripts, but I think the problem is how to
> > pass these expressions to SGE. From what I can tell, the output of
> > @(@PPN_USED@/@NUM_THREADS@)@way is, for example, "6way", given
> > @PPN_USED@=48 and @NUM_THREADS@=8. This means that I have requested the
> > parallel environment called 6way with @PROCS_REQUESTED@ slots. If I
> > requested 48 slots, then I would use mpirun -np 6. Thus, from what I
> > gather, for this to work, this specific parallel environment 6way needs
> > to exist. I am now figuring out how to configure parallel environments
> > in such a way, most likely by changing the allocation rule.
> >
> > Let me know if you think this is wrong, as it does seem rather stupid
> > not to be able to just set -ncpus-per-task like in Slurm in the
> > submission script.
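For what it's worth, a parallel environment whose allocation_rule is a fixed integer hands out exactly that many slots on each host, which may be what is needed here. A made-up example (the PE name "mpi48" and all numbers are purely illustrative), as it would be shown by "qconf -sp mpi48":

pe_name            mpi48
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    48
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

With one slot per core on 48-core nodes, requesting "-pe mpi48 96" should then yield exactly two full hosts, and a machine file built as in Roland's construct would start @(@PPN_USED@ / @NUM_THREADS@)@ ranks on each of them. The PE also has to be listed in the pe_list of the queue before qsub will accept it.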
> > Cheers,
> >
> > Chris
> >
> > ------------------------------
> > From: Erik Schnetter <[email protected]>
> > Sent: 08 October 2021 09:40
> > To: Chris Stevens <[email protected]>
> > Cc: [email protected] <[email protected]>
> > Subject: Re: [Users] Einstein toolkit with Sun Grid Engine
> >
> > Chris
> >
> > It might not be necessary to hard-code the number of threads. You can use
> > arbitrary Python expressions via "@( ... )@" in the templates. See e.g.
> > the template for Blue Waters, which uses this to choose between CPU and
> > GPU queues.
> >
> > -erik
> >
> > On Thu, Oct 7, 2021 at 4:04 PM Chris Stevens <[email protected]> wrote:
> >
> > Hi Roland,
> >
> > That's fantastic, thanks for linking those files.
> >
> > It works as expected with only MPI processes. I am careful to compile and
> > run with the same (and only) OpenMPI installation on the cluster, so this
> > should be OK.
> >
> > Going by a Slurm-to-SGE conversion table, there is no SGE equivalent to
> > Slurm's ncpus-per-task; rather, it is the allocation rule of the given
> > parallel environment, i.e. the backend, that controls this:
> >
> > https://srcc.stanford.edu/sge-slurm-conversion
> >
> > Further, in the submit script for Ranger, the crucial line
> >
> > #$ -pe @(@PPN_USED@/@NUM_THREADS@)@way @PROCS_REQUESTED@
> >
> > shows that you request @PROCS_REQUESTED@ slots (as I currently have) and
> > that the name of the requested parallel environment depends on
> > @NUM_THREADS@. From what I take from this, I need to set up a parallel
> > environment that hard-codes the number of threads I want per MPI process
> > and then use that parallel environment. I'll see how I go there, but it
> > isn't initially obvious how to do this!
> >
> > Cheers,
> >
> > Chris
> >
> > ------------------------------
> > From: Roland Haas
> > Sent: Thursday, October 07, 2021 06:22
> > To: Chris Stevens
> > Cc: [email protected]
> > Subject: Re: [Users] Einstein toolkit with Sun Grid Engine
> >
> > Hello Chris,
> >
> > We used SGE a long time ago on some of the TACC machines.
> > You can find an old setup for TACC's Ranger cluster in an old commit
> > like so:
> >
> > git checkout fed9f8d6fae4c52ed2d0a688fcc99e51b94e608e
> >
> > and then look at the "ranger" files in the OUTDATED subdirectories of
> > machines, runscripts and submitscripts.
> >
> > Having all MPI ranks on a single node might also be caused by using
> > different MPI stacks when compiling and when running, so you must make
> > sure that the "mpirun" (or equivalent command) you use is the one that
> > belongs to the MPI library you linked your code against.
> >
> > Finally, you may also have to check whether this is an issue with threads
> > and MPI ranks, i.e. whether things are still wrong if you use only MPI
> > processes and no OpenMP threads at all (in that case you would have to
> > check what SGE counts: threads (cores) or MPI ranks (processes)).
> >
> > Yours,
> > Roland
> >
> > > Hi everyone,
> > >
> > > I have set up the Einstein toolkit on a local cluster of 20 nodes with
> > > the SGE scheduler. I have not seen any examples of this scheduler being
> > > used with the Einstein toolkit.
> > >
> > > I have managed to get it working; however, if I ask for a number of
> > > slots that requires more than one node, it correctly allocates these,
> > > but all processes and threads are run on the one node, which is then
> > > oversubscribed.
> > >
> > > My question is whether anybody has used SGE with the Einstein toolkit
> > > and whether this is a good idea or not. If it is possible, I can send
> > > more details if there are people willing to help solve this inter-node
> > > communication problem.
> > >
> > > Thanks in advance,
> > >
> > > Chris
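A quick way to see where the ranks actually end up, before involving Cactus at all, is something like the following (a sketch, using the mpirun that belongs to the MPI library the code was linked against):

# each host should appear once per MPI rank placed on it
mpirun -np @NUM_PROCS@ hostname | sort | uniq -c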
> > --
> > Erik Schnetter <[email protected]>
> > http://www.perimeterinstitute.ca/personal/eschnetter/

Yours,
Roland

--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
