Re: [gridengine users] Reporting SGE via QACCT output
On 13 Jun 2014, at 04:30, Sangmin Park wrote:
> Yes, both files, 'mpiexec' and 'mpiexec.hydra', are in the bin directory inside Intel MPI. But 'mpiexec' is a link to 'mpiexec.py'. Is it okay if I create a symbolic link 'mpiexec' pointing to 'mpiexec.hydra' instead of 'mpiexec.py'?

Sure, this should work too. -- Reuti

> Since there are several jobs running on the cluster, I need to check it again and again. Thanks. --Sangmin

On Thu, Jun 12, 2014 at 8:04 PM, Reuti re...@staff.uni-marburg.de wrote:

On 12 Jun 2014, at 04:23, Sangmin Park wrote:
> I've checked the version of Intel MPI. He uses Intel MPI version 4.0.3.008. Our system uses rsh to access the compute nodes; SGE does, too. Please let me know how to check which one is used, 'mpiexec.hydra' or 'mpiexec'.

Do you have both files somewhere in a bin directory inside the Intel MPI installation? You could rename mpiexec and create a symbolic link mpiexec pointing to mpiexec.hydra. The old startup needs daemons running on the nodes (which are outside of SGE's control and accounting*), but mpiexec.hydra starts the child processes on its own, as its own kids, and should hence be under SGE's control. As long as you stay on one and the same node, this already works without further setup. To avoid a later surprise when you compute across nodes, the `rsh`/`ssh` calls should nevertheless be caught and redirected to `qrsh -inherit ...`, as outlined in $SGE_ROOT/mpi. -- Reuti

*) It's even possible to force the daemons to be started under SGE, but it's convoluted and not recommended.

> Sangmin

On Wed, Jun 11, 2014 at 6:46 PM, Reuti re...@staff.uni-marburg.de wrote: Hi,

On 11 Jun 2014, at 02:38, Sangmin Park wrote:
> For best performance, we recommend that users use 8 cores on a single node, not distributed across multiple nodes. As I said before, he uses the VASP application compiled with Intel MPI. So he uses Intel MPI now.

Which version of Intel MPI?
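The rename-and-symlink step discussed above could look like the following. This is a hedged sketch: the path matches the Intel MPI 4.0.3.008 installation mentioned in this thread and may differ on other systems; `BINDIR` is an illustrative variable, not anything SGE or Intel MPI define.

```shell
# Hedged sketch of switching `mpiexec` to the Hydra launcher, assuming the
# Intel MPI bin directory from this thread. BINDIR is illustrative.
BINDIR=${BINDIR:-/opt/intel/impi/4.0.3.008/intel64/bin}
cd "$BINDIR"
mv mpiexec mpiexec.orig        # keep the old mpiexec -> mpiexec.py link as a backup
ln -sfn mpiexec.hydra mpiexec  # mpiexec now resolves to the Hydra launcher
readlink mpiexec               # prints: mpiexec.hydra
```

Because `mpiexec` is just a symlink, this is trivially reversible with `mv mpiexec.orig mpiexec` if anything misbehaves.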
Even with the latest one it's not tightly integrated by default (despite the fact that MPICH3 [on which it is based] is tightly integrated by default). Depending on the version it might be necessary to make some adjustments - IIRC mainly to use `mpiexec.hydra` instead of `mpiexec` and to supply a wrapper to catch the `rsh`/`ssh` call (like in the MPI demo in SGE's directory). -- Reuti

> --Sangmin

On Tue, Jun 10, 2014 at 5:58 PM, Reuti re...@staff.uni-marburg.de wrote: Hi,

On 10 Jun 2014, at 10:21, Sangmin Park wrote:
> This user always runs parallel jobs using the VASP application. Usually he uses 8 cores per job. Lots of jobs of this kind have been submitted by this user.

8 cores on a particular node, or 8 slots across the cluster? What MPI implementation does he use? -- Reuti

NB: Please keep the list posted.

> Sangmin

On Tue, Jun 10, 2014 at 3:42 PM, Reuti re...@staff.uni-marburg.de wrote:

On 10 Jun 2014, at 08:00, Sangmin Park wrote:
> Hello, I'm very confused about the output of the qacct command. I thought the CPU column is the best way to measure resource usage by users, following this web page: https://wiki.duke.edu/display/SCSC/Checking+SGE+Usage
> But I have a situation. One of the users at my institution, actually one of our heavy users, consumes a lot of HPC resources. To get this user's resource usage for billing, I ran qacct; the output below is just for May.
>
> OWNER WALLCLOCK UTIME STIME CPU MEMORY IO IOW
> p012chm 298081028.48535.012 100.634 4.277 0.576 0.000
>
> The CPU time is much too small. Because he is a very heavy user at our institution, I cannot accept this result. The WALLCLOCK time, however, is very large. How do I get correct per-user resource usage via qacct?

This may happen when you have parallel jobs which are not tightly integrated into SGE. What types of jobs is the user running?
-- Reuti

===
Sangmin Park
Supercomputing Center
Ulsan National Institute of Science and Technology (UNIST)
Ulsan, 689-798, Korea
phone: +82-52-217-4201
mobile: +82-10-5094-0405
fax: +82-52-217-4209
===

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
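As an aside on the accounting question in this thread: until the jobs are tightly integrated, one workaround is to approximate usage as wallclock time multiplied by granted slots. A hedged sketch, assuming the field names (`slots`, `ru_wallclock`) and the `=====` record separator of `qacct -j` output, which can vary slightly between SGE versions; `p012chm` is the user from this thread:

```shell
# Hedged sketch: approximate core-seconds for one user when the CPU column
# is unreliable (loosely integrated parallel jobs only account the master
# process). Sums ru_wallclock * slots over the user's finished jobs.
# Assumes the per-record "=====" separator and field names of `qacct -j`.
qacct -j '*' -o p012chm |
awk '/^slots/        {s=$2}
     /^ru_wallclock/ {w=$2}
     /^=+$/          {total += w*s; w=0; s=1}
     END             {total += w*s; printf "approx. core-seconds: %.0f\n", total}'
```

This overcounts slightly for jobs that idle some of their slots, but it cannot silently miss slave processes the way the CPU column does for loosely integrated jobs.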
Re: [gridengine users] Reporting SGE via QACCT output
On 13 Jun 2014, at 06:50, Sangmin Park wrote:
> Hi, I've checked his job while it was running. I checked it via the 'ps -ef' command and found that his job is using mpiexec.hydra.

Putting a blank between -e and f (i.e. `ps -e f`) will give a nice process tree.

> And 'qrsh' is using the '-inherit' option. Here are the details:
>
> p012chm 21424 21398 0 13:20 ? 00:00:00 bash /opt/sge/default/spool/lion07/job_scripts/46651
> p012chm 21431 21424 0 13:20 ? 00:00:00 /bin/bash /opt/intel/impi/4.0.3.008/intel64/bin/mpirun -np 12 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21442 21431 0 13:20 ? 00:00:00 mpiexec.hydra -machinefile /tmp/sge_machinefile_21431 -np 12 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x

What creates this sge_machinefile_21431? Often it's put into $TMPDIR, i.e. the temporary directory of the job, as you can always use the same name and it will be removed after the job for sure.

> p012chm 21443 21442 0 13:20 ? 00:00:00 /opt/sge/bin/lx24-amd64/qrsh -inherit lion07

Ok, on the one hand this looks good and should give proper accounting. But maybe there is something odd about the hostname resolution, as AFAIK on the local machine lion07 it should just fork instead of making a local `qrsh -inherit ...`. Does `qstat -f` list the short names only, or are there FQDNs in the output for the queue instances?
-- Reuti

> /home/p012pnj/intel/impi/intel64/bin/pmi_proxy --control-port lion07:54060 --pmi-connect lazy-cache --pmi-aggregate --bootstrap rsh --bootstrap-exec rsh --demux poll --pgid 0 --enable-stdin 1 --proxy-id 0
> root 21452 21451 0 13:20 ? 00:00:00 sshd: p012chm [priv]
> p012chm 21453 21443 0 13:20 ? 00:00:00 /usr/bin/ssh -p 60725 lion07 exec '/opt/sge/utilbin/lx24-amd64/qrsh_starter' '/opt/sge/default/spool/lion07/active_jobs/46651.1/1.lion07'
> p012chm 21457 21452 0 13:20 ? 00:00:00 sshd: p012chm@notty
> p012chm 21458 21457 0 13:20 ? 00:00:00 /opt/sge/utilbin/lx24-amd64/qrsh_starter /opt/sge/default/spool/lion07/active_jobs/46651.1/1.lion07
> p012chm 21548 21458 0 13:20 ? 00:00:00 /home/p012pnj/intel/impi/intel64/bin/pmi_proxy --control-port lion07:54060 --pmi-connect lazy-cache --pmi-aggregate --bootstrap rsh --bootstrap-exec rsh --demux poll --pgid 0 --enable-stdin 1 --proxy-id 0
> p012chm 21549 21548 99 13:20 ? 00:22:04 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21550 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21551 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21552 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21553 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21554 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21555 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21556 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21557 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21558 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21559 21548 99
> 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> p012chm 21560 21548 99 13:20 ? 00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
> smpark 21728 21638 0 13:43 pts/0 00:00:00 grep chm
>
> --Sangmin
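Reuti's suggestion above to put the machine file into $TMPDIR could look like the following inside the job script. A hedged sketch: `$TMPDIR`, `$PE_HOSTFILE`, and `$NSLOTS` are set by SGE inside a parallel job, and a `$PE_HOSTFILE` line has the form `hostname slots queue processors`; `./myprog` is a placeholder for the MPI binary.

```shell
# Hedged sketch: write the machine file into the job's private $TMPDIR
# instead of /tmp. SGE sets $TMPDIR, $PE_HOSTFILE and $NSLOTS in the job
# environment; "myprog" is a placeholder.
MACHINEFILE="$TMPDIR/machines"
# Each $PE_HOSTFILE line looks like: hostname slots queue processors
awk '{print $1":"$2}' "$PE_HOSTFILE" > "$MACHINEFILE"   # host:count, Hydra style
mpiexec.hydra -machinefile "$MACHINEFILE" -np "$NSLOTS" ./myprog
# $TMPDIR is removed by SGE when the job ends, so no cleanup is needed.
```

Using $TMPDIR means the same file name can be reused by every job without collisions, and stale machine files can never accumulate in /tmp.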
[gridengine users] does SGE do smart core assignment for jobs that are multi-threaded and parallel?
We're running SoGE 8.1.6, and I wanted to understand how SoGE manages CPU resources for jobs that are both multi-threaded and MPI-parallel. We have slots configured as a consumable resource, with the number of slots per node equal to the number of CPU cores. We use OpenMPI with tight SGE integration, and a core-binding strategy of linear_automatic, set in a JSV, to allocate the requested number of cores for each job.

We will have a job that has an initial MPI phase and, later in the same job, a multi-threaded phase. Each parallel process in the MPI phase is single-threaded. If the job requests 10 slots of each type (and we have individual nodes with more than 10 cores), submitted like:

qsub -pe threaded 10 -pe openmpi 10 myjob

is SoGE 'smart' enough to do the following:

- [when resources are available] launch the job on a compute node, 'consuming' 10 slots from the available count on that node
- execute the 10 OpenMPI processes on the same compute node, using the cores allocated by the core binding
- when the MPI portion of the job is complete, be aware that the cores used by the MPI code are free again and run the multi-threaded portion on the same cores

Is any specific configuration required to get that behavior?

Thanks, Mark
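One constraint worth noting: qsub accepts only a single -pe request per job, so the two phases above would have to share one PE whose allocation rule keeps all granted slots on one host (a precondition for both phases reusing the same cores). A hedged sketch of such a PE definition in `qconf -sp` format; the PE name and slot ceiling are illustrative, not a recommendation:

```
pe_name            openmpi
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
```

Here `allocation_rule $pe_slots` forces all slots of a job onto a single host, and `control_slaves TRUE` keeps slave tasks under SGE's control for tight integration. Whether the scheduler then lets the later multi-threaded phase reuse exactly the cores freed by the MPI ranks depends on the binding configuration, not on the PE itself.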