Re: [gridengine users] Reporting SGE via QACCT output

2014-06-13 Thread Reuti
On 13.06.2014 at 04:30, Sangmin Park wrote:

 Yes, both files, 'mpiexec' and 'mpiexec.hydra', are in the bin directory 
 inside Intel MPI.
 But the 'mpiexec' file is a link to 'mpiexec.py'.
 Is it okay if I create a symbolic link 'mpiexec' pointing to 
 'mpiexec.hydra' instead of 'mpiexec.py'?

Sure, this should work too.
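
E.g. in the Intel MPI bin directory it would be something along these lines 
(the path is only an example, adjust it to your installation):

    cd /opt/intel/impi/4.0.3.008/intel64/bin
    mv mpiexec mpiexec.orig          # keep the old link to mpiexec.py around
    ln -s mpiexec.hydra mpiexec      # mpiexec now points to the hydra startup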

-- Reuti


 Since there are several running jobs on the cluster, I need to check it again 
 and again.
 Thanks.
 
 --Sangmin
 
 
 On Thu, Jun 12, 2014 at 8:04 PM, Reuti re...@staff.uni-marburg.de wrote:
 On 12.06.2014 at 04:23, Sangmin Park wrote:
 
  I've checked the version of Intel MPI. He uses Intel MPI version 4.0.3.008.
  Our system uses rsh to access the computing nodes. SGE does, too.
 
  Please let me know how to check which one is used, 'mpiexec.hydra' or 
  'mpiexec'.
 
 Do you have both files somewhere in a bin directory inside the Intel MPI 
 installation? You could rename mpiexec and create a symbolic link mpiexec 
 pointing to mpiexec.hydra. The old startup needs some daemons running on the 
 node (which are outside of SGE's control and accounting*), whereas 
 mpiexec.hydra starts the child processes itself, so they should be under 
 SGE's control. As long as the job stays on one and the same node, this 
 should already work without further setup. To avoid a later surprise when 
 you compute across nodes, the `rsh`/`ssh` calls should nevertheless be 
 caught and redirected to `qrsh -inherit ...`, as outlined in $SGE_ROOT/mpi.
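 
 A minimal sketch of such a wrapper (much simplified compared to the scripts 
 shipped in $SGE_ROOT/mpi, which also handle the usual rsh/ssh options; it 
 has to be found as `rsh` resp. `ssh` in the PATH of the job):
 
     #!/bin/sh
     # hand the remote startup over to SGE, so that the slave processes
     # run under its control and show up in the accounting
     host=$1
     shift
     exec $SGE_ROOT/bin/$ARC/qrsh -inherit -nostdin $host "$@"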
 
 -- Reuti
 
 *) It's even possible to force the daemons to be started under SGE, but it's 
 convoluted and not recommended.
 
 
  Sangmin
 
 
  On Wed, Jun 11, 2014 at 6:46 PM, Reuti re...@staff.uni-marburg.de wrote:
  Hi,
 
  On 11.06.2014 at 02:38, Sangmin Park wrote:
 
   For the best performance, we recommend that users use 8 cores on a single 
   node, not distributed across multiple nodes.
   As I said before, he uses a VASP application compiled with Intel MPI, so 
   he uses Intel MPI now.
 
  Which version of Intel MPI? Even with the latest one it's not tightly 
  integrated by default (despite the fact that MPICH3 [on which it is based] 
  is tightly integrated by default).
 
  Depending on the version it might be necessary to make some adjustments - 
  IIRC mainly use `mpiexec.hydra` instead of `mpiexec` and supply a wrapper 
  to catch the `rsh`/`ssh` call (like in the MPI demo in SGE's directory).
 
  -- Reuti
 
 
   --Sangmin
  
  
   On Tue, Jun 10, 2014 at 5:58 PM, Reuti re...@staff.uni-marburg.de wrote:
   Hi,
  
   On 10.06.2014 at 10:21, Sangmin Park wrote:
  
This user always runs parallel jobs using the VASP application.
Usually he uses 8 cores per job. Lots of jobs of this kind have been 
submitted by the user.
  
   8 cores on a particular node or 8 slots across the cluster? What MPI 
   implementation does he use?
  
   -- Reuti
  
   NB: Please keep the list posted.
  
  
Sangmin
   
   
On Tue, Jun 10, 2014 at 3:42 PM, Reuti re...@staff.uni-marburg.de 
wrote:
On 10.06.2014 at 08:00, Sangmin Park wrote:
   
 Hello,

 I'm very confused about the output of the qacct command.
 I thought the CPU column is the best way to measure resource usage by 
 users, based on this web page: 
 https://wiki.duke.edu/display/SCSC/Checking+SGE+Usage

 But I have a situation.
 One of the users at my institution, actually one of our heavy users, 
 consumes lots of HPC resources. To get the resource usage of this user 
 for billing, I ran qacct; the output below is just for May.

 OWNER      WALLCLOCK      UTIME      STIME        CPU    MEMORY       IO      IOW
 p012chm      2980810     28.485     35.012    100.634     4.277    0.576    0.000

 The CPU time is much too small. Because he is a very heavy user at our 
 institution, I cannot accept this result. The WALLCLOCK time, however, is 
 very large.

 How do I get correct information about the resources used by users via 
 qacct?
   
This may happen in case you have parallel jobs which are not tightly 
integrated into SGE. What types of jobs is the user running?
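
As an aside, a per-owner summary for a given period can be pulled with the 
-o/-b/-e switches of qacct, e.g. roughly (check the exact date format in 
`man qacct`):

    qacct -o p012chm -b 201405010000 -e 201406010000

But the CPU/UTIME/STIME/MEMORY columns will only be complete once the 
parallel jobs are tightly integrated.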
   
-- Reuti
   
   
 ===
 Sangmin Park
 Supercomputing Center
 Ulsan National Institute of Science and Technology(UNIST)
 Ulsan, 689-798, Korea

 phone : +82-52-217-4201
 mobile : +82-10-5094-0405
 fax : +82-52-217-4209
 ===
 ___
 users mailing list
 users@gridengine.org
 https://gridengine.org/mailman/listinfo/users
   
   
   
   
--
===
Sangmin Park

Re: [gridengine users] Reporting SGE via QACCT output

2014-06-13 Thread Reuti
On 13.06.2014 at 06:50, Sangmin Park wrote:

 Hi, 
 
 I've checked his job while it was running.
 I checked it via the 'ps -ef' command and found that his job is using 
 mpiexec.hydra.

Putting a blank between -e and f (i.e. `ps -e f`) will give a nice process tree.
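
E.g. (just a sketch, the context line counts for grep are arbitrary):

    ps -e f | grep -B 1 -A 20 mpiexec.hydra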


 And 'qrsh' is using the '-inherit' option. Here are the details.
 
 p012chm  21424 21398  0 13:20 ?        00:00:00 bash /opt/sge/default/spool/lion07/job_scripts/46651
 p012chm  21431 21424  0 13:20 ?        00:00:00 /bin/bash /opt/intel/impi/4.0.3.008/intel64/bin/mpirun -np 12 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21442 21431  0 13:20 ?        00:00:00 mpiexec.hydra -machinefile /tmp/sge_machinefile_21431 -np 12 

What creates this sge_machinefile_21431? Often such a file is put into 
$TMPDIR, i.e. the temporary directory of the job, since you can then always 
use the same name and it will be removed after the job for sure.
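
If it is created in the job script, something like the following would keep 
it in $TMPDIR (only a sketch; AFAIK Intel MPI's -machinefile accepts 
host:count lines, and $PE_HOSTFILE/$NSLOTS/$TMPDIR are provided by SGE):

    # build the machinefile from SGE's allocation for this job
    awk '{print $1":"$2}' $PE_HOSTFILE > $TMPDIR/machines
    mpiexec.hydra -machinefile $TMPDIR/machines -np $NSLOTS \
        /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x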


 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21443 21442  0 13:20 ?        00:00:00 /opt/sge/bin/lx24-amd64/qrsh -inherit lion07 

Ok, on the one hand this looks good and should give proper accounting. But 
maybe there is something about the hostname resolution, as AFAIK on the local 
machine lion07 it should just fork instead of making a local `qrsh -inherit ...`.

Does `qstat -f` list only the short hostnames, or are the FQDNs in the output 
for the queue instances?
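
E.g. a quick check on lion07 (sketch only; the host_aliases file exists only 
if it was configured):

    qstat -f | grep lion07
    hostname
    hostname -f
    cat $SGE_ROOT/default/common/host_aliases 2>/dev/null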

-- Reuti


 /home/p012pnj/intel/impi/intel64/bin/pmi_proxy --control-port lion07:54060 --pmi-connect lazy-cache --pmi-aggregate --bootstrap rsh --bootstrap-exec rsh --demux poll --pgid 0 --enable-stdin 1 --proxy-id 0
 root     21452 21451  0 13:20 ?        00:00:00 sshd: p012chm [priv]
 p012chm  21453 21443  0 13:20 ?        00:00:00 /usr/bin/ssh -p 60725 lion07 exec '/opt/sge/utilbin/lx24-amd64/qrsh_starter' '/opt/sge/default/spool/lion07/active_jobs/46651.1/1.lion07'
 p012chm  21457 21452  0 13:20 ?        00:00:00 sshd: p012chm@notty
 p012chm  21458 21457  0 13:20 ?        00:00:00 /opt/sge/utilbin/lx24-amd64/qrsh_starter /opt/sge/default/spool/lion07/active_jobs/46651.1/1.lion07
 p012chm  21548 21458  0 13:20 ?        00:00:00 /home/p012pnj/intel/impi/intel64/bin/pmi_proxy --control-port lion07:54060 --pmi-connect lazy-cache --pmi-aggregate --bootstrap rsh --bootstrap-exec rsh --demux poll --pgid 0 --enable-stdin 1 --proxy-id 0
 p012chm  21549 21548 99 13:20 ?        00:22:04 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21550 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21551 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21552 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21553 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21554 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21555 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21556 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21557 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21558 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21559 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 p012chm  21560 21548 99 13:20 ?        00:22:10 /home/p012chm/Binary4intelMPI/vasp.5.2.12_GRAPE.O3.MPIBLOCK5000.mpi.x
 smpark   21728 21638  0 13:43 pts/0    00:00:00 grep chm
 
 --Sangmin
 
 
 On Thu, Jun 12, 2014 at 8:04 PM, Reuti re...@staff.uni-marburg.de wrote:
 On 12.06.2014 at 04:23, Sangmin Park wrote:
 
  I've checked the version of Intel MPI. He uses Intel MPI version 4.0.3.008.
  Our system uses rsh to access the computing nodes. SGE does, too.
 
  Please let me know how to check which one is used, 'mpiexec.hydra' or 
  'mpiexec'.
 
 Do you have both files somewhere in a bin directory inside the Intel MPI 
 installation? You could rename mpiexec and create a symbolic link mpiexec 
 pointing to mpiexec.hydra. The old startup needs some daemons running on the 
 node (which are outside of SGE's control and accounting*), whereas 
 mpiexec.hydra starts the child processes itself, so they should be under 
 SGE's control. As long as the job stays on one and the same node, this 
 should already work without further setup. To avoid a later surprise when 
 you compute across nodes, the `rsh`/`ssh` calls should nevertheless be 
 caught and redirected to `qrsh -inherit ...`, as outlined in $SGE_ROOT/mpi.
 
 -- Reuti
 
 *) It's even possible to force the daemons to be started under SGE, but it's 
 convoluted and not recommended.

[gridengine users] does SGE do smart core assignment for jobs that are multi-threaded and parallel?

2014-06-13 Thread bergman

We're running SoGE 8.1.6, and I wanted to understand how SoGE manages
CPU resources for jobs that are both multi-threaded and MPI-parallel.

We have slots configured as a consumable resource, with the number of
slots per-node equal to the number of CPU-cores.

We use OpenMPI with tight SGE integration.

We use a core binding strategy of linear_automatic, set in a JSV,
to allocate the requested number of cores for each job.
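
Roughly, the relevant part of that JSV looks like this (simplified sketch; 
the exact binding_* parameter names should be checked against the jsv(1) man 
page and the jsv_include.sh shipped with SoGE):

    #!/bin/sh
    # server-side JSV: bind as many cores as slots were requested
    . $SGE_ROOT/util/resources/jsv/jsv_include.sh

    jsv_on_start()
    {
       return
    }

    jsv_on_verify()
    {
       slots=`jsv_get_param pe_max`      # upper bound of the -pe slot range
       if [ -n "$slots" ]; then
          jsv_set_param binding_strategy linear_automatic
          jsv_set_param binding_amount "$slots"
       fi
       jsv_correct "core binding added"
       return
    }

    jsv_main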

We will have a job that has an initial MPI phase and, later in the
same job, a multi-threaded phase. Each parallel process in the MPI phase is
single-threaded.

If the job requests 10 slots of each type (and we have individual nodes
with more than 10 cores), submitted like:

qsub -pe threaded 10 -pe openmpi 10 myjob

is SoGE 'smart' enough to do the following:

[when resources are available] launch the job on a compute
node, 'consuming' 10 slots from the available count on 
that node

execute the 10 OpenMPI processes on the same compute node, using
the cores allocated by the core-binding

when the MPI portion of the job is complete, be aware that
the cores used by the MPI code are available and run the
multithreaded portion on the same cores

Is any specific configuration required to get that behavior?

Thanks,

Mark

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users