On 12.06.2014, at 04:23, Sangmin Park wrote:

> I've checked the version of Intel MPI: he uses Intel MPI 4.0.3.008.
> Our system uses rsh to access the computing nodes, and so does SGE.
> 
> Please let me know how to check which one is used, 'mpiexec.hydra' or 
> 'mpiexec'.

Do you have both files somewhere in a "bin" directory inside the Intel MPI 
installation? You could rename "mpiexec" and create a symbolic link "mpiexec" 
pointing to "mpiexec.hydra". The old startup method needs daemons running on 
the nodes (which are outside of SGE's control and accounting*), whereas 
"mpiexec.hydra" starts the child processes itself as its own children, so they 
should be under SGE's control. As long as you stay on one and the same node, 
this should already work without any further setup. To avoid a later surprise 
when you compute across nodes, the `rsh`/`ssh` calls should nevertheless be 
caught and redirected to `qrsh -inherit ...` as outlined in "$SGE_ROOT/mpi".
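
For example, a minimal sketch of the rename (the installation path is an 
assumption - adjust it to wherever your Intel MPI 4.0.3.008 actually lives):

    # path is an assumption -- point it at the real Intel MPI "bin" directory
    cd /opt/intel/impi/4.0.3.008/intel64/bin
    mv mpiexec mpiexec.mpd          # keep the old MPD-based starter around
    ln -s mpiexec.hydra mpiexec     # "mpiexec" now launches via mpiexec.hydra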

-- Reuti

*) It's even possible to force the daemons to be started under SGE, but it's 
convoluted and not recommended.


> Sangmin 
> 
> 
> On Wed, Jun 11, 2014 at 6:46 PM, Reuti <[email protected]> wrote:
> Hi,
> 
> On 11.06.2014, at 02:38, Sangmin Park wrote:
> 
> > For best performance, we recommend that users run on 8 cores of a single 
> > node, not distributed across multiple nodes.
> > As I said before, he uses the VASP application compiled with Intel MPI, so 
> > he is using Intel MPI now.
> 
> Which version of Intel MPI? Even with the latest one it's not tightly 
> integrated by default (despite the fact that MPICH3 [on which it is based] 
> is tightly integrated by default).
> 
> Depending on the version it might be necessary to make some adjustments - 
> IIRC mainly to use `mpiexec.hydra` instead of `mpiexec` and to supply a 
> wrapper to catch the `rsh`/`ssh` calls (like in the MPI demo in SGE's 
> directory).
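> 
> A minimal sketch of such a wrapper (modeled loosely on the rsh wrapper 
> shipped in $SGE_ROOT/mpi, which additionally handles rsh options like -n; 
> this shows only the core idea and assumes `qrsh` is in the job's PATH). 
> Install it under the name `rsh` (or `ssh`) early in the job's PATH so it 
> shadows the real binary:
> 
>     #!/bin/sh
>     # Stand-in for rsh/ssh: hand the remote start over to SGE's qrsh so the
>     # slave processes end up under SGE's control and accounting.
>     host=$1
>     shift
>     exec qrsh -inherit -nostdin "$host" "$@"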
> 
> -- Reuti
> 
> 
> > --Sangmin
> >
> >
> > On Tue, Jun 10, 2014 at 5:58 PM, Reuti <[email protected]> wrote:
> > Hi,
> >
> > On 10.06.2014, at 10:21, Sangmin Park wrote:
> >
> > > This user always runs parallel jobs using the VASP application.
> > > Usually, he uses 8 cores per job. He has submitted a lot of jobs of this 
> > > kind.
> >
> > 8 cores on a particular node or 8 slots across the cluster? What MPI 
> > implementation does he use?
> >
> > -- Reuti
> >
> > NB: Please keep the list posted.
> >
> >
> > > Sangmin
> > >
> > >
> > > On Tue, Jun 10, 2014 at 3:42 PM, Reuti <[email protected]> wrote:
> > > On 10.06.2014, at 08:00, Sangmin Park wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm very confused by the output of the qacct command.
> > > > I thought the CPU column is the best way to measure resource usage by 
> > > > users, based on this web page: 
> > > > https://wiki.duke.edu/display/SCSC/Checking+SGE+Usage
> > > >
> > > > But I have a situation.
> > > > One of the users at my institution, actually one of our heavy users, 
> > > > consumes a lot of HPC resources. To get this user's resource usage for 
> > > > billing purposes, I ran qacct; the output is below, covering May only.
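> > > > 
> > > > (For reference, a qacct query of roughly this form produces such a 
> > > > per-owner summary - the owner name and the May 2014 time window below 
> > > > are only an illustration:)
> > > > 
> > > >     # per-owner accounting summary, restricted to May 2014
> > > >     qacct -o p012chm -b 201405010000 -e 201406010000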
> > > >
> > > > OWNER      WALLCLOCK      UTIME      STIME        CPU     MEMORY        IO       IOW
> > > > =====================================================================================
> > > > p012chm      2980810     28.485     35.012    100.634      4.277     0.576     0.000
> > > >
> > > > The CPU time is much too small. Because he is a very heavy user at our 
> > > > institution, I cannot accept this result. The WALLCLOCK time, however, 
> > > > is very large.
> > > >
> > > > How do I get correct per-user resource usage information via qacct?
> > >
> > > This may happen if you have parallel jobs which are not tightly 
> > > integrated into SGE. What types of jobs is the user running?
> > >
> > > -- Reuti
> > >
> 
> -- 
> ===========================
> Sangmin Park 
> Supercomputing Center
> Ulsan National Institute of Science and Technology(UNIST)
> Ulsan, 689-798, Korea 
> 
> phone : +82-52-217-4201
> mobile : +82-10-5094-0405
> fax : +82-52-217-4209
> ===========================


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
