On 13.06.2014 at 04:30, Sangmin Park wrote:

> Yes, both files, 'mpiexec' and 'mpiexec.hydra', are in the "bin" directory
> inside the Intel MPI installation.
> But the 'mpiexec' file is a link to the 'mpiexec.py' file.
> Is it okay if I create a symbolic link 'mpiexec' pointing to
> 'mpiexec.hydra' instead of 'mpiexec.py'?

Sure, this should work too.
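
Something along these lines should do it (just a sketch; the path below is
only an example of where Intel MPI 4.0.3.008 might be installed, adjust it
to your setup):

    cd /opt/intel/impi/4.0.3.008/bin   # example install location
    ls -l mpiexec                      # shows it currently points to mpiexec.py
    mv mpiexec mpiexec.py.link         # keep the old link around, just in case
    ln -s mpiexec.hydra mpiexec        # "mpiexec" now starts the hydra launcher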

-- Reuti


> Since there are several jobs running on the cluster, I need to check it
> again and again.
> Thanks.
> 
> --Sangmin
> 
> 
> On Thu, Jun 12, 2014 at 8:04 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 12.06.2014 at 04:23, Sangmin Park wrote:
> 
> > I've checked the version of Intel MPI. He uses Intel MPI version 4.0.3.008.
> > Our system uses rsh to access the computing nodes. SGE does, too.
> >
> > Please let me know how to check which one is used, 'mpiexec.hydra' or
> > 'mpiexec'.
> 
> Do you have both files somewhere in a "bin" directory inside the Intel MPI
> installation? You could rename "mpiexec" and create a symbolic link
> "mpiexec" pointing to "mpiexec.hydra". The old startup method needs extra
> daemons running on the nodes (which are outside of SGE's control and
> accounting*), but "mpiexec.hydra" starts the child processes as its own
> kids and should hence be under SGE's control. As long as you stay on one
> and the same node, this already works without any further setup. To avoid
> a later surprise once you compute across nodes, the `rsh`/`ssh` calls
> should nevertheless be caught and redirected to `qrsh -inherit ...`, as
> outlined in "$SGE_ROOT/mpi".
> 
> -- Reuti
> 
> *) It's even possible to force the daemons to be started under SGE, but it's 
> convoluted and not recommended.
> 
> 
> > Sangmin
> >
> >
> > On Wed, Jun 11, 2014 at 6:46 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > Hi,
> >
> > On 11.06.2014 at 02:38, Sangmin Park wrote:
> >
> > > For the best performance, we recommend that users use 8 cores on a
> > > single node, not distributed across multiple nodes.
> > > As I said before, he uses the VASP application compiled with Intel MPI.
> > > So he uses Intel MPI now.
> >
> > Which version of Intel MPI? Even with the latest one it's not tightly
> > integrated by default (despite the fact that MPICH3 [on which it is
> > based] is tightly integrated by default).
> >
> > Depending on the version it might be necessary to make some adjustments -
> > IIRC mainly to use `mpiexec.hydra` instead of `mpiexec` and to supply a
> > wrapper to catch the `rsh`/`ssh` call (like in the MPI demo in SGE's
> > directory).
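> >
> > A minimal job script for such a run could then look like this (a sketch;
> > it assumes a tightly integrated parallel environment named "mpi" and a
> > VASP binary called "vasp" in the working directory):
> >
> >     #!/bin/sh
> >     #$ -pe mpi 8
> >     #$ -cwd
> >     # SGE fills in $NSLOTS with the number of granted slots
> >     mpiexec.hydra -n $NSLOTS ./vasp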
> >
> > -- Reuti
> >
> >
> > > --Sangmin
> > >
> > >
> > > On Tue, Jun 10, 2014 at 5:58 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > Hi,
> > >
> > > On 10.06.2014 at 10:21, Sangmin Park wrote:
> > >
> > > > This user always runs parallel jobs using the VASP application.
> > > > Usually he uses 8 cores per job. The user has submitted lots of jobs
> > > > of this kind.
> > >
> > > 8 cores on a particular node or 8 slots across the cluster? What MPI 
> > > implementation does he use?
> > >
> > > -- Reuti
> > >
> > > NB: Please keep the list posted.
> > >
> > >
> > > > Sangmin
> > > >
> > > >
> > > > On Tue, Jun 10, 2014 at 3:42 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > > On 10.06.2014 at 08:00, Sangmin Park wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'm very confused about the output of the qacct command.
> > > > > I thought the CPU column is the best way to measure resource usage
> > > > > by users, based on this web page:
> > > > > https://wiki.duke.edu/display/SCSC/Checking+SGE+Usage
> > > > >
> > > > > But I have a situation.
> > > > > One of the users in my institution, actually one of our heavy
> > > > > users, uses lots of HPC resources. To get this user's resource
> > > > > usage for billing purposes, I ran qacct; the output below is just
> > > > > for May.
> > > > >
> > > > > OWNER       WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
> > > > > ========================================================================================================================
> > > > > p012chm       2980810        28.485        35.012       100.634              4.277              0.576              0.000
> > > > >
> > > > > The CPU time is much too small. Because he is a very heavy user of
> > > > > our institution, I cannot accept this result. The WALLCLOCK time,
> > > > > however, is very large.
> > > > >
> > > > > How do I get correct per-user resource usage information via
> > > > > qacct?
> > > >
> > > > This can happen when you have parallel jobs which are not tightly
> > > > integrated into SGE. What type of jobs is the user running?
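> > > >
> > > > You can spot-check a single job with something like (<job_id> is just
> > > > a placeholder):
> > > >
> > > >     qacct -j <job_id>
> > > >
> > > > For an 8-slot job that keeps its cores busy, the reported "cpu" should
> > > > be roughly 8 times "ru_wallclock"; if it's only a tiny fraction of
> > > > that, the slave processes ran outside of SGE's accounting. Once the
> > > > jobs are tightly integrated, a per-user summary along the lines of
> > > > `qacct -o p012chm -b 201405010000 -e 201406010000` should report
> > > > sensible CPU values.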
> > > >
> > > > -- Reuti
> 
> 
> 
> 
> -- 
> ===========================
> Sangmin Park 
> Supercomputing Center
> Ulsan National Institute of Science and Technology(UNIST)
> Ulsan, 689-798, Korea 
> 
> phone : +82-52-217-4201
> mobile : +82-10-5094-0405
> fax : +82-52-217-4209
> ===========================


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
