Jordi,

It's basically impossible to force people to call srun somewhere in their batch script. If you only want to allow the very simplest of batch scripts, then you can grep them at job submit time with a job submit plugin. But if their script calls a script which calls a script (etc.) which calls srun, a grep of the top-level script will never detect that they've done what you wanted. Worse, you'll raise false positives all the time: the check will flag users who have done what you wanted, just a few levels down.
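To illustrate the grep approach described above and where it falls short, here is a minimal sketch in Python. It is not a Slurm job submit plugin and does not use any Slurm API; the function name, the regular expression, and the sample scripts are all assumptions made up for illustration. It only looks for a literal srun at a position where a shell command could start.

```python
import re

# Hypothetical pattern: "srun" where a shell command could start
# (start of line, or right after ;, &, |, or an opening parenthesis).
SRUN_AT_COMMAND_START = re.compile(r"(?:^|[;&|(])\s*srun\b")


def script_calls_srun(script_text: str) -> bool:
    """Return True if srun appears literally as a command in the script text."""
    for line in script_text.splitlines():
        # Crude comment stripping; it ignores quoting, which a real check
        # would have to handle.
        code = line.split("#", 1)[0]
        if SRUN_AT_COMMAND_START.search(code):
            return True
    return False


# Made-up sample scripts: one calls srun directly, the other calls a
# second script that (we assume) calls srun itself.
direct = "#!/bin/bash\n#SBATCH -N 2\nsrun ./my_app\n"
nested = "#!/bin/bash\n#SBATCH -N 2\n./run_case.sh   # run_case.sh calls srun\n"

print(script_calls_srun(direct))  # True
print(script_calls_srun(nested))  # False, although srun runs one level down
```

The second result is the false positive in question: the user did what was wanted, but a text search of the submitted script cannot see it.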
We have a wrapper around the MPI job starters that we support (MVAPICH2 and Intel MPI) that calls the right startup mechanisms with the right arguments. But we haven't tried to force our users to use this script. The vast majority of them do what we want because a) we train them on it and document it well, and b) our method is generally easier to use than the other options.

For monitoring, you might check out the project that I work on called TACC Stats, which provides accounting and performance monitoring for HPC jobs. Some parts of the project are in a state of flux as we add new features, but things should begin to stabilize this summer. TACC Stats will also work with a sister project called XALT, which will have its first release this summer and will provide information about the executables and libraries used by HPC jobs.

More information and source code for TACC Stats can be found on GitHub, and XALT should be available on GitHub later this summer.

git clone [email protected]:rtevans/tacc_stats.git

(this will eventually move to the main TACC GitHub, but that's a work in progress)

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
[email protected] | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445

On 6/10/14 6:19 AM, "Jordi Blasco" <[email protected]> wrote:

>Hi,
>
>we are using Snoopy library (https://github.com/a2o/snoopy) in order to
>monitor and collect statistics regarding to the applications used in the
>HPC resources.
>Since there are more than 30% of the jobs in our database without any
>information in this regard, it seems that Snoopy is not capable to track
>everything.
>
>Some other tools like PerfMiner or monitor
>(http://web.eecs.utk.edu/~mucci/monitor/) are used in several places, but
>since it relies on PapiEx (http://icl.cs.utk.edu/~mucci/papiex/),
> and this project is no longer supported, I would like to know if there
>is some other approach to collect this data.
>
>In addition to that, I would like to know if it can be possible to
>enforce to use srun in the submit script. I used a sbatch wrapper before,
>but maybe there is now a better way to do it.
>
>Thanks!
>
>Regards,
>
>Jordi
