Hi Ole,

Ole Holm Nielsen <[email protected]> writes:

> As a small contribution to the Slurm community, I've moved my collection of
> Slurm tools to GitHub at https://github.com/OleHolmNielsen/Slurm_tools. These
> are tools which I feel makes the daily cluster monitoring and management a
> little easier.
>
> The following Slurm tools are available:
>
> * pestat Prints a Slurm cluster nodes status with 1 line per node and job
>   info.
>
> * slurmreportmonth Generate monthly accounting statistics from Slurm using the
>   sreport command.
>
> * showuserjobs Print the current node status and batch jobs status broken down
>   into userids.
>
> * slurmibtopology Infiniband topology tool for Slurm.
>
> * Slurm triggers scripts.
>
> * Scripts for managing nodes.
>
> * Scripts for managing jobs.
>
> The tools "pestat" and "slurmibtopology" have previously been announced to
> this list, but future updates will be on GitHub only.
>
> I would also like to mention our Slurm deployment HowTo guide at
> https://wiki.fysik.dtu.dk/niflheim/SLURM
>
> /Ole

Thanks for sharing your tools. Here are some brief comments:

- psjob/psnode

  - The USERLIST variable makes the commands a bit brittle, since ps will
    fail if you pass an unknown username.

- showuserjobs

  - Doesn't handle usernames longer than 8 characters (we have longer names).

  - The grouping doesn't seem quite correct. As shown in the example
    below, not all the users of a group appear under the group total
    for the appropriate group:

Username     Jobs   CPUs  Jobs   CPUs  Group     Further info
========     ====  =====  ====  =====  ========  =============================
GRAND_TOTAL   168   1089    55    451  ALL       running+idle=1540 CPUs 29 users
GROUP_TOTAL    56    349    10    119  group01   running+idle=468 CPUs 8 users
user01         27    324     4     52  group02   One, User
GROUP_TOTAL    27    324     4     52  group02   running+idle=376 CPUs 1 users
user02         29    174     1      6  group01   Two, User
GROUP_TOTAL     5    148    18    208  group03   running+idle=356 CPUs 4 users
user03          3    120    16    176  group03   Three, User
user04         11     96     3     48  group01   Four, User
...

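On the psjob/psnode point, one possible workaround is to drop any names from USERLIST that do not resolve on the local host before handing the list to ps. What follows is only a minimal sketch: it assumes USERLIST is a comma-separated list of user names, and `filter_known_users` is a hypothetical helper, not something that exists in Slurm_tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm_tools): keep only those names in a
# comma-separated list that resolve to an account on this host.
filter_known_users() {
    known=""
    old_ifs=$IFS
    IFS=,
    for u in $1; do
        # "id -u" exits non-zero when the user name is unknown
        if id -u "$u" >/dev/null 2>&1; then
            known="${known:+$known,}$u"
        fi
    done
    IFS=$old_ifs
    printf '%s\n' "$known"
}

# Example list in the assumed format; "no_such_user_xyz" stands in for an
# account that is missing on this host.
USERLIST="root,no_such_user_xyz"
USERLIST=$(filter_known_users "$USERLIST")
echo "$USERLIST"
```

If every name is filtered out, the result is an empty string, so a caller would still need to check for that before running ps with it.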
In general, maybe it would be good to have a common config file, where
things such as paths to binaries, USERLIST and username lengths are
defined.

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
