On 2017-09-05 20:40, John DeSantis wrote:
> Hello all,
>
> We were recently alerted by a user whose long running jobs (>= 6 days) were
> killed by oom.
>
> A closer look revealed that jag_common_poll_data isn't consistent in logging
> the total memory usage, depending on the application in question. For
> example, consider the following job submission:
>
> #!/bin/bash
> #
> #SBATCH --job-name=sif14_1co2
> #SBATCH --output=runlog.log
> #SBATCH --time=48:00:00
> #SBATCH --nodes=8 --ntasks-per-node=6
> #SBATCH --partition=p2016 --qos=p16
> #SBATCH --mail-type=END
> #SBATCH --mail-user=user
>
> module purge
> module add apps/vasp/5.4.1
>
> # For standard vasp
> mpirun vasp_std
>
> Since no memory syntax was offered, the DefMemPerCPU value would take over,
> which is 512 MB.
>
> Looking at the batch host stats, here is what is actually being reported:
>
>  PPID   LWP PSR     TIME S     RSS  ELAPSED COMMAND           VSZ
> 28388 28394   1 00:00:00 S    1332 17:47:24 slurm_script   106116
> 28394 28403   1 00:00:00 S    1404 17:47:24 mpirun         106248
> 28403 28416  10 00:00:00 S    1664 17:47:24 mpiexec.hydra   17180
> 28416 28417   5 00:00:00 S    4996 17:47:24 srun           708424
> 28417 28421   3 00:00:00 S     668 17:47:24 srun            37712
> 28453 28458   0 00:00:00 S    1928 17:47:24 pmi_proxy       16384
> 28458 28471   4 15:55:13 R 1059004 17:47:24 vasp_std      2953636
> 28458 28472  10 17:01:43 R 1057672 17:47:24 vasp_std      2950168
> 28458 28473  20 17:35:54 R 1055412 17:47:24 vasp_std      2947896
> 28458 28474   3 17:53:39 R 1060060 17:47:24 vasp_std      2952712
> 28458 28475  11 17:51:52 R 1055168 17:47:24 vasp_std      2947772
> 28458 28476  23 17:53:42 R 1063888 17:47:24 vasp_std      2956572
>
> [2017-09-05T13:11:09.124] [10212120] debug: jag_common_poll_data: Task average frequency = 2197 pid 28394 mem size 4400 229544 time 0.020000(0+0)
> [2017-09-05T13:11:09.280] [10212120.0] debug: jag_common_poll_data: Task average frequency = 2197 pid 28458 mem size 1928 205988 time 0.020000(0+0)
> [2017-09-05T13:11:31.729] [10212120] debug: jag_common_poll_data: Task average frequency = 2197 pid 28394 mem size 4400 229544 time 0.020000(0+0)
> [2017-09-05T13:11:31.757] [10212120.0] debug: jag_common_poll_data: Task average frequency = 2197 pid 28458 mem size 1928 205988 time 0.020000(0+0)
> [2017-09-05T13:11:39.126] [10212120] debug: jag_common_poll_data: Task average frequency = 2197 pid 28394 mem size 4400 229544 time 0.020000(0+0)
> [2017-09-05T13:11:39.283] [10212120.0] debug: jag_common_poll_data: Task average frequency = 2197 pid 28458 mem size 1928 205988 time 0.020000(0+0)
>
> However, if a slightly older version of the same software is used, memory
> seems to be reported correctly. Consider the submission script below.
> #!/bin/sh
>
> #SBATCH --time=04:00:00
> #SBATCH -N 2 --ntasks-per-node=2 -J "vasp_run"
> #SBATCH --mem-per-cpu=1800
> #SBATCH --reservation=blah
> #SBATCH -p blah
> #SBATCH -w nodes-[1-2]
>
> module purge
> module load apps/vasp/5.3.3
>
> mpirun vasp
>
> And, the reported stats:
>
>  PPID   LWP PSR     TIME S     RSS ELAPSED COMMAND
> 22046 22050   0 00:00:00 S    1324   03:32 slurm_script
> 22050 22053   0 00:00:00 S    1476   03:32 mpirun
> 22053 22111   1 00:00:00 S    1428   03:32 mpiexec.hydra
> 22111 22112   2 00:00:00 S    4800   03:32 srun
> 22112 22113   3 00:00:00 S     668   03:32 srun
> 22121 22126   3 00:00:00 S    1352   03:32 pmi_proxy
> 22126 22127   5 00:03:30 R 1781544   03:32 vasp
> 22126 22128   0 00:03:30 R 1778052   03:32 vasp
>
> [2017-09-05T13:17:58.097] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:17:58.265] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2527 pid 22126 mem size 3538576 7561020 time 59.630000(56+2)
> [2017-09-05T13:18:28.101] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:18:28.270] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2505 pid 22126 mem size 3560928 7655316 time 119.590000(116+3)
> [2017-09-05T13:18:58.105] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:18:58.275] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2498 pid 22126 mem size 3560932 7655316 time 179.570000(175+3)
> [2017-09-05T13:19:28.109] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:19:28.281] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2495 pid 22126 mem size 3560932 7655316 time 239.550000(234+4)
> [2017-09-05T13:19:58.113] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:19:58.286] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2493 pid 22126 mem size 3560936 7655316 time 299.510000(294+5)
>
> Has anyone else noticed anything similar on their cluster(s)? I cannot
> confirm whether this was happening before we upgraded from 15.08.4 to
> 16.05.10-2.
>
> Thanks,
> John DeSantis

Not knowing the specifics of your setup, AFAIK the general advice remains
that unless you're launching the job via srun instead of mpirun (that is,
unless you're using the Slurm MPI integration), all bets are off w.r.t.
accounting.
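For what it's worth, you can cross-check those slurmd debug lines against
what the accounting layer has actually stored for the running steps. A
minimal sketch (job id taken from your first log excerpt; sstat only works
while the job is still running and with a JobAcctGather plugin configured):

  # Per-step memory figures as recorded by the accounting gather plugin
  sstat --allsteps -j 10212120 -o JobID,MaxRSS,AveRSS,MaxVMSize

If the vasp_std ranks are missing from those numbers as well, it is the same
underreporting you see in the poll data.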
Since you seem to be using mvapich2(?), make sure you're compiling mvapich2
with

  ./configure --with-pmi=pmi2 --with-pm=slurm

and then launch the application with

  srun --mpi=pmi2 vasp_std

instead of mpirun (from https://slurm.schedmd.com/mpi_guide.html#mvapich2 ).
Roughly, the submission script would then look like the sketch appended
below.

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi
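As a rough illustration only (adapting the first submission script from the
quoted message; availability of the pmi2 plugin and of an mvapich2-linked
vasp_std build at your site are assumptions), the job script might become:

#!/bin/bash
#
#SBATCH --job-name=sif14_1co2
#SBATCH --output=runlog.log
#SBATCH --time=48:00:00
#SBATCH --nodes=8 --ntasks-per-node=6
#SBATCH --partition=p2016 --qos=p16

module purge
module add apps/vasp/5.4.1

# Let slurmd start the MPI ranks itself, so the jobacct_gather plugin polls
# the vasp_std processes directly instead of only the pmi_proxy tree.
srun --mpi=pmi2 vasp_std

You can check which MPI plugin types your installation actually offers with
srun --mpi=list.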