Re: [slurm-users] FSU & Slurm

2018-04-13 Thread Sean Caron
I respect this is a technical list and that SchedMD is running it so I will say my bit once and keep it short but I think it's important to get these ideas out there. These thoughts are mine and do not constitute any official statement on the part of my employer. My lab management probably likes

Re: [slurm-users] srun and mpirun

2018-04-13 Thread Artem Polyakov
The output is certainly not enough to judge, but my first guess would be that your MPI (what is it, btw?) does not support the PMI that is enabled in Slurm. Note also that Slurm now supports 3 ways of doing PMI, and from the info that you have provided it is not clear which one you are using. To judge

Re: [slurm-users] Two lines are printed by sacct

2018-04-13 Thread Chris Samuel
On Saturday, 14 April 2018 12:00:59 AM AEST Mahmood Naderan wrote:
> What are those? What should I mention in the format option to see the
> proper information about three lines of a jobs?
Yes. Even just running "sacct" on its own will tell you that information, the reason it doesn't at the

Re: [slurm-users] srun and mpirun

2018-04-13 Thread Chris Samuel
On Saturday, 14 April 2018 1:33:13 AM AEST Mahmood Naderan wrote:
> I tried with one of the NAS benchmarks (BT) with 121 threads since the
> number of cores should be square.
That's an IO benchmark, not going to help you for this. You need something that is compute bound & comms intensive to

Re: [slurm-users] Two lines are printed by sacct

2018-04-13 Thread Mahmood Naderan
Excuse me, I see three lines also!
[root@rocks7 ~]# sacct --format=user,cputime,elapsed
     User    CPUTime    Elapsed
--------- ---------- ----------
  mahmood   02:44:00   00:10:15
            02:44:00   00:10:15
  mahmood   19:25:52   00:18:13
            09:42:56   00:18:13
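
The rows with an empty User column in a listing like the one above are most likely job steps, which sacct reports as separate records beneath the job record. A minimal Python sketch of grouping the two kinds of row follows. It assumes pipe-delimited text of the kind `sacct --parsable2 --format=jobid,user,cputime,elapsed` prints. The job IDs 101 and 102 and the `.batch` step names are hypothetical, since the original listing did not include a JobID column; only the user name and the times are taken from the post.

```python
# Hypothetical sample in the pipe-delimited layout of `sacct --parsable2`.
# The job IDs and ".batch" step names are invented for illustration; the
# user name and times come from the listing in the post.
SAMPLE = """\
JobID|User|CPUTime|Elapsed
101|mahmood|02:44:00|00:10:15
101.batch||02:44:00|00:10:15
102|mahmood|19:25:52|00:18:13
102.batch||09:42:56|00:18:13
"""


def group_by_job(text):
    """Return {job_id: {"job": row, "steps": [rows]}} from pipe-delimited text."""
    lines = [ln for ln in text.strip().splitlines() if ln]
    header = lines[0].split("|")
    jobs = {}
    for ln in lines[1:]:
        row = dict(zip(header, ln.split("|")))
        # A step record is written as "<jobid>.<step>"; the job record has no dot.
        base = row["JobID"].split(".", 1)[0]
        entry = jobs.setdefault(base, {"job": None, "steps": []})
        if "." in row["JobID"]:
            entry["steps"].append(row)
        else:
            entry["job"] = row
    return jobs


if __name__ == "__main__":
    for job_id, entry in group_by_job(SAMPLE).items():
        job = entry["job"]
        print(job_id, job["User"], job["CPUTime"], "steps:", len(entry["steps"]))
```

Adding `jobid` to the `--format` list is usually enough to tell the rows apart by eye. The sketch does not handle array or heterogeneous job IDs.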

Re: [slurm-users] srun and mpirun

2018-04-13 Thread Peter Kjellström
On Fri, 13 Apr 2018 13:49:56 +0430 Mahmood Naderan wrote:
> Hi,
> I see some old posts on the web about performance comparison of srun
> vs. mpirun. Is that still an issue? Both the following scripts works
> for test programs and surely the performance concerns is not

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-13 Thread Kevin Manalo
I’m asking in the hopes that others will chime in (I’m curious why this is happening)
Could you share your related slurm.conf cgroup options
cgroup.conf
cgroup_allowed_devices_file.conf
TaskPlugin
ProctrackType
JobAcctGatherType
-Kevin
PS Looking for similar style jobs, We have >1 day gpu

Re: [slurm-users] srun and mpirun

2018-04-13 Thread Chris Samuel
On 13/4/18 7:19 pm, Mahmood Naderan wrote:
> I see some old posts on the web about performance comparison of srun vs. mpirun. Is that still an issue?
Just running an MPI hello world program is not going to test that. You need to run an actual application that is doing a lot of computation and

Re: [slurm-users] FSU & Slurm

2018-04-13 Thread Patrick Goetz
On 04/11/2018 02:35 PM, Sean Caron wrote:
> As a protest to asking questions on this list and getting solicitations for pay-for support, let me give you some advice for free :)
Now, now. Paid support is how they keep the project going. You like using Slurm, right?

Re: [slurm-users] Job runtime

2018-04-13 Thread Mahmood Naderan
Hi Chris, I have been confused by the CPU runtime values in sacct. For a multinode MPI job, I see these values
[mahmood@rocks7 ~]$ sacct --format=jobid,user,cputime,elapsed,totalcpu,ncpus
JobID User CPUTime Elapsed TotalCPU NCPUS
- --
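
For readers puzzling over the same columns: the sacct man page describes CPUTime as elapsed time multiplied by the CPU count, whereas TotalCPU is the CPU time the processes actually consumed. The Python sketch below checks that relationship against the CPUTime and Elapsed pairs from the "Two lines are printed by sacct" entry earlier on this page. The helper names are hypothetical, and the resulting CPU counts are inferred from the arithmetic; they were not reported in the posts.

```python
def to_seconds(value):
    """Convert a sacct-style duration ([DD-]HH:MM:SS) to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def implied_cpus(cputime, elapsed):
    """CPU count implied by CPUTime = Elapsed * CPUs (assumed relationship)."""
    return to_seconds(cputime) / to_seconds(elapsed)


if __name__ == "__main__":
    # (CPUTime, Elapsed) pairs copied from the earlier listing.
    for cputime, elapsed in [
        ("02:44:00", "00:10:15"),
        ("19:25:52", "00:18:13"),
        ("09:42:56", "00:18:13"),
    ]:
        print(cputime, elapsed, implied_cpus(cputime, elapsed))
```

Under that assumption a step line can show a smaller CPUTime than its job line when the step covers fewer CPUs than the whole allocation.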