[slurm-users] Sshare -l segfaults

2019-07-12 Thread Christopher Benjamin Coffey
Hi All, Has anyone had issues with sshare segfaulting? Specifically with "sshare -l"? Any suggestions on how to figure this one out? Maybe there is something obvious I'm not seeing. This has been happening across many Slurm versions; I can't recall when it started. For the last couple versions I'v
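One way to start pinning down a segfault like this is to capture a backtrace, for example (a sketch; it assumes gdb is installed, and where a core file lands depends on kernel.core_pattern):

    # reproduce the crash under gdb directly
    gdb --args sshare -l
    (gdb) run
    (gdb) bt full
    # or enable core dumps, reproduce, then inspect the core file
    ulimit -c unlimited
    sshare -l
    gdb $(which sshare) ./core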

Re: [slurm-users] number of tasks that can run on a node without oversubscribing

2019-07-12 Thread Juergen Salk
Hello, the CPU vs. cores vs. threads issues also confused me at the very beginning. Although, in general, we do not encourage our users to make use of hyperthreading, we have decided to leave it enabled in the BIOS as there are some use cases that are known to benefit from hyperthreading. I think

Re: [slurm-users] number of tasks that can run on a node without oversubscribing

2019-07-12 Thread mercan
Hi; If you want to use the threads as CPUs, you should set CR_CPU instead of CR_Core. Regards; Ahmet M. On 12.07.2019 21:29, mercan wrote: Hi; You can find the definitions of socket, core, & thread at: https://slurm.schedmd.com/mc_support.html Your status: CPUs=COREs=Sockets*Cor
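A minimal slurm.conf sketch of the change Ahmet is describing, assuming the cons_res select plugin is in use:

    # slurm.conf: schedule each hardware thread as a CPU
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU
    # with CR_Core instead, one core (two threads on this node) is the
    # smallest unit a task can be allocated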

Re: [slurm-users] number of tasks that can run on a node without oversubscribing

2019-07-12 Thread mercan
Hi; You can find the definitions of socket, core, & thread at: https://slurm.schedmd.com/mc_support.html Your status: CPUs = COREs = Sockets*CoresPerSocket = 1*4 = 4; Threads = COREs*ThreadsPerCore = 4*2 = 8. Regards; Ahmet M. On 12.07.2019 20:15, Hanu Pathuri wrote: Hi, Here is my node informa
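Spelled out for the node in question (just the arithmetic above, restated):

    # 1 socket x 4 cores/socket  = 4 cores
    # 4 cores  x 2 threads/core  = 8 hardware threads (CPUs=8)
    # CR_Core: 4 schedulable CPUs (one per core)
    # CR_CPU:  8 schedulable CPUs (one per thread)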

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Pär Lundö
Hi, Thank you so much for your quick responses! It is much appreciated. I don't have access to the cluster until next week, but I'll be sure to follow up on all of your suggestions and get back to you next week. Have a nice weekend! Best regards, Palle From: "slurm-u

[slurm-users] number of tasks that can run on a node without oversubscribing

2019-07-12 Thread Hanu Pathuri
Hi, Here is my node information. I am confused by the terminology w.r.t. CPU vs. core. NodeName=hpathuri-linux CPUs=8 RealMemory=15833 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN. I am unable to schedule more than 4 tasks without oversubscribing, even though my configuration look
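For reference, the node line as posted, with the counts it implies (the 4-task ceiling matches cores, not threads, being used as the schedulable unit):

    NodeName=hpathuri-linux CPUs=8 RealMemory=15833 Sockets=1 \
        CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
    # Slurm counts 1 socket * 4 cores * 2 threads = 8 CPUs, but with
    # CR_Core only the 4 physical cores are handed out to tasks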

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread John Hearns
Pär, by 'poking around' Chris means to use tools such as netstat and lsof. Also, I would look at ps -eaf --forest to make sure there are no 'orphaned' jobs sitting on that compute node. Having said that though, I have a dim memory of a classic PBSPro error message which says something about a netw
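For example, run on the affected compute node (a sketch; ss may replace netstat on newer systems):

    ps -eaf --forest        # look for orphaned processes left over from old jobs
    netstat -tlnp           # which process is listening on which port (or: ss -tlnp)
    lsof -i                 # open network sockets, per process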

[slurm-users] Registration for 2019 Slurm User Group Meeting is Open

2019-07-12 Thread Jacob Jenson
Just a reminder that early registration for the 2019 Slurm User Group meeting will end in 2 days. You can register at https://slug19.eventbrite.com/ The meeting will be held on 17-18 September 2019 in Salt Lake City at the University of Utah. - *Early registration* - May 14 through J

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Chris Samuel
On 12/7/19 7:39 am, Pär Lundö wrote: Presumably, the first 8 tasks originate from the first node (in this case lxclient11), and the other node (lxclient10) responds as predicted. That looks right; it seems the other node has two processes fighting over the same socket, and that's breakin

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Pär Lundö
Hi, Thank you for your response. When I run it ("srun -N2 -n8 hostname") I get an error stating: "srun: job step 83.0 aborted before step completely launched. srun: error: task 0 launch failed: Unspecified error. srun: error: task 1 launch failed: Unspecified error. srun: error: task 2 launc

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Jeffrey Frey
Have you tried "srun -N# -n# mpirun python3 "? Perhaps you have no MPI environment being set up for the processes? There was no "--mpi" flag in your "srun" command, and we don't know whether you have a default value for that or not. > On Jul 12, 2019, at 10:28 AM, Chris Samuel wrote:
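One way to check which "--mpi" values the local srun supports, and to pass one explicitly (mpi_hello.py is a hypothetical test script, and pmi2 is only an example of a plugin that may be present):

    srun --mpi=list                               # plugins this srun was built with
    srun --mpi=pmi2 -N2 -n8 python3 mpi_hello.py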

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Chris Samuel
On 11/7/19 11:04 pm, Pär Lundö wrote: It works fine running on a single node (with "-N1" instead of "-N2"), but it is aborted or stopped when running on two nodes. What is the error you get? Does the same srun command but with "hostname" instead of Python work? -- Chris Samuel : http://www

Re: [slurm-users] Running pyMPI on several nodes

2019-07-12 Thread Mark Hahn
I am having trouble using or running a python-mpi program involving more than one node. The python-mpi program is very simple, do you think there's something unique about the python program? (also, you mean mpi4py, right?) Since authentication with Slurm is done via munge, do I need a passwordless S
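A minimal mpi4py test program of the kind under discussion (a sketch, assuming mpi4py is installed; the file name mpi_hello.py is made up). If even this fails across two nodes, the problem is in the launch path rather than the user's program:

    # mpi_hello.py - every task reports its rank, the world size and its host
    from mpi4py import MPI
    import socket

    comm = MPI.COMM_WORLD
    print("rank %d of %d on %s"
          % (comm.Get_rank(), comm.Get_size(), socket.gethostname()))

Launched e.g. with "srun -N2 -n8 python3 mpi_hello.py", plus whatever "--mpi" setting the site needs.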

[slurm-users] pam_slurm_adopt and memory constraints?

2019-07-12 Thread Juergen Salk
Dear all, I have configured pam_slurm_adopt in our Slurm test environment by following the corresponding documentation: https://slurm.schedmd.com/pam_slurm_adopt.html I've set 'PrologFlags=contain' in slurm.conf and also have task/cgroup enabled along with task/affinity (i.e. 'TaskPlugin=task/
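For reference, a sketch of the pieces described above, plus the cgroup.conf side that memory constraints would need (the ConstrainRAMSpace line is an assumption about the test setup, not something stated here):

    # slurm.conf
    PrologFlags=contain
    TaskPlugin=task/cgroup,task/affinity

    # cgroup.conf
    ConstrainRAMSpace=yes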