Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-23 Thread Gilles Gouaillardet
Rainer, what if you explicitly bind tasks to cores? mpirun -bind-to core ... Note this is v1.8 syntax ... v1.6 is now obsolete (Debian folks are working on upgrading it...). Out of curiosity, did you try another distro such as Red Hat and the like, SUSE ... and do you observe the same

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-23 Thread Rainer Koenig
Gilles, I managed to get snapshots of all the /proc/<pid>/status entries for all liggghts jobs, but Cpus_allowed is similar no matter whether the system was cold or warm booted. Then I looked around in /proc/ and found sched_debug. This at least shows that the liggghts processes are not spread over
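Besides sched_debug, one way to check how processes are spread is to read the "processor" field (field 39, the CPU a task last ran on) of each /proc/<pid>/stat file. The following is only a sketch under stated assumptions: the helper names `last_cpu` and `tally_cpus` are hypothetical, and the PIDs have to be supplied by the caller.

```python
from collections import Counter
from pathlib import Path


def last_cpu(stat_text):
    """Return the CPU a task last ran on, given the text of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may contain
    spaces, so split after the last ')' before counting fields.
    """
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 39 (processor) is rest[36].
    return int(rest[36])


def tally_cpus(pids):
    """Count how many of the given PIDs last ran on each CPU (hypothetical helper)."""
    counts = Counter()
    for pid in pids:
        try:
            text = (Path("/proc") / str(pid) / "stat").read_text()
        except OSError:
            continue  # the process may have exited in the meantime
        counts[last_cpu(text)] += 1
    return counts
```

If `tally_cpus` reports many tasks on only a few CPUs while all tasks are busy, the processes are not spread over all cores.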

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-22 Thread Gilles Gouaillardet
Rainer, a first step could be to gather /proc/pid/status for your 48 tasks. Then you can grep Cpus_allowed_list and see if you find something suspicious. If your processes are idling, then the scheduler might assign them to the same core. In this case, your processes not being spread is a
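The check suggested above (collect /proc/pid/status for each task and look at Cpus_allowed_list) could be sketched roughly as follows. The function names `parse_cpu_list`, `cpus_allowed`, and `report` are hypothetical, and the list of PIDs is assumed to come from the caller.

```python
from pathlib import Path


def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '0-3,8,10-11' into a sorted list of ints."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_allowed(status_text):
    """Return the allowed CPUs from the text of a /proc/<pid>/status file, or None."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    return None


def report(pids):
    """Print the allowed CPUs for each PID (hypothetical helper)."""
    for pid in pids:
        try:
            text = (Path("/proc") / str(pid) / "status").read_text()
        except OSError:
            continue  # the process may have exited in the meantime
        print(pid, cpus_allowed(text))
```

A task whose list is much shorter than the full core range would be the "something suspicious" to look for.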

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-22 Thread Rainer Koenig
On 17.03.2016 at 10:40, Ralph Castain wrote: > Just some thoughts offhand: > > * what version of OMPI are you using? dpkg -l openmpi-bin says 1.6.5-8 from Ubuntu 14.04. > > * are you saying that after the warm reboot, all 48 procs are running on a > subset of cores? Yes. After a cold boot

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-18 Thread Thomas Jahns
Hi, On 03/17/2016 10:00 AM, Rainer Koenig wrote: I'm experiencing a strange problem with running LIGGGHTS on a 48-core workstation running Ubuntu 14.04.4 LTS. If I cold boot the workstation and start one of the examples from LIGGGHTS then everything looks fine: $ mpirun -np 48 liggghts <

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-17 Thread Ralph Castain
Just some thoughts offhand: * what version of OMPI are you using? * are you saying that after the warm reboot, all 48 procs are running on a subset of cores? * it sounds like some of the cores have been marked as “offline” for some reason. Make sure you have hwloc installed on the machine,

[OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-17 Thread Rainer Koenig
Hi, I'm experiencing a strange problem with running LIGGGHTS on a 48-core workstation running Ubuntu 14.04.4 LTS. If I cold boot the workstation and start one of the examples from LIGGGHTS then everything looks fine: $ mpirun -np 48 liggghts < in.chute_wear launches the example on all 48 cores,