Rainer,
what if you explicitly bind tasks to cores?
mpirun -bind-to core ...
note this is v1.8 syntax ...
v1.6 is now obsolete (Debian folks are working on upgrading it...)
out of curiosity, did you try another distro such as Red Hat and the like,
or SUSE,
and do you observe the same behavior?
Gilles,
I managed to get snapshots of all the /proc/<pid>/status entries for all
liggghts jobs, but the Cpus_allowed is similar regardless of whether the system
was cold or warm booted.
Then I looked around in /proc/ and found sched_debug.
This at least shows that the liggghts processes are not spread over
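A related check, sketched here under assumptions rather than taken from the thread: instead of reading sched_debug, field 39 ("processor") of each /proc/<pid>/stat file gives the CPU a task last ran on, so counting tasks per CPU shows whether they pile up on a few cores. The helper names `last_cpu` and `tasks_per_cpu` are made up for illustration.

```python
from collections import Counter


def last_cpu(stat_text):
    """Return field 39 ("processor") of a /proc/<pid>/stat line.

    This is the CPU number the task last executed on.
    """
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the LAST ')' first.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), therefore field 39 is rest[36].
    return int(rest[36])


def tasks_per_cpu(stat_texts):
    """Count how many of the given tasks last ran on each CPU."""
    return Counter(last_cpu(text) for text in stat_texts)
```

Feeding it the contents of /proc/<pid>/stat for every liggghts PID gives one count per CPU; a healthy 48-process run on 48 cores would be expected to show roughly one task per CPU while the job is busy.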
Rainer,
a first step could be to gather /proc/pid/status for your 48 tasks.
then you can
grep Cpus_allowed_list
and see if you find something suspicious.
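The gather-and-grep step above could be sketched roughly as follows. This is only an illustration: it assumes the processes show up with the name "liggghts" in the Name: line of /proc/<pid>/status, and the helper names (`allowed_list`, `process_name`, `group_by_affinity`) are hypothetical.

```python
import glob
import re
from collections import defaultdict


def allowed_list(status_text):
    """Return the Cpus_allowed_list value from /proc/<pid>/status text, or None."""
    match = re.search(r"^Cpus_allowed_list:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


def process_name(status_text):
    """Return the Name: value from /proc/<pid>/status text, or None."""
    match = re.search(r"^Name:\s*(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def group_by_affinity(name="liggghts"):
    """Map each Cpus_allowed_list value to the PIDs of processes called `name`.

    The default process name is an assumption about how the job appears
    in /proc; adjust it if the binary is named differently.
    """
    groups = defaultdict(list)
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited between listing and reading
        if process_name(text) == name:
            pid = int(path.split("/")[2])
            groups[allowed_list(text)].append(pid)
    return dict(groups)
```

Calling `group_by_affinity()` while the job runs returns one entry per distinct affinity list; if every process reports the same full range, the affinity masks are probably not what restricts the placement.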
if your processes are idling, then the scheduler might assign them to the
same core.
in this case, your processes not being spread is a consequ
On 17.03.2016 at 10:40, Ralph Castain wrote:
> Just some thoughts offhand:
>
> * what version of OMPI are you using?
dpkg -l openmpi-bin says 1.6.5-8 from Ubuntu 14.04.
>
> * are you saying that after the warm reboot, all 48 procs are running on a
> subset of cores?
Yes. After a cold boot all
Hi,
On 03/17/2016 10:00 AM, Rainer Koenig wrote:
> I'm experiencing a strange problem with running LIGGGHTS on a 48-core
> workstation running Ubuntu 14.04.4 LTS.
> If I cold boot the workstation and start one of the examples from
> LIGGGHTS then everything looks fine:
> $ mpirun -np 48 liggghts < in.chu
Just some thoughts offhand:
* what version of OMPI are you using?
* are you saying that after the warm reboot, all 48 procs are running on a
subset of cores?
* it sounds like some of the cores have been marked as “offline” for some
reason. Make sure you have hwloc installed on the machine, and
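One way to check for offline cores without extra tools, offered as a sketch and not as what the message goes on to recommend: on Linux, /sys/devices/system/cpu/present and /sys/devices/system/cpu/online each hold a CPU list such as "0-47", and the difference between the two is the set of offline CPUs. The function names below are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a file containing only a newline
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_cpus(present_text, online_text):
    """Return the sorted CPU numbers that are present but not online."""
    return sorted(parse_cpu_list(present_text) - parse_cpu_list(online_text))


def read_offline_cpus(base="/sys/devices/system/cpu/"):
    """Read the two sysfs files and report offline CPUs (Linux only)."""
    with open(base + "present") as present, open(base + "online") as online:
        return offline_cpus(present.read(), online.read())
```

Running `read_offline_cpus()` once after a cold boot and once after a warm reboot would show whether the set of online cores differs between the two cases; an empty list means all present cores are online.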
Hi,
I'm experiencing a strange problem with running LIGGGHTS on a 48-core
workstation running Ubuntu 14.04.4 LTS.
If I cold boot the workstation and start one of the examples from
LIGGGHTS then everything looks fine:
$ mpirun -np 48 liggghts < in.chute_wear
launches the example on all 48 cores,