Hi Gilles,
localhost is available in the state where the problem occurs. I suddenly
discovered this MPI trouble yesterday while I was working on the train...
otherwise I am always connected to the net at home (wifi) or at the
laboratory (wired)
bash-4.2$ ifconfig
enp0s31f6: flags=4099 mtu 1500
Patrick,
Does “no network is available” mean the lo interface (localhost, 127.0.0.1)
is not even available?
Cheers,
Gilles
On Monday, January 28, 2019, Patrick Bégou <
patrick.be...@legi.grenoble-inp.fr> wrote:
> Hi,
>
> I fall in a strange problem with OpenMPI 3.1 installed on a CentOS7
>
Brice
>Can you print the pattern before and after thread 1 touched its pages, or even
>in the middle ?
>It looks like somebody is touching too many pages here.
Experimenting with different threads touching one or more pages, I get
unpredictable results
here on the 8 numa node device, the
Hi,
I have run into a strange problem with OpenMPI 3.1 installed on a CentOS 7
laptop. If no network is available, I cannot launch a local MPI job on
the laptop:
bash-4.2$ mpirun -np 4 hostname
--
No network interfaces were found
Patrick,
The root cause is that we do not include the localhost interface by default
for OOB communications.
You should be able to run with
mpirun --mca oob_tcp_if_include lo -np 4 hostname
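As a sketch (assuming the standard per-user MCA parameter file location, `~/.openmpi/mca-params.conf`), the same setting can be made permanent so every mpirun picks it up without the command-line flag:

```shell
# Persist the workaround in the per-user Open MPI MCA parameter file;
# mpirun reads ~/.openmpi/mca-params.conf on every launch.
mkdir -p ~/.openmpi
echo "oob_tcp_if_include = lo" >> ~/.openmpi/mca-params.conf
```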
Cheers,
Gilles
On 1/28/2019 11:02 PM, Patrick Bégou wrote:
Hi,
I fall in a strange problem with
Patrick,
I double checked the code, and indeed, mpirun should have automatically
fallen back
on the loopback interface (and mpirun should have worked)
The virbr0 interface prevented that and this is a bug I fixed in
https://github.com/open-mpi/ompi/pull/6315
Future releases of Open MPI
On 28/01/2019 at 11:28, Biddiscombe, John A. wrote:
> If I disable thread 0 and allow thread 1 then I get this pattern on 1 machine
> (clearly wrong)
>
>
Brice
I might have been using the wrong parameters to hwloc_get_area_memlocation in my
original version, but I bypassed it and have been calling
int get_numa_domain(void *page)
{
    HPX_ASSERT( (std::size_t(page) & 4095) == 0 );  // page must be 4 KiB aligned
    void *pages[1] = { page };
    int status[1] = { -1 };
    // nodes == nullptr puts move_pages(2) in query mode: status[0]
    // receives the numa node of the page, or a negative errno
    syscall(SYS_move_pages, 0, 1, pages, nullptr, status, 0);
    return status[0];
}
Can you try again disabling the touching in one thread to check whether
the other thread only touched its own pages? (others' status should be
-2 (ENOENT))
Recent kernels have ways to migrate memory at runtime
(CONFIG_NUMA_BALANCING) but this should only occur when it detects that
some thread