Re: [OMPI users] Code failing when requesting all "processors"

2020-10-21 Thread Diego Zuccato via users
Il 14/10/20 14:32, Jeff Squyres (jsquyres) ha scritto:

>> The version is 3.1.3, as packaged in Debian Buster.
> The 3.1.x series is pretty old.  If you want to stay in the 3.1.x
> series, you might try upgrading to the latest -- 3.1.6.  That has a
> bunch of bug fixes compared to v3.1.3.
I'm bound to using distro packages...
I don't have the resources to also compile from sources and debug
interactions between different packages (OMPI, Slurm, OFED... just to
start, and every one would require an expert).

>> I don't know OpenMPI (or even MPI in general) much. Some time ago, I
>> had to add a
>> mtl = psm2
>> line to /etc/openmpi/openmpi-mca-params.conf .
> This implies that you have Infinipath networking on your cluster.
Actually we have InfiniBand on most of the nodes. All Mellanox cards
(I've been warned about bad interactions between different vendors),
some ConnectX-3 cards (connected to a 40Gbps switch) and some ConnectX-5
ones (connected to a 100Gbps switch, linked to the first). The link
between the two switches is mostly unused, except for the traffic to the
Gluster servers, over IPoIB.

> I can't imagine what installing gdb would do to mask the problem.  Strange.
Imagine my face when the program started working under gdb, then
continued even when launched directly with no binary changes... :)

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


Re: [OMPI users] Code failing when requesting all "processors"

2020-10-20 Thread Jeff Squyres (jsquyres) via users
On Oct 15, 2020, at 3:27 AM, Diego Zuccato  wrote:
> 
>>> The version is 3.1.3, as packaged in Debian Buster.
>> The 3.1.x series is pretty old.  If you want to stay in the 3.1.x
>> series, you might try upgrading to the latest -- 3.1.6.  That has a
>> bunch of bug fixes compared to v3.1.3.
> I'm bound to using distro packages...

That's going to be a bit limiting, I'm afraid.  There definitely were bug fixes 
after 3.1.3; it's possible that you're running into some things that were fixed 
later in the v3.1.x series.

> I don't have the resources to also compile from sources and debug
> interactions between different packages (OMPI, Slurm, OFED... just to
> start, and every one would require an expert).

You're right that it is a bit daunting.  Sorry about that; it's the nature of 
HPC that there is a large, complicated software stack.

FWIW, one [slightly] simpler method may well be to get your distro's Open MPI 
3.1.3 source package and just tweak it to use Open MPI 3.1.6 instead.  I.e., 
let it use the same build dependencies that are already built into the source 
package, etc.  That would at least get you an Open MPI install that is 
configured/built/installed exactly the same way as your existing 3.1.3 package.
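
A rough, untested sketch of what that rebuild could look like on Buster (the 
source package name, tarball URL, and uupdate step are assumptions; adjust to 
whatever your mirror actually provides):
-8<--
# Fetch build dependencies and the existing 3.1.3 Debian source package
# (requires deb-src lines in /etc/apt/sources.list)
apt-get build-dep openmpi
apt-get source openmpi

# Graft the 3.1.6 upstream tarball onto the existing Debian packaging
wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.6.tar.gz
cd openmpi-3.1.3*
uupdate ../openmpi-3.1.6.tar.gz   # uupdate ships in the devscripts package

# Build binary packages configured the same way as the distro ones
cd ../openmpi-3.1.6
dpkg-buildpackage -us -uc
-8<--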

>>> I don't know OpenMPI (or even MPI in general) much. Some time ago, I
>>> had to add a
>>> mtl = psm2
>>> line to /etc/openmpi/openmpi-mca-params.conf .
>> This implies that you have Infinipath networking on your cluster.

Sidenote that doesn't actually matter, but just to clarify: I should have said 
s/Infinipath/Omnipath/.  :-)

> Actually we have InfiniBand on most of the nodes. All Mellanox cards
> (I've been warned about bad interactions between different vendors),

It definitely is simpler to stick with a single type of networking.

> some ConnectX-3 cards (connected to a 40Gbps switch) and some ConnectX-5
> ones (connected to a 100Gbps switch, linked to the first). The link
> between the two switches is mostly unused, except for the traffic to the
> Gluster servers, over IPoIB.

You should probably remove the "mtl=psm2" line then.

1. With Open MPI v3.1.x, it's harmless, but misleading.
2. With Open MPI v4.x, it might cause the wrong type of networking plugin to 
be used on your InfiniBand network (which will just result in your MPI jobs 
failing).
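
Concretely (a hedged sketch; "./your_app" is just a placeholder, and the 
per-run override uses the standard MCA "^" exclusion syntax):
-8<--
# In /etc/openmpi/openmpi-mca-params.conf, comment out the forced MTL:
# mtl = psm2

# Or override it for a single run without touching the file:
mpirun --mca mtl ^psm2 -np 32 ./your_app
-8<--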

Open MPI has a few "MPI engines" for point-to-point communication under the 
covers: "ob1", "cm", and "ucx" are the most notable (in the Open MPI v4.0.x 
series).  I explained this stuff in a series of presentations that we recently 
gave to the community.

Check out "The ABCs of Open MPI: Decoding the Alphabet Soup of the Modern HPC 
Ecosystem (Part 2)" 
(https://www.open-mpi.org/video/?category=general#abcs-of-open-mpi-part-2).  
See slides 28-41 in the PDF, or starting at about 47 minutes in 
https://www.youtube.com/watch?v=C4XfxUoSYQs.

The slides are about Open MPI v4.x (where UCX is the preferred IB transport), 
but most of what is discussed is also applicable to the v3.1.x series.  If I 
recall correctly, the one notable difference is that the "openib" BTL is used 
by default for InfiniBand networks in the v3.1.x series (vs. the UCX PML).
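
For reference, a hedged sketch of how those engines can be selected explicitly 
on the mpirun command line ("./your_app" is a placeholder; which components are 
actually available depends on how your packages were built):
-8<--
# Open MPI v3.1.x over InfiniBand: ob1 PML with the openib BTL
mpirun --mca pml ob1 --mca btl openib,vader,self -np 32 ./your_app

# Open MPI v4.0.x over InfiniBand: UCX PML (the preferred IB transport)
mpirun --mca pml ucx -np 32 ./your_app

# Omni-Path (what "mtl = psm2" implies): cm PML with the psm2 MTL
mpirun --mca pml cm --mca mtl psm2 -np 32 ./your_app
-8<--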

>> I can't imagine what installing gdb would do to mask the problem.  Strange.
> Imagine my face when the program started working under gdb, then
> continued even when launched directly with no binary changes... :)

There must be some kind of strange side effect happening here.  Weird.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Code failing when requesting all "processors"

2020-10-19 Thread Jeff Squyres (jsquyres) via users
On Oct 14, 2020, at 3:07 AM, Diego Zuccato <diego.zucc...@unibo.it> wrote:

Il 13/10/20 16:33, Jeff Squyres (jsquyres) ha scritto:

That's odd.  What version of Open MPI are you using?

The version is 3.1.3, as packaged in Debian Buster.

The 3.1.x series is pretty old.  If you want to stay in the 3.1.x series, you 
might try upgrading to the latest -- 3.1.6.  That has a bunch of bug fixes 
compared to v3.1.3.

Alternatively, the most recent release series is the v4.0.x series: v4.0.5 is 
the latest in that series.
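
(If it helps, a quick way to double-check which version a given node actually 
has installed -- both commands ship with Open MPI:)
-8<--
mpirun --version     # prints "mpirun (Open MPI) 3.1.3" or similar
ompi_info | head     # shows the version plus how the package was configured
-8<--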

I don't know OpenMPI (or even MPI in general) much. Some time ago, I
had to add a
mtl = psm2
line to /etc/openmpi/openmpi-mca-params.conf .

This implies that you have Infinipath networking on your cluster.

Another strangeness is that I've had the same problem on other nodes,
which got "solved" (or, more likely, just "masked") by simply installing
gdb: while trying to debug the issue I noticed that when I installed gdb
I could no longer reproduce the problem. Too bad that on this server gdb is
already installed and apparently useless for debugging the issue.

I can't imagine what installing gdb would do to mask the problem.  Strange.

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Code failing when requesting all "processors"

2020-10-19 Thread Diego Zuccato via users
Il 13/10/20 16:33, Jeff Squyres (jsquyres) ha scritto:

> That's odd.  What version of Open MPI are you using?

The version is 3.1.3, as packaged in Debian Buster.

I don't know OpenMPI (or even MPI in general) much. Some time ago, I
had to add a
mtl = psm2
line to /etc/openmpi/openmpi-mca-params.conf .

Another strangeness is that I've had the same problem on other nodes,
which got "solved" (or, more likely, just "masked") by simply installing
gdb: while trying to debug the issue I noticed that when I installed gdb
I could no longer reproduce the problem. Too bad that on this server gdb is
already installed and apparently useless for debugging the issue.

>> On Oct 13, 2020, at 6:34 AM, Diego Zuccato via users 
>>  wrote:
>>
>> Hello all.
>>
>> I have a problem on a server: launching a job with mpirun fails if I
>> request all 32 CPUs (threads, since HT is enabled) but succeeds if I
>> only request 30.
>>
>> The test code is really minimal:
>> -8<--
>> #include "mpi.h"
>> #include 
>> #include 
>> #define  MASTER 0
>>
>> int main (int argc, char *argv[])
>> {
>>  int   numtasks, taskid, len;
>>  char hostname[MPI_MAX_PROCESSOR_NAME];
>>  MPI_Init(, );
>> //  int provided=0;
>> //  MPI_Init_thread(, , MPI_THREAD_MULTIPLE, );
>> //printf("MPI provided threads: %d\n", provided);
>>  MPI_Comm_size(MPI_COMM_WORLD, );
>>  MPI_Comm_rank(MPI_COMM_WORLD,);
>>
>>  if (taskid == MASTER)
>>printf("This is an MPI parallel code for Hello World with no
>> communication\n");
>>  //MPI_Barrier(MPI_COMM_WORLD);
>>
>>
>>  MPI_Get_processor_name(hostname, );
>>
>>  printf ("Hello from task %d on %s!\n", taskid, hostname);
>>
>>  if (taskid == MASTER)
>>printf("MASTER: Number of MPI tasks is: %d\n",numtasks);
>>
>>  MPI_Finalize();
>>
>>  printf("END OF CODE from task %d\n", taskid);
>> }
>> -8<--
>> (the commented section is a leftover of one of the tests).
>>
>> The error is:
>> -8<--
>> [str957-bl0-03:19637] *** Process received signal ***
>> [str957-bl0-03:19637] Signal: Segmentation fault (11)
>> [str957-bl0-03:19637] Signal code: Address not mapped (1)
>> [str957-bl0-03:19637] Failing at address: 0x77fac008
>> [str957-bl0-03:19637] [ 0]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x77e92730]
>> [str957-bl0-03:19637] [ 1]
>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7646d936]
>> [str957-bl0-03:19637] [ 2]
>> /usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x76444733]
>> [str957-bl0-03:19637] [ 3]
>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7646d5b4]
>> [str957-bl0-03:19637] [ 4]
>> /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7659346e]
>> [str957-bl0-03:19637] [ 5]
>> /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7654b88d]
>> [str957-bl0-03:19637] [ 6]
>> /usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x76507d7c]
>> [str957-bl0-03:19637] [ 7]
>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x76603fe4]
>> [str957-bl0-03:19637] [ 8]
>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x77fb1656]
>> [str957-bl0-03:19637] [ 9]
>> /usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x77c1c11a]
>> [str957-bl0-03:19637] [10]
>> /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x77eece62]
>> [str957-bl0-03:19637] [11]
>> /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x77f1b17e]
>> [str957-bl0-03:19637] [12] ./mpitest-debug(+0x11c6)[0x51c6]
>> [str957-bl0-03:19637] [13]
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x77ce309b]
>> [str957-bl0-03:19637] [14] ./mpitest-debug(+0x10da)[0x50da]
>> [str957-bl0-03:19637] *** End of error message ***
>> -8<--
>>
>> I'm using Debian stable packages. On other servers there is no problem
>> (but there was in the past, and it got "solved" by just installing gdb).
>>
>> Any hints?
>>
>> TIA
>>
>> -- 
>> Diego Zuccato
>> DIFA - Dip. di Fisica e Astronomia
>> Servizi Informatici
>> Alma Mater Studiorum - Università di Bologna
>> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>> tel.: +39 051 20 95786
> 
> 


-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


Re: [OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Jeff Squyres (jsquyres) via users
On Oct 13, 2020, at 10:43 AM, Gus Correa via users  
wrote:
> 
> Can you use taskid after MPI_Finalize?

Yes.  It's a variable, just like any other.

> Isn't it undefined/deallocated at that point?

No.  MPI filled it in during MPI_Comm_rank() and then never touched it again.

So even though MPI may have shut down, the value that it loaded into taskid is 
still valid/initialized.

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Gus Correa via users
Can you use taskid after MPI_Finalize?
Isn't it undefined/deallocated at that point?
Just a question (... or two) ...

Gus Correa

>  MPI_Finalize();
>
>  printf("END OF CODE from task %d\n", taskid);





On Tue, Oct 13, 2020 at 10:34 AM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:

> That's odd.  What version of Open MPI are you using?
>
>
> > On Oct 13, 2020, at 6:34 AM, Diego Zuccato via users <
> users@lists.open-mpi.org> wrote:
> >
> > Hello all.
> >
> > I have a problem on a server: launching a job with mpirun fails if I
> > request all 32 CPUs (threads, since HT is enabled) but succeeds if I
> > only request 30.
> >
> > The test code is really minimal:
> > -8<--
> > #include "mpi.h"
> > #include 
> > #include 
> > #define  MASTER 0
> >
> > int main (int argc, char *argv[])
> > {
> >  int   numtasks, taskid, len;
> >  char hostname[MPI_MAX_PROCESSOR_NAME];
> >  MPI_Init(, );
> > //  int provided=0;
> > //  MPI_Init_thread(, , MPI_THREAD_MULTIPLE, );
> > //printf("MPI provided threads: %d\n", provided);
> >  MPI_Comm_size(MPI_COMM_WORLD, );
> >  MPI_Comm_rank(MPI_COMM_WORLD,);
> >
> >  if (taskid == MASTER)
> >printf("This is an MPI parallel code for Hello World with no
> > communication\n");
> >  //MPI_Barrier(MPI_COMM_WORLD);
> >
> >
> >  MPI_Get_processor_name(hostname, );
> >
> >  printf ("Hello from task %d on %s!\n", taskid, hostname);
> >
> >  if (taskid == MASTER)
> >printf("MASTER: Number of MPI tasks is: %d\n",numtasks);
> >
> >  MPI_Finalize();
> >
> >  printf("END OF CODE from task %d\n", taskid);
> > }
> > -8<--
> > (the commented section is a leftover of one of the tests).
> >
> > The error is:
> > -8<--
> > [str957-bl0-03:19637] *** Process received signal ***
> > [str957-bl0-03:19637] Signal: Segmentation fault (11)
> > [str957-bl0-03:19637] Signal code: Address not mapped (1)
> > [str957-bl0-03:19637] Failing at address: 0x77fac008
> > [str957-bl0-03:19637] [ 0]
> > /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x77e92730]
> > [str957-bl0-03:19637] [ 1]
> >
> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7646d936]
> > [str957-bl0-03:19637] [ 2]
> >
> /usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x76444733]
> > [str957-bl0-03:19637] [ 3]
> >
> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7646d5b4]
> > [str957-bl0-03:19637] [ 4]
> >
> /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7659346e]
> > [str957-bl0-03:19637] [ 5]
> >
> /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7654b88d]
> > [str957-bl0-03:19637] [ 6]
> > /usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x76507d7c]
> > [str957-bl0-03:19637] [ 7]
> >
> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x76603fe4]
> > [str957-bl0-03:19637] [ 8]
> >
> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x77fb1656]
> > [str957-bl0-03:19637] [ 9]
> >
> /usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x77c1c11a]
> > [str957-bl0-03:19637] [10]
> >
> /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x77eece62]
> > [str957-bl0-03:19637] [11]
> > /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x77f1b17e]
> > [str957-bl0-03:19637] [12] ./mpitest-debug(+0x11c6)[0x51c6]
> > [str957-bl0-03:19637] [13]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x77ce309b]
> > [str957-bl0-03:19637] [14] ./mpitest-debug(+0x10da)[0x50da]
> > [str957-bl0-03:19637] *** End of error message ***
> > -8<--
> >
> > I'm using Debian stable packages. On other servers there is no problem
> > (but there was in the past, and it got "solved" by just installing gdb).
> >
> > Any hints?
> >
> > TIA
> >
> > --
> > Diego Zuccato
> > DIFA - Dip. di Fisica e Astronomia
> > Servizi Informatici
> > Alma Mater Studiorum - Università di Bologna
> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > tel.: +39 051 20 95786
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>


Re: [OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Jeff Squyres (jsquyres) via users
That's odd.  What version of Open MPI are you using?


> On Oct 13, 2020, at 6:34 AM, Diego Zuccato via users 
>  wrote:
> 
> Hello all.
> 
> I have a problem on a server: launching a job with mpirun fails if I
> request all 32 CPUs (threads, since HT is enabled) but succeeds if I
> only request 30.
> 
> The test code is really minimal:
> -8<--
> #include "mpi.h"
> #include 
> #include 
> #define  MASTER 0
> 
> int main (int argc, char *argv[])
> {
>  int   numtasks, taskid, len;
>  char hostname[MPI_MAX_PROCESSOR_NAME];
>  MPI_Init(, );
> //  int provided=0;
> //  MPI_Init_thread(, , MPI_THREAD_MULTIPLE, );
> //printf("MPI provided threads: %d\n", provided);
>  MPI_Comm_size(MPI_COMM_WORLD, );
>  MPI_Comm_rank(MPI_COMM_WORLD,);
> 
>  if (taskid == MASTER)
>printf("This is an MPI parallel code for Hello World with no
> communication\n");
>  //MPI_Barrier(MPI_COMM_WORLD);
> 
> 
>  MPI_Get_processor_name(hostname, );
> 
>  printf ("Hello from task %d on %s!\n", taskid, hostname);
> 
>  if (taskid == MASTER)
>printf("MASTER: Number of MPI tasks is: %d\n",numtasks);
> 
>  MPI_Finalize();
> 
>  printf("END OF CODE from task %d\n", taskid);
> }
> -8<--
> (the commented section is a leftover of one of the tests).
> 
> The error is:
> -8<--
> [str957-bl0-03:19637] *** Process received signal ***
> [str957-bl0-03:19637] Signal: Segmentation fault (11)
> [str957-bl0-03:19637] Signal code: Address not mapped (1)
> [str957-bl0-03:19637] Failing at address: 0x77fac008
> [str957-bl0-03:19637] [ 0]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x77e92730]
> [str957-bl0-03:19637] [ 1]
> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7646d936]
> [str957-bl0-03:19637] [ 2]
> /usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x76444733]
> [str957-bl0-03:19637] [ 3]
> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7646d5b4]
> [str957-bl0-03:19637] [ 4]
> /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7659346e]
> [str957-bl0-03:19637] [ 5]
> /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7654b88d]
> [str957-bl0-03:19637] [ 6]
> /usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x76507d7c]
> [str957-bl0-03:19637] [ 7]
> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x76603fe4]
> [str957-bl0-03:19637] [ 8]
> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x77fb1656]
> [str957-bl0-03:19637] [ 9]
> /usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x77c1c11a]
> [str957-bl0-03:19637] [10]
> /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x77eece62]
> [str957-bl0-03:19637] [11]
> /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x77f1b17e]
> [str957-bl0-03:19637] [12] ./mpitest-debug(+0x11c6)[0x51c6]
> [str957-bl0-03:19637] [13]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x77ce309b]
> [str957-bl0-03:19637] [14] ./mpitest-debug(+0x10da)[0x50da]
> [str957-bl0-03:19637] *** End of error message ***
> -8<--
> 
> I'm using Debian stable packages. On other servers there is no problem
> (but there was in the past, and it got "solved" by just installing gdb).
> 
> Any hints?
> 
> TIA
> 
> -- 
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786


-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Diego Zuccato via users
Hello all.

I have a problem on a server: launching a job with mpirun fails if I
request all 32 CPUs (threads, since HT is enabled) but succeeds if I
only request 30.

The test code is really minimal:
-8<--
#include "mpi.h"
#include 
#include 
#define  MASTER 0

int main (int argc, char *argv[])
{
  int   numtasks, taskid, len;
  char hostname[MPI_MAX_PROCESSOR_NAME];
  MPI_Init(, );
//  int provided=0;
//  MPI_Init_thread(, , MPI_THREAD_MULTIPLE, );
//printf("MPI provided threads: %d\n", provided);
  MPI_Comm_size(MPI_COMM_WORLD, );
  MPI_Comm_rank(MPI_COMM_WORLD,);

  if (taskid == MASTER)
printf("This is an MPI parallel code for Hello World with no
communication\n");
  //MPI_Barrier(MPI_COMM_WORLD);


  MPI_Get_processor_name(hostname, );

  printf ("Hello from task %d on %s!\n", taskid, hostname);

  if (taskid == MASTER)
printf("MASTER: Number of MPI tasks is: %d\n",numtasks);

  MPI_Finalize();

  printf("END OF CODE from task %d\n", taskid);
}
-8<--
(the commented section is a leftover of one of the tests).
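
(Roughly how it's compiled and launched; the source file name and the -g flag 
are guesses, the binary name matches the backtrace below:)
-8<--
mpicc -g mpitest.c -o mpitest-debug
mpirun -np 32 ./mpitest-debug    # segfaults; -np 30 completes fine
-8<--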

The error is:
-8<--
[str957-bl0-03:19637] *** Process received signal ***
[str957-bl0-03:19637] Signal: Segmentation fault (11)
[str957-bl0-03:19637] Signal code: Address not mapped (1)
[str957-bl0-03:19637] Failing at address: 0x77fac008
[str957-bl0-03:19637] [ 0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x77e92730]
[str957-bl0-03:19637] [ 1]
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7646d936]
[str957-bl0-03:19637] [ 2]
/usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x76444733]
[str957-bl0-03:19637] [ 3]
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7646d5b4]
[str957-bl0-03:19637] [ 4]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7659346e]
[str957-bl0-03:19637] [ 5]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7654b88d]
[str957-bl0-03:19637] [ 6]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x76507d7c]
[str957-bl0-03:19637] [ 7]
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x76603fe4]
[str957-bl0-03:19637] [ 8]
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x77fb1656]
[str957-bl0-03:19637] [ 9]
/usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x77c1c11a]
[str957-bl0-03:19637] [10]
/usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x77eece62]
[str957-bl0-03:19637] [11]
/usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x77f1b17e]
[str957-bl0-03:19637] [12] ./mpitest-debug(+0x11c6)[0x51c6]
[str957-bl0-03:19637] [13]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x77ce309b]
[str957-bl0-03:19637] [14] ./mpitest-debug(+0x10da)[0x50da]
[str957-bl0-03:19637] *** End of error message ***
-8<--

I'm using Debian stable packages. On other servers there is no problem
(but there was in the past, and it got "solved" by just installing gdb).

Any hints?

TIA

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786