Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
On Mon, Feb 23, 2015 at 4:37 PM, Joshua Ladd wrote:

> Nathan,
>
> I do, but the hang comes later on. It looks like it's a situation where
> the root is way, way faster than the children and he's inducing an
> overrun in the unexpected message queue. I think the queue is set to
> just keep growing and it eventually blows up the memory??

This is indeed possible: there is no flow control in the PML (nor in most
of the BTLs). However, the fact that the receiver prefers to drain the
incoming queues instead of returning back to MPI is slightly disturbing.

  George.
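A generic application-level mitigation for the situation described above, given that the transport provides no flow control, is to resynchronize the ranks periodically so the root cannot run thousands of iterations ahead of the receivers. A minimal sketch (not code from this thread; the payload size, loop bound, and throttle interval are illustrative assumptions):

/* Hedged sketch: a broadcast loop throttled with a periodic barrier.
 * Generic mitigation for missing transport-level flow control; the
 * payload size, loop bound, and throttle interval are assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    double buf[1000] = {0};   /* payload broadcast from rank 0 each pass */
    int rank, m;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (m = 0; m < 100000; m++) {
        MPI_Bcast(buf, 1000, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        if (m % 1000 == 0) {
            printf("rank %d, m = %d\n", rank, m);
            /* Resynchronize so a fast root cannot flood the slower
             * receivers' unexpected-message queues. */
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}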
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Nathan,

I do, but the hang comes later on. It looks like it's a situation where
the root is way, way faster than the children and he's inducing an
overrun in the unexpected message queue. I think the queue is set to just
keep growing and it eventually blows up the memory??

$/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun -np 3 --display-map -mca btl vader,self ./a.out
Data for JOB [14187,1] offset 0

 ========================   JOB MAP   ========================

 Data for node: mngx-apl-01  Num slots: 16  Max slots: 0  Num procs: 3
        Process OMPI jobid: [14187,1] App: 0 Process rank: 0
        Process OMPI jobid: [14187,1] App: 0 Process rank: 1
        Process OMPI jobid: [14187,1] App: 0 Process rank: 2

 =============================================================

rank 2, m = 0
rank 0, m = 0
rank 1, m = 0
rank 0, m = 1000
rank 0, m = 2000
rank 0, m = 3000
rank 2, m = 1000
rank 1, m = 1000
rank 0, m = 4000
rank 0, m = 5000
rank 0, m = 6000
rank 0, m = 7000
rank 1, m = 2000
rank 2, m = 2000
rank 0, m = 8000
rank 0, m = 9000
rank 0, m = 10000
rank 0, m = 11000
rank 2, m = 3000
rank 1, m = 3000
rank 0, m = 12000
rank 0, m = 13000
rank 0, m = 14000
rank 1, m = 4000
rank 2, m = 4000
rank 0, m = 15000
rank 0, m = 16000
rank 0, m = 17000
rank 0, m = 18000
rank 1, m = 5000
rank 2, m = 5000
rank 0, m = 19000
rank 0, m = 20000
rank 0, m = 21000
rank 0, m = 22000
rank 2, m = 6000   <--- Finally hangs when ranks 2 and 1 are at 6000 but rank 0, the root, is at 22,000
rank 1, m = 6000

It fails with the ompi_coll_tuned_bcast_intra_split_bintree algorithm in
Tuned - looks like a scatter/allgather type of operation. It's in the
allgather phase, during the bidirectional send/recv, that things go bad.
There are no issues running this under "Basic" colls.

Josh

On Mon, Feb 23, 2015 at 4:13 PM, Nathan Hjelm wrote:

> Josh, do you see a hang when using vader? It is preferred over the old
> sm btl.
>
> -Nathan
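One hedged way to test the split-bintree hypothesis without leaving the tuned component is to pin the broadcast algorithm through the tuned module's dynamic-rules MCA parameters. The parameter names below exist in Open MPI's tuned coll module, but the algorithm numbering should be checked for this build (e.g. with ompi_info --param coll tuned --level 9) before relying on it:

$mpirun -np 3 -mca btl vader,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm 1 ./a.out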
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Josh, do you see a hang when using vader? It is preferred over the old
sm btl.

-Nathan

On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote:

> Sachin,
>
> I am able to reproduce something funny. Looks like your issue. When I
> run on a single host with two ranks, the test works fine. However, when
> I try three or more, it looks like only the root, rank 0, is making any
> progress after the first iteration.
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Sachin,

I am able to reproduce something funny. Looks like your issue. When I run
on a single host with two ranks, the test works fine. However, when I try
three or more, it looks like only the root, rank 0, is making any progress
after the first iteration.

$/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun -np 3 -mca btl self,sm ./bcast_loop
rank 0, m = 0
rank 1, m = 0
rank 2, m = 0
rank 0, m = 1000
rank 0, m = 2000
rank 0, m = 3000
rank 0, m = 4000
rank 0, m = 5000
rank 0, m = 6000
rank 0, m = 7000
rank 0, m = 8000
rank 0, m = 9000
rank 0, m = 10000
rank 0, m = 11000
rank 0, m = 12000
rank 0, m = 13000
rank 0, m = 14000
rank 0, m = 15000
rank 0, m = 16000   <- Hanging

After hanging for a while, I get an OOM kernel panic message:

joshual@mngx-apl-01 ~
$
Message from syslogd@localhost at Feb 23 22:42:17 ...
 kernel:Kernel panic - not syncing: Out of memory: system-wide
panic_on_oom is enabled

Message from syslogd@localhost at Feb 23 22:42:17 ...
 kernel:

With the TCP BTL the result is sensible, i.e. I see three ranks reporting
for each multiple of 1000:

$/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun -np 3 -mca btl self,tcp ./a.out
rank 1, m = 0
rank 2, m = 0
rank 0, m = 0
rank 0, m = 1000
rank 2, m = 1000
rank 1, m = 1000
rank 1, m = 2000
rank 0, m = 2000
rank 2, m = 2000
rank 0, m = 3000
rank 2, m = 3000
rank 1, m = 3000
rank 0, m = 4000
rank 1, m = 4000
rank 2, m = 4000
rank 0, m = 5000
rank 2, m = 5000
rank 1, m = 5000
rank 0, m = 6000
rank 1, m = 6000
rank 2, m = 6000
rank 2, m = 7000
rank 1, m = 7000
rank 0, m = 7000
rank 0, m = 8000
rank 2, m = 8000
rank 1, m = 8000
rank 0, m = 9000
rank 2, m = 9000
rank 1, m = 9000
rank 2, m = 10000
rank 0, m = 10000
rank 1, m = 10000
rank 1, m = 11000
rank 0, m = 11000
rank 2, m = 11000
rank 2, m = 12000
rank 1, m = 12000
rank 0, m = 12000
rank 1, m = 13000
rank 0, m = 13000
rank 2, m = 13000
rank 1, m = 14000
rank 2, m = 14000
rank 0, m = 14000
rank 1, m = 15000
rank 0, m = 15000
rank 2, m = 15000
etc...

It looks like a bug in the SM BTL. I can poke some more at this tomorrow.

Josh

On Sun, Feb 22, 2015 at 11:18 PM, Sachin Krishnan wrote:

> George,
>
> I was able to run the code without any errors in an older version of
> OpenMPI on another machine. It looks like some problem with my machine,
> like Josh pointed out.
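Since the finding above points at the SM BTL, its queue and buffer sizing parameters are worth inspecting. A hedged one-liner (on 1.7+ ompi_info the --level flag is needed to show all parameters):

$ompi_info --param btl sm --level 9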
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
George,

I was able to run the code without any errors in an older version of
OpenMPI on another machine. It looks like some problem with my machine,
like Josh pointed out.

Adding --mca coll tuned or basic to the mpirun command resulted in an
MPI_Init failed error with the following additional information for the
Open MPI developer:

  mca_coll_base_comm_select(MPI_COMM_WORLD) failed
  --> Returned "Not found" (-13) instead of "Success" (0)

Thanks for the help.

Sachin

On Mon, Feb 23, 2015 at 4:17 AM, George Bosilca wrote:

> Can you try restricting the collective modules in use (adding --mca
> coll tuned,basic) to your mpirun command?
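The "Not found" failure above is consistent with the restricted coll list excluding a component that MPI_Init needs for its built-in communicators (the self coll component serves MPI_COMM_SELF). A hedged guess (assuming both components are present in this build) at a restriction that should still pass communicator selection:

$mpirun -np 2 -mca coll basic,self ./a.out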
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Sachin,

I can't replicate your issue with either the latest 1.8 or the trunk. I
tried using a single host, while forcing SM and then TCP, to no avail.

Can you try restricting the collective modules in use (adding --mca coll
tuned,basic) to your mpirun command?

  George.

On Fri, Feb 20, 2015 at 9:31 PM, Sachin Krishnan wrote:

> Josh,
>
> Thanks for the help.
> I'm running on a single host. How do I confirm that it is an issue with
> the shared memory?
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Josh,

Thanks for the help.
I'm running on a single host. How do I confirm that it is an issue with
the shared memory?

Sachin

On Fri, Feb 20, 2015 at 11:58 PM, Joshua Ladd wrote:

> Sachin,
>
> Are you running this on a single host or across multiple hosts (i.e.,
> are you communicating between processes via networking)? If it's on a
> single host, then it might be an issue with shared memory.
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Sachin,

Are you running this on a single host or across multiple hosts (i.e., are
you communicating between processes via networking)? If it's on a single
host, then it might be an issue with shared memory.

Josh

On Fri, Feb 20, 2015 at 1:51 AM, Sachin Krishnan wrote:

> Hello Josh,
>
> The command I use to compile the code is:
>
>   mpicc bcast_loop.c
>
> To run the code I use:
>
>   mpirun -np 2 ./a.out
>
> Output is unpredictable. It gets stuck at different places.
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Hello Josh,

The command I use to compile the code is:

  mpicc bcast_loop.c

To run the code I use:

  mpirun -np 2 ./a.out

Output is unpredictable. It gets stuck at different places.

I'm attaching lstopo and ompi_info outputs. Do you need any other info?

lstopo-no-graphics output:

Machine (3433MB)
  Socket L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#4)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#5)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#6)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#7)
  HostBridge L#0
    PCI 8086:0162
      GPU L#0 "card0"
      GPU L#1 "renderD128"
      GPU L#2 "controlD64"
    PCI 8086:1502
      Net L#3 "eth0"
    PCI 8086:1e02
      Block L#4 "sda"
      Block L#5 "sr0"

ompi_info output:

Package: Open MPI builduser@anatol Distribution
Open MPI: 1.8.4
Open MPI repo revision: v1.8.3-330-g0344f04
Open MPI release date: Dec 19, 2014
Open RTE: 1.8.4
Open RTE repo revision: v1.8.3-330-g0344f04
Open RTE release date: Dec 19, 2014
OPAL: 1.8.4
OPAL repo revision: v1.8.3-330-g0344f04
OPAL release date: Dec 19, 2014
MPI API: 3.0
Ident string: 1.8.4
Prefix: /usr
Configured architecture: i686-pc-linux-gnu
Configure host: anatol
Configured by: builduser
Configured on: Sat Dec 20 17:00:34 PST 2014
Configure host: anatol
Built by: builduser
Built on: Sat Dec 20 17:12:16 PST 2014
Built host: anatol
C bindings: yes
C++ bindings: yes
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
    limitations in the /usr/bin/gfortran compiler, does
    not support the following: array subsections,
    direct passthru (where possible) to underlying Open
    MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 4.9.2
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fort compiler: /usr/bin/gfortran
Fort compiler abs:
Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
C profiling: yes
C++ profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes,
    OMPI progress: no, ORTE progress: yes, Event lib: yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions:
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
VampirTrace support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.8.4)
MCA compress: bzip (MCA v2.0, API v2.0, Component v1.8.4)
MCA compress: gzip (MCA v2.0, API v2.0, Component v1.8.4)
MCA crs: none (MCA v2.0, API v2.0, Component v1.8.4)
MCA db: hash (MCA v2.0, API v1.0, Component v1.8.4)
MCA db: print (MCA v2.0, API v1.0, Component
Re: [OMPI users] Several Bcast calls in a loop causing the code to hang
Sachin,

Can you please provide a command line? Additional information about your
system could be helpful also.

Josh

On Wed, Feb 18, 2015 at 3:43 AM, Sachin Krishnan wrote:

> Hello,
>
> I am new to MPI and also to this list.
> I wrote an MPI code with several MPI_Bcast calls in a loop. My code was
> getting stuck at random points, i.e. it was not systematic. After a few
> hours of debugging and googling, I found that the issue may be with the
> several MPI_Bcast calls in a loop.
>
> I stumbled on this test code which can reproduce the issue:
> https://github.com/fintler/ompi/blob/master/orte/test/mpi/bcast_loop.c
>
> I'm using OpenMPI v1.8.4 installed from the official Arch Linux repo.
>
> Is it a known issue with OpenMPI?
> Is it some problem with the way OpenMPI is configured on my system?
>
> Thanks in advance.
>
> Sachin

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/02/26338.php
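For readers without the linked test handy, a minimal stand-in for bcast_loop.c, reconstructed as a hedged sketch from the output format seen in this thread (not the actual file; the payload size and loop bound are assumptions):

/* Hedged stand-in for the linked bcast_loop.c test: repeated MPI_Bcast
 * calls in a tight loop, printing progress every 1000 iterations.
 * Payload size and loop bound are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    double buf[1000] = {0};   /* broadcast from rank 0 every iteration */
    int rank, m;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (m = 0; m < 100000; m++) {
        MPI_Bcast(buf, 1000, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        if (m % 1000 == 0)
            printf("rank %d, m = %d\n", rank, m);   /* format matches the logs above */
    }

    MPI_Finalize();
    return 0;
}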