Re: [OMPI users] Checkpoint with blcr

2017-05-19 Thread Omar Andrés Zapata Mesa
Thanks Jeff

On Fri, May 19, 2017 at 4:59 PM, Jeff Squyres (jsquyres)  wrote:

> Open MPI v2.1.x does not support checkpoint restart; it was unmaintained
> and getting stale, so it was removed.
>
> Looks like we forgot to remove the cr MPI extension from the v2.1.x
> release series when we removed the rest of the checkpoint restart support.
> Sorry for the confusion.
>
>
>
> > On May 19, 2017, at 5:43 PM, Omar Andrés Zapata Mesa <
> andresete.ch...@gmail.com> wrote:
> >
> > Dear all,
> >
> > I am trying to compile ompi 2.1.0 with support for checkpoint using blcr.
> > openmpi-2.1.0# ./configure --enable-mpi-ext=cr
> > --- MPI Extension cr
> > configure: WARNING: Requested "cr" MPI cr, but cannot build it
> > configure: WARNING: because fault tolerance is not enabled.
> > configure: WARNING: Try again with --enable-ft
> >
> > then using --enable-ft
> > ./configure  --enable-ft=cr
> >
> > configure: WARNING: unrecognized options: --enable-ft
> >
> > How can I enable checkpointing with blcr?
> >
> >
> > I am under Gnu/Debian with gcc 6.3.0
> > Best
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Checkpoint with blcr

2017-05-19 Thread Jeff Squyres (jsquyres)
Open MPI v2.1.x does not support checkpoint restart; it was unmaintained and 
getting stale, so it was removed.

Looks like we forgot to remove the cr MPI extension from the v2.1.x release 
series when we removed the rest of the checkpoint restart support.  Sorry for 
the confusion.



> On May 19, 2017, at 5:43 PM, Omar Andrés Zapata Mesa 
>  wrote:
> 
> Dear all,
> 
> I am trying to compile ompi 2.1.0 with support for checkpoint using blcr.
> openmpi-2.1.0# ./configure --enable-mpi-ext=cr
> --- MPI Extension cr 
> configure: WARNING: Requested "cr" MPI cr, but cannot build it 
> configure: WARNING: because fault tolerance is not enabled. 
> configure: WARNING: Try again with --enable-ft
> 
> then using --enable-ft 
> ./configure  --enable-ft=cr
> 
> configure: WARNING: unrecognized options: --enable-ft
> 
> How can I enable checkpointing with blcr?
> 
> 
> I am under Gnu/Debian with gcc 6.3.0
> Best
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users


-- 
Jeff Squyres
jsquy...@cisco.com

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Checkpoint with blcr

2017-05-19 Thread Omar Andrés Zapata Mesa
Dear all,

I am trying to compile ompi 2.1.0 with support for checkpoint using blcr.
openmpi-2.1.0# ./configure --enable-mpi-ext=cr
--- MPI Extension cr
configure: WARNING: Requested "cr" MPI cr, but cannot build it
configure: WARNING: because fault tolerance is not enabled.
configure: WARNING: Try again with --enable-ft

then using --enable-ft
./configure  --enable-ft=cr

configure: WARNING: unrecognized options: --enable-ft

How can I enable checkpointing with blcr?


I am under Gnu/Debian with gcc 6.3.0
Best
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
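For anyone landing on this thread later: checkpoint/restart was removed from the v2.1.x series, so no configure flag will enable it there. In the older series that still shipped it (e.g. 1.10.x), the relevant flags were spelled `--with-ft=cr` and `--with-blcr=DIR` (stated from memory; check your tarball). A quick way to see whether a given source tree still advertises fault-tolerance options before configuring is to grep its configure help. A minimal sketch — the captured help text below is illustrative, not real configure output:

```shell
# Sketch: grep a saved "./configure --help" for fault-tolerance options.
# The here-doc below stands in for real configure output (illustrative only).
has_ft_support() {
  # $1: path to a file holding "./configure --help" output
  grep -q -- '--with-ft' "$1"
}

help_txt=$(mktemp)
cat > "$help_txt" <<'EOF'
  --with-ft=TYPE   Specify the type of fault tolerance to enable
  --with-blcr=DIR  Path to BLCR installation
EOF

if has_ft_support "$help_txt"; then
  echo "configure offers FT options"
else
  echo "FT options absent"
fi
rm -f "$help_txt"
```

Against a real tree you would first run `./configure --help > help.txt`; on a v2.1.x tarball the grep comes back empty.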

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Allan Overstreet
Below are the results from the ibnetdiscover command. This command was 
run from node smd.


#
# Topology file: generated on Fri May 19 15:59:47 2017
#
# Initiated from node 0002c903000a0a32 port 0002c903000a0a34

vendid=0x8f1
devid=0x5a5a
sysimgguid=0x8f105001094d3
switchguid=0x8f105001094d2(8f105001094d2)
Switch36 "S-0008f105001094d2"# "Voltaire 4036 # SWITCH-IB-1" 
enhanced port 0 lid 1 lmc 0
[1]"H-0002c903000a09c2"[1](2c903000a09c3) # "dl580 mlx4_0" 
lid 2 4xQDR
[2]"H-0011757986e4"[1](11757986e4) # "sm4 qib0" lid 
6 4xQDR
[3]"H-0011757990f6"[1](11757990f6) # "sm3 qib0" lid 
5 4xQDR
[4]"H-001175797a12"[1](1175797a12) # "sm2 qib0" lid 
4 4xDDR
[5]"H-001175797a68"[1](1175797a68) # "sm1 qib0" lid 
3 4xDDR
[36]"H-0002c903000a0a32"[2](2c903000a0a34) # "MT25408 
ConnectX Mellanox Technologies" lid 7 4xQDR


vendid=0x1175
devid=0x7322
sysimgguid=0x1175797a68
caguid=0x1175797a68
Ca1 "H-001175797a68"# "sm1 qib0"
[1](1175797a68) "S-0008f105001094d2"[5]# lid 3 lmc 0 
"Voltaire 4036 # SWITCH-IB-1" lid 1 4xDDR


vendid=0x1175
devid=0x7322
sysimgguid=0x1175797a12
caguid=0x1175797a12
Ca1 "H-001175797a12"# "sm2 qib0"
[1](1175797a12) "S-0008f105001094d2"[4]# lid 4 lmc 0 
"Voltaire 4036 # SWITCH-IB-1" lid 1 4xDDR


vendid=0x1175
devid=0x7322
sysimgguid=0x11757990f6
caguid=0x11757990f6
Ca1 "H-0011757990f6"# "sm3 qib0"
[1](11757990f6) "S-0008f105001094d2"[3]# lid 5 lmc 0 
"Voltaire 4036 # SWITCH-IB-1" lid 1 4xQDR


vendid=0x1175
devid=0x7322
sysimgguid=0x11757986e4
caguid=0x11757986e4
Ca1 "H-0011757986e4"# "sm4 qib0"
[1](11757986e4) "S-0008f105001094d2"[2]# lid 6 lmc 0 
"Voltaire 4036 # SWITCH-IB-1" lid 1 4xQDR


vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000a09c5
caguid=0x2c903000a09c2
Ca2 "H-0002c903000a09c2"# "dl580 mlx4_0"
[1](2c903000a09c3) "S-0008f105001094d2"[1]# lid 2 lmc 0 
"Voltaire 4036 # SWITCH-IB-1" lid 1 4xQDR


vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000a0a35
caguid=0x2c903000a0a32
Ca2 "H-0002c903000a0a32"# "MT25408 ConnectX Mellanox 
Technologies"
[2](2c903000a0a34) "S-0008f105001094d2"[36]# lid 7 lmc 0 
"Voltaire 4036 # SWITCH-IB-1" lid 1 4xQDR
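As an aside, output like the above can be reduced to a quick description/lid/speed summary with a throwaway filter. A sketch only — the sample lines below are re-joined where the mail wrapped them:

```shell
# Toy summary of ibnetdiscover port lines: prints 'description lid N speed'.
summarize_ports() {
  sed -n 's/.*# *"\([^"]*\)" *lid \([0-9]*\) \([0-9]x[A-Z]*\).*/\1 lid \2 \3/p'
}

summarize_ports <<'EOF'
[1]"H-0002c903000a09c2"[1](2c903000a09c3)  # "dl580 mlx4_0" lid 2 4xQDR
[4]"H-001175797a12"[1](1175797a12)  # "sm2 qib0" lid 4 4xDDR
EOF
```

This prints one line per port (e.g. `dl580 mlx4_0 lid 2 4xQDR`), which makes the DDR/QDR mix on this fabric easy to spot.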



On 05/19/2017 03:26 AM, John Hearns via users wrote:

Allan,
remember that Infiniband is not Ethernet.  You don't NEED to set up 
IPoIB interfaces.


Two diagnostics please for you to run:

ibnetdiscover

ibdiagnet


Let us please have the results of ibnetdiscover




On 19 May 2017 at 09:25, John Hearns wrote:


Giles, Allan,

if the host 'smd' is acting as a cluster head node it is not a
must for it to have an Infiniband card.
So you should be able to run jobs across the other nodes, which
have Qlogic cards.
I may have something mixed up here, if so I am sorry.

If you want also to run jobs on the smd host, you should take note
of what Giles says.
You may be out of luck in that case.

On 19 May 2017 at 09:15, Gilles Gouaillardet  wrote:

Allan,


i just noted smd has a Mellanox card, while other nodes have
QLogic cards.

mtl/psm works best for QLogic while btl/openib (or mtl/mxm)
work best for Mellanox,

but these are not interoperable. also, i do not think
btl/openib can be used with QLogic cards

(please someone correct me if i am wrong)


from the logs, i can see that smd (Mellanox) is not even able
to use the infiniband port.

if you run with 2 MPI tasks, both run on smd and hence
btl/vader is used, that is why it works

if you run with more than 2 MPI tasks, then smd and other
nodes are used, and every MPI task falls back to btl/tcp

for inter node communication.


[smd][[41971,1],1][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.1.196 failed: No route to host (113)

this usually indicates a firewall, but since both ssh and
oob/tcp are fine, this puzzles me.


what if you

mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include
192.168.1.0/24  --mca
btl_tcp_if_include 192.168.1.0/24 
--mca pml ob1 --mca btl tcp,sm,vader,self  ring

that should work with no error messages, and then you can try
with 12 MPI tasks

(note internode MPI communications will use tcp only)


if you want optimal performance, i am afraid you cannot run
any MPI task on smd (so mtl/psm can be used )

(btw, make sure PSM support was built in Open MPI)

a suboptimal option is to force MPI communications on IPoIB with

/* make sure all nodes can ping each other via IPoIB first */

mpirun --mca oob_tcp_if_include 192.168.1.0/24 --mca btl_tcp_if_include
10.1.0.0/24 --mca pml ob1 --mca btl tcp,sm,vader,self

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Elken, Tom
" i do not think btl/openib can be used with QLogic cards
(please someone correct me if i am wrong)"

You are wrong :).  The openib BTL is the best one to use for interoperability 
between QLogic and Mellanox IB cards.
The Intel True Scale (the continuation of the QLogic IB product line)  Host SW 
User Guide 
http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/OFED_Host_Software_UserGuide_G91902_06.pdf
 
says (I paraphrase):

To run over IB verbs ... for example:

$ mpirun -np 4 -hostfile mpihosts --mca btl self,sm,openib --mca mtl ^psm 
./mpi_app_name


But, as some have suggested, you may make your life simpler and get ~ the same 
or better performance (depending on the workload) if you use the Mlx node as a 
head node and run the job on the 5 QLogic HCA nodes using mtl psm.

-Tom

-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Friday, May 19, 2017 12:16 AM
To: Open MPI Users 
Subject: Re: [OMPI users] Many different errors with ompi version 2.1.1

Allan,


i just noted smd has a Mellanox card, while other nodes have QLogic cards.

mtl/psm works best for QLogic while btl/openib (or mtl/mxm) work best for 
Mellanox,

but these are not interoperable. also, i do not think btl/openib can be used 
with QLogic cards

(please someone correct me if i am wrong)


from the logs, i can see that smd (Mellanox) is not even able to use the 
infiniband port.

if you run with 2 MPI tasks, both run on smd and hence btl/vader is used, that 
is why it works

if you run with more than 2 MPI tasks, then smd and other nodes are used, and 
every MPI task falls back to btl/tcp

for inter node communication.

[smd][[41971,1],1][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.1.196 failed: No route to host (113)

this usually indicates a firewall, but since both ssh and oob/tcp are fine, 
this puzzles me.
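Before digging further into MPI itself, raw TCP reachability can be probed outside of MPI with a bash-only check. A sketch — the host and port below are placeholders taken from the log (btl/tcp actually uses dynamic ports), and `/dev/tcp` is a bashism:

```shell
# Probe raw TCP connectivity the same way btl/tcp would need it.
# 192.168.1.196 and port 1024 are placeholders, not the real btl/tcp port.
tcp_reachable() {
  # usage: tcp_reachable <host> <port>; returns 0 if a connect succeeds
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if tcp_reachable 192.168.1.196 1024; then
  echo "tcp connect ok"
else
  echo "tcp connect failed (firewall? routing?)"
fi
```

A "No route to host" here, with ssh working, would point at per-port filtering rather than a down interface.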


what if you

mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca btl_tcp_if_include 192.168.1.0/24 --mca pml ob1 --mca btl 
tcp,sm,vader,self  ring

that should work with no error messages, and then you can try with 12 
MPI tasks

(note internode MPI communications will use tcp only)


if you want optimal performance, i am afraid you cannot run any MPI task 
on smd (so mtl/psm can be used )

(btw, make sure PSM support was built in Open MPI)

a suboptimal option is to force MPI communications on IPoIB with

/* make sure all nodes can ping each other via IPoIB first */

mpirun --mca oob_tcp_if_include 192.168.1.0/24 --mca btl_tcp_if_include 
10.1.0.0/24 --mca pml ob1 --mca btl tcp,sm,vader,self
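For what it's worth, the way a `btl_tcp_if_include` CIDR value selects interface addresses can be illustrated for the /24 case with a toy check. Open MPI does full CIDR arithmetic internally; this sketch only handles /24, and the addresses are the ones from this thread:

```shell
# Toy /24 matcher: an address falls in a.b.c.0/24 iff its first
# three octets equal the network's. Only /24 is handled here.
in_slash24() {
  # usage: in_slash24 <addr> <network/24>
  net="${2%/24}"
  [ "${1%.*}" = "${net%.*}" ]
}

for addr in 192.168.1.196 10.1.0.196; do
  if in_slash24 "$addr" 192.168.1.0/24; then
    echo "$addr matches 192.168.1.0/24"
  else
    echo "$addr does not match 192.168.1.0/24"
  fi
done
```

So with `--mca btl_tcp_if_include 10.1.0.0/24`, the 192.168.1.x addresses are ignored for MPI traffic, which is exactly the point of the IPoIB suggestion above.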



Cheers,


Gilles


On 5/19/2017 3:50 PM, Allan Overstreet wrote:
> Gilles,
>
> On which node is mpirun invoked ?
>
> The mpirun command was invoked on node smd.
>
> Are you running from a batch manager?
>
> No.
>
> Is there any firewall running on your nodes ?
>
> No. CentOS minimal does not have a firewall installed, and Ubuntu 
> Mate's firewall is disabled.
>
> All three of your commands have appeared to run successfully. The 
> outputs of the three commands are attached.
>
> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
> --mca oob_base_verbose 100 true &> cmd1
>
> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
> --mca oob_base_verbose 100 true &> cmd2
>
> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
> --mca oob_base_verbose 100 ring &> cmd3
>
> If I increase the number of processors in the ring program, mpirun 
> will not succeed.
>
> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
> --mca oob_base_verbose 100 ring &> cmd4
>
>
> On 05/19/2017 02:18 AM, Gilles Gouaillardet wrote:
>> Allan,
>>
>>
>> - on which node is mpirun invoked ?
>>
>> - are you running from a batch manager ?
>>
>> - is there any firewall running on your nodes ?
>>
>>
>> the error is likely occurring when wiring-up mpirun/orted
>>
>> what if you
>>
>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
>> --mca oob_base_verbose 100 true
>>
>> then (if the previous command worked)
>>
>> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 
>> 192.168.1.0/24 --mca oob_base_verbose 100 true
>>
>> and finally (if both previous commands worked)
>>
>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
>> --mca oob_base_verbose 100 ring
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On 5/19/2017 3:07 PM, Allan Overstreet wrote:
>>> I am experiencing many different errors with openmpi version 2.1.1. I 
>>> have had a suspicion that this might be related to the way the 
>>> servers were connected and configured. Regardless, below is a diagram 
>>> of how the servers are configured.
>>>
>>> __  _
>>>[__]|=|
>>>/::/|_|
>>>HO

Re: [OMPI users] MPI the correct solution?

2017-05-19 Thread Reuti
As I think it's not relevant to Open MPI itself, I answered in PM only.

-- Reuti


> Am 18.05.2017 um 18:55 schrieb do...@mail.com:
> 
> On Tue, 9 May 2017 00:30:38 +0200
> Reuti  wrote:
>> Hi,
>> 
>> Am 08.05.2017 um 23:25 schrieb David Niklas:
>> 
>>> Hello,
> >>> I originally posted this question at LQ, but the answer I got back
>>> shows rather poor insight on the subject of MPI, so I'm taking the
>>> liberty of posting here also.
>>> 
>>> https://www.linuxquestions.org/questions/showthread.php?p=5707962
>>> 
>>> What I'm trying to do is figure out how/what to use to update an osm
>>> file (open street map), in a cross system manner. I know the correct
>>> program osmosis and for de/re-compression lbzip2 but how to do this
>>> across computers is confusing me, even after a few hours of searching
>>> online.  
>> 
>> lbzip2 is only thread-parallel on a single machine. With pbzip2, which you
>> mention, it's the same, but there exists an MPI version, MPIBZIP2 -
> I can't find the project, do you have a link?
> 
>> unfortunately it looks unmaintained since 2007. Maybe you can contact
>> the author about its state. Without an MPI application like this, the
>> MPI library is nothing on its own which would divide and distribute one
>> task to several machines automatically.
> Well, there might be other ways to cause a program to run on multiple
> computers. Perhaps a virtual machine made up of multiple physical
> machines?
> 
>> osmosis itself seems to run in serial only (they don't say a word about
>> whether it uses any parallelism).
> Yes, it does run multiple threads, you just start another task (and add a
> buffer). I tested this on my machine, I think it is --read-xml
> --write-xml and --read-xml-change that start new threads. The question is
> whether or not java is natively MPI aware or whether the app needs special
> coding?
> 
>> For the intended task the only option is to use a single machine with
>> as many cores as possible AFAICS.
> Thought about that, and it is doable with respect to memory and disk
> constraints, the problem is that it would take a *long* time esp. with the
> amount of updates I must do, hence my inquiry.
> 
> Thanks,
> David
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
Yes, using "-pami_noib" solves the problem; I lost the previous message.
Thank you so much for the support.
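For readers landing here: MCA selection parameters such as `--mca pml ^pami` take either a plain include list ("a,b") or a ^-prefixed exclude list. A toy model of that selection rule — not Open MPI code, just an illustration:

```shell
# Toy model of MCA component-list semantics:
#   "a,b"  -> include only a and b
#   "^a,b" -> include everything except a and b
select_components() {
  # usage: select_components <spec> <available components...>
  spec="$1"; shift
  case "$spec" in
    ^*)
      excl=",${spec#^},"
      for c in "$@"; do
        case "$excl" in *",$c,"*) ;; *) echo "$c" ;; esac
      done ;;
    *)
      incl=",$spec,"
      for c in "$@"; do
        case "$incl" in *",$c,"*) echo "$c" ;; esac
      done ;;
  esac
}

select_components '^pami' pami ob1 cm   # prints ob1 and cm, one per line
```

So `--mca pml ^pami` leaves every other pml component eligible, which is why it worked as a workaround on a single node.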

2017-05-19 11:12 GMT+02:00 John Hearns via users :

> I am not sure I agree with that.
> (a) the original error message from Gabriele was quite clear - the MPI
> could not find an interface card which was up, so it would not run.
> (b) Nysal actually pointed out the solution which looks good - after
> reading the documentation... use pami_noib
> (c) Having discussions like this helps us all to learn. I have made many
> stupid replies on this list, and looking at problems like this has helped
> me to learn.
>
>
>
>
> On 19 May 2017 at 11:01, r...@open-mpi.org  wrote:
>
>> If I might interject here before lots of time is wasted. Spectrum MPI is
>> an IBM -product- and is not free. What you are likely running into is that
>> their license manager is blocking you from running, albeit without a really
>> nice error message. I’m sure that’s something they are working on.
>>
>> If you really want to use Spectrum MPI, I suggest you contact them about
>> purchasing it.
>>
>>
>> On May 19, 2017, at 1:16 AM, Gabriele Fatigati 
>> wrote:
>>
>> Hi Gilles, in attach the outpuf of:
>>
>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>
>> 2017-05-19 9:43 GMT+02:00 Gilles Gouaillardet :
>>
>>> Gabriele,
>>>
>>>
>>> can you
>>>
>>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>>
>>>
>>> so we can figure out why neither sm nor vader is used?
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>>
>>> On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:
>>>
 Oh no, by using two procs:


 findActiveDevices Error
 We found no active IB device ports
 findActiveDevices Error
 We found no active IB device ports
 
 --
 At least one pair of MPI processes are unable to reach each other for
 MPI communications.  This means that no Open MPI device has indicated
 that it can be used to communicate between these processes.  This is
 an error; Open MPI requires that all MPI processes be able to reach
 each other.  This error can sometimes be the result of forgetting to
 specify the "self" BTL.

   Process 1 ([[12380,1],0]) is on host: openpower
   Process 2 ([[12380,1],1]) is on host: openpower
   BTLs attempted: self

 Your MPI job is now going to abort; sorry.
 
 --
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***and potentially your MPI job)
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***and potentially your MPI job)
 
 --
 MPI_INIT has failed because at least one MPI process is unreachable
 from another.  This *usually* means that an underlying communication
 plugin -- such as a BTL or an MTL -- has either not loaded or not
 allowed itself to be used.  Your MPI job will now abort.

 You may wish to try to narrow down the problem;
  * Check the output of ompi_info to see which BTL/MTL plugins are
available.
  * Run your application with MPI_THREAD_SINGLE.
  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
 
 --
 [openpower:88867] 1 more process has sent help message
 help-mca-bml-r2.txt / unreachable proc
 [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to
 see all help / error messages
 [openpower:88867] 1 more process has sent help message
 help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail





 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati :

 Hi Gilles,

 using your command with one MPI procs I get:

 findActiveDevices Error
 We found no active IB device ports
 Hello world from rank 0  out of 1 processors

 So it seems to work apart the error message.


 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet :

 Gabriele,


 so it seems pml/pami assumes there is an infiniband card
 available (!)

 i guess IBM folks will comment on that shortly.


 meanwhile, you do not need pami since you are running on a
 single node

 mpirun --mca pml ^pami ...

 should do the trick

 (if it does not work, can run and post the logs)

 mpirun --mca pml ^pami --mca pml_base_verbose 100 ...

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
ps. One takeaway for everyone working with MPI.
Turn up the error logging or debug level.
Then PAY ATTENTION to the error messages.

I have spent a LOT of my time doing just that - with OpenMPI and with Intel
MPI over Omnipath and other interconnects in the dim and distant past.
The guy or girl who wrote the software did not put that error trap in the
code for a laugh. It took effort, therefore pay attention to it.
Even if it seems stupid to you, or goes contrary to what you "know" is true
about the system, pay it some attention.

My own recent story is about Omnipath - I knew that the devices were
physically up, I could run the diagnostics etc.
But the particular MPI program failed to start - running ibv_devinfo
eventually led me to find that the ibverbs library was not installed.
I am not flagging this up as a particular example to be teased apart - just
as a general case.
Supercomputer clusters running over high performance fabrics are complicated
beasts. It is not sufficient to plug in cards and cables.
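In the spirit of the story above, a hedged pair of quick checks to confirm the verbs stack is actually installed before suspecting the fabric itself (the commands are standard, but the output will differ per system):

```shell
# Is the verbs library present, and is the diagnostic tool on PATH?
if ldconfig -p 2>/dev/null | grep -q libibverbs; then
  echo "libibverbs present"
else
  echo "libibverbs missing"
fi

if command -v ibv_devinfo >/dev/null 2>&1; then
  echo "ibv_devinfo available"
else
  echo "ibv_devinfo not found"
fi
```

If the library or tool is missing, no amount of MCA tuning will bring up a verbs-based BTL.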

On 19 May 2017 at 11:12, John Hearns  wrote:

> I am not sure I agree with that.
> (a) the original error message from Gabriele was quite clear - the MPI
> could not find an interface card which was up, so it would not run.
> (b) Nysal actually pointed out the solution which looks good - after
> reading the documentation... use pami_noib
> (c) Having discussions like this helps us all to learn. I have made many
> stupid replies on this list, and looking at problems like this has helped
> me to learn.
>
>
>
>
> On 19 May 2017 at 11:01, r...@open-mpi.org  wrote:
>
>> If I might interject here before lots of time is wasted. Spectrum MPI is
>> an IBM -product- and is not free. What you are likely running into is that
>> their license manager is blocking you from running, albeit without a really
>> nice error message. I’m sure that’s something they are working on.
>>
>> If you really want to use Spectrum MPI, I suggest you contact them about
>> purchasing it.
>>
>>
>> On May 19, 2017, at 1:16 AM, Gabriele Fatigati 
>> wrote:
>>
>> Hi Gilles, in attach the outpuf of:
>>
>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>
>> 2017-05-19 9:43 GMT+02:00 Gilles Gouaillardet :
>>
>>> Gabriele,
>>>
>>>
>>> can you
>>>
>>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>>
>>>
>>> so we can figure out why neither sm nor vader is used?
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>>
>>> On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:
>>>
 Oh no, by using two procs:


 findActiveDevices Error
 We found no active IB device ports
 findActiveDevices Error
 We found no active IB device ports
 
 --
 At least one pair of MPI processes are unable to reach each other for
 MPI communications.  This means that no Open MPI device has indicated
 that it can be used to communicate between these processes.  This is
 an error; Open MPI requires that all MPI processes be able to reach
 each other.  This error can sometimes be the result of forgetting to
 specify the "self" BTL.

   Process 1 ([[12380,1],0]) is on host: openpower
   Process 2 ([[12380,1],1]) is on host: openpower
   BTLs attempted: self

 Your MPI job is now going to abort; sorry.
 
 --
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***and potentially your MPI job)
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***and potentially your MPI job)
 
 --
 MPI_INIT has failed because at least one MPI process is unreachable
 from another.  This *usually* means that an underlying communication
 plugin -- such as a BTL or an MTL -- has either not loaded or not
 allowed itself to be used.  Your MPI job will now abort.

 You may wish to try to narrow down the problem;
  * Check the output of ompi_info to see which BTL/MTL plugins are
available.
  * Run your application with MPI_THREAD_SINGLE.
  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
 
 --
 [openpower:88867] 1 more process has sent help message
 help-mca-bml-r2.txt / unreachable proc
 [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to
 see all help / error messages
 [openpower:88867] 1 more process has sent help message
 help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
I am not sure I agree with that.
(a) the original error message from Gabriele was quite clear - the MPI
could not find an interface card which was up, so it would not run.
(b) Nysal actually pointed out the solution which looks good - after
reading the documentation... use pami_noib
(c) Having discussions like this helps us all to learn. I have made many
stupid replies on this list, and looking at problems like this has helped
me to learn.




On 19 May 2017 at 11:01, r...@open-mpi.org  wrote:

> If I might interject here before lots of time is wasted. Spectrum MPI is
> an IBM -product- and is not free. What you are likely running into is that
> their license manager is blocking you from running, albeit without a really
> nice error message. I’m sure that’s something they are working on.
>
> If you really want to use Spectrum MPI, I suggest you contact them about
> purchasing it.
>
>
> On May 19, 2017, at 1:16 AM, Gabriele Fatigati 
> wrote:
>
> Hi Gilles, in attach the outpuf of:
>
> mpirun --mca btl_base_verbose 100 -np 2 ...
>
> 2017-05-19 9:43 GMT+02:00 Gilles Gouaillardet :
>
>> Gabriele,
>>
>>
>> can you
>>
>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>
>>
>> so we can figure out why neither sm nor vader is used?
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>>
>> On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:
>>
>>> Oh no, by using two procs:
>>>
>>>
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> 
>>> --
>>> At least one pair of MPI processes are unable to reach each other for
>>> MPI communications.  This means that no Open MPI device has indicated
>>> that it can be used to communicate between these processes.  This is
>>> an error; Open MPI requires that all MPI processes be able to reach
>>> each other.  This error can sometimes be the result of forgetting to
>>> specify the "self" BTL.
>>>
>>>   Process 1 ([[12380,1],0]) is on host: openpower
>>>   Process 2 ([[12380,1],1]) is on host: openpower
>>>   BTLs attempted: self
>>>
>>> Your MPI job is now going to abort; sorry.
>>> 
>>> --
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***and potentially your MPI job)
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***and potentially your MPI job)
>>> 
>>> --
>>> MPI_INIT has failed because at least one MPI process is unreachable
>>> from another.  This *usually* means that an underlying communication
>>> plugin -- such as a BTL or an MTL -- has either not loaded or not
>>> allowed itself to be used.  Your MPI job will now abort.
>>>
>>> You may wish to try to narrow down the problem;
>>>  * Check the output of ompi_info to see which BTL/MTL plugins are
>>>available.
>>>  * Run your application with MPI_THREAD_SINGLE.
>>>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>>>if using MTL-based communications) to see exactly which
>>>communication plugins were considered and/or discarded.
>>> 
>>> --
>>> [openpower:88867] 1 more process has sent help message
>>> help-mca-bml-r2.txt / unreachable proc
>>> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>> [openpower:88867] 1 more process has sent help message
>>> help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
>>>
>>>
>>>
>>>
>>>
>>> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati :
>>>
>>> Hi Gilles,
>>>
>>> using your command with one MPI procs I get:
>>>
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> Hello world from rank 0  out of 1 processors
>>>
>>> So it seems to work apart the error message.
>>>
>>>
>>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet :
>>>
>>> Gabriele,
>>>
>>>
>>> so it seems pml/pami assumes there is an infiniband card
>>> available (!)
>>>
>>> i guess IBM folks will comment on that shortly.
>>>
>>>
>>> meanwhile, you do not need pami since you are running on a
>>> single node
>>>
>>> mpirun --mca pml ^pami ...
>>>
>>> should do the trick
>>>
>>> (if it does not work, can run and post the logs)
>>>
>>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>>
>>> Hi John,
>>> Infiniband is not used, there is a single node on this machine.

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
Ok Gilles, the output of

mpirun --mca pml ^pami --mca btl_base_verbose 100

is in attached

2017-05-19 10:05 GMT+02:00 Gilles Gouaillardet :

> Gabriele,
>
>
> i am sorry, i really meant
>
> mpirun --mca pml ^pami --mca btl_base_verbose 100 ...
>
>
> Cheers,
>
> Gilles
>
> On 5/19/2017 4:28 PM, Gabriele Fatigati wrote:
>
>> Using:
>>
>> mpirun --mca pml ^pami --mca pml_base_verbose 100  -n 2  ./prova_mpi
>>
>> I attach the output
>>
>> 2017-05-19 9:16 GMT+02:00 John Hearns via users:
>> Gabriele,
>> as Gilles says if you are running within a single host system, you
>> don not need the pami layer.
>> Usually you would use the btls  sm,selfthough I guess 'vader'
>> is the more up to date choice
>>
>> On 19 May 2017 at 09:10, Gilles Gouaillardet  wrote:
>>
>> Gabriele,
>>
>>
>> so it seems pml/pami assumes there is an infiniband card
>> available (!)
>>
>> i guess IBM folks will comment on that shortly.
>>
>>
>> meanwhile, you do not need pami since you are running on a
>> single node
>>
>> mpirun --mca pml ^pami ...
>>
>> should do the trick
>>
>> (if it does not work, can run and post the logs)
>>
>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>
>> Hi John,
>> Infiniband is not used, there is a single node on this
>> machine.
>>
>> 2017-05-19 8:50 GMT+02:00 John Hearns via users:
>>
>> Gabriele, please run 'ibv_devinfo'
>> It looks to me like you may have the physical
>> interface cards in
>> these systems, but you do not have the correct drivers or
>> libraries loaded.
>>
>> I have had similar messages when using Infiniband on
>> x86 systems -
>> which did not have libibverbs installed.
>>
>>
>> On 19 May 2017 at 08:41, Gabriele Fatigati  wrote:
>>
>> Hi Gilles, using your command:
>>
>> [openpower:88536] mca: base: components_register:
>> registering
>> framework pml components
>> [openpower:88536] mca: base: components_register:
>> found loaded
>> component pami
>> [openpower:88536] mca: base: components_register:
>> component
>> pami register function successful
>> [openpower:88536] mca: base: components_open:
>> opening pml
>> components
>> [openpower:88536] mca: base: components_open:
>> found loaded
>> component pami
>> [openpower:88536] mca: base: components_open:
>> component pami
>> open function successful
>> [openpower:88536] select: initializing pml
>> component pami
>> findActiveDevices Error
>> We found no active IB device ports
>> [openpower:88536] select: init returned failure
>> for component pami
>> [openpower:88536] PML pami cannot be selected
>> --------------------------------------------------------------------------
>> No components were able to be opened in the pml
>> framework.
>>
>> This typically means that either no components of
>> this type were
>> installed, or none of the installed componnets can
>> be loaded.
>> Sometimes this means that shared libraries
>> required by these
>> components are unable to be found/loaded.
>>
>>   Host:  openpower
>>   Framework: pml
>> --------------------------------------------------------------------------
>>
>>
>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet :
>>
>>
>> Gabriele,
>>
>>
>> pml/pami is here, at least according to ompi_info
>>
>>
>> can you update your mpirun command like this
>>
>> mpirun --mca pml_base_verbose 100

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread r...@open-mpi.org
If I might interject here before lots of time is wasted. Spectrum MPI is an IBM 
-product- and is not free. What you are likely running into is that their 
license manager is blocking you from running, albeit without a really nice 
error message. I’m sure that’s something they are working on.

If you really want to use Spectrum MPI, I suggest you contact them about 
purchasing it.


> On May 19, 2017, at 1:16 AM, Gabriele Fatigati  wrote:
> 
> Hi Gilles, in attach the outpuf of:
> 
> mpirun --mca btl_base_verbose 100 -np 2 ...
> 
> 2017-05-19 9:43 GMT+02:00 Gilles Gouaillardet:
> Gabriele,
> 
> 
> can you
> 
> mpirun --mca btl_base_verbose 100 -np 2 ...
> 
> 
> so we can figure out why neither sm nor vader is used?
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> 
> On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:
> Oh no, by using two procs:
> 
> 
> findActiveDevices Error
> We found no active IB device ports
> findActiveDevices Error
> We found no active IB device ports
> --
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> 
>   Process 1 ([[12380,1],0]) is on host: openpower
>   Process 2 ([[12380,1],1]) is on host: openpower
>   BTLs attempted: self
> 
> Your MPI job is now going to abort; sorry.
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> --
> MPI_INIT has failed because at least one MPI process is unreachable
> from another.  This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used.  Your MPI job will now abort.
> 
> You may wish to try to narrow down the problem;
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>if using MTL-based communications) to see exactly which
>communication plugins were considered and/or discarded.
> --
> [openpower:88867] 1 more process has sent help message help-mca-bml-r2.txt / 
> unreachable proc
> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
> all help / error messages
> [openpower:88867] 1 more process has sent help message help-mpi-runtime.txt / 
> mpi_init:startup:pml-add-procs-fail
> 
> 
> 
> 
> 
> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati:
> 
> Hi GIlles,
> 
> using your command with one MPI procs I get:
> 
> findActiveDevices Error
> We found no active IB device ports
> Hello world from rank 0  out of 1 processors
> 
> So it seems to work apart from the error message.
> 
> 
> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet:
> 
> Gabriele,
> 
> 
> so it seems pml/pami assumes there is an infiniband card
> available (!)
> 
> i guess IBM folks will comment on that shortly.
> 
> 
> meanwhile, you do not need pami since you are running on a
> single node
> 
> mpirun --mca pml ^pami ...
> 
> should do the trick
> 
> (if it does not work, can run and post the logs)
> 
> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
> 
> Hi John,
> Infiniband is not used, there is a single node on this
> machine.
> 
> 2017-05-19 8:50 GMT+02:00 John Hearns via users
> <users@lists.open-mpi.org>:
> Gabriele, please run 'ibv_devinfo'
> It looks to me like you may have the physical
> interface cards in
>

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
Hi Gilles, in attach the outpuf of:

mpirun --mca btl_base_verbose 100 -np 2 ...

2017-05-19 9:43 GMT+02:00 Gilles Gouaillardet :

> Gabriele,
>
>
> can you
>
> mpirun --mca btl_base_verbose 100 -np 2 ...
>
>
> so we can figure out why neither sm nor vader is used?
>
>
> Cheers,
>
>
> Gilles
>
>
>
> On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:
>
>> Oh no, by using two procs:
>>
>>
>> findActiveDevices Error
>> We found no active IB device ports
>> findActiveDevices Error
>> We found no active IB device ports
>> --
>> At least one pair of MPI processes are unable to reach each other for
>> MPI communications.  This means that no Open MPI device has indicated
>> that it can be used to communicate between these processes.  This is
>> an error; Open MPI requires that all MPI processes be able to reach
>> each other.  This error can sometimes be the result of forgetting to
>> specify the "self" BTL.
>>
>>   Process 1 ([[12380,1],0]) is on host: openpower
>>   Process 2 ([[12380,1],1]) is on host: openpower
>>   BTLs attempted: self
>>
>> Your MPI job is now going to abort; sorry.
>> --
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***and potentially your MPI job)
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***and potentially your MPI job)
>> --
>> MPI_INIT has failed because at least one MPI process is unreachable
>> from another.  This *usually* means that an underlying communication
>> plugin -- such as a BTL or an MTL -- has either not loaded or not
>> allowed itself to be used.  Your MPI job will now abort.
>>
>> You may wish to try to narrow down the problem;
>>  * Check the output of ompi_info to see which BTL/MTL plugins are
>>available.
>>  * Run your application with MPI_THREAD_SINGLE.
>>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>>if using MTL-based communications) to see exactly which
>>communication plugins were considered and/or discarded.
>> --
>> [openpower:88867] 1 more process has sent help message
>> help-mca-bml-r2.txt / unreachable proc
>> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to
>> see all help / error messages
>> [openpower:88867] 1 more process has sent help message
>> help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
>>
>>
>>
>>
>>
>> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati:
>>
>> Hi GIlles,
>>
>> using your command with one MPI procs I get:
>>
>> findActiveDevices Error
>> We found no active IB device ports
>> Hello world from rank 0  out of 1 processors
>>
>> So it seems to work apart from the error message.
>>
>>
>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet:
>>
>> Gabriele,
>>
>>
>> so it seems pml/pami assumes there is an infiniband card
>> available (!)
>>
>> i guess IBM folks will comment on that shortly.
>>
>>
>> meanwhile, you do not need pami since you are running on a
>> single node
>>
>> mpirun --mca pml ^pami ...
>>
>> should do the trick
>>
>> (if it does not work, can run and post the logs)
>>
>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>
>> Hi John,
>> Infiniband is not used, there is a single node on this
>> machine.
>>
>> 2017-05-19 8:50 GMT+02:00 John Hearns via users:
>>
>> Gabriele, please run 'ibv_devinfo'
>> It looks to me like you may have the physical
>> interface cards in
>> these systems, but you do not have the correct drivers or
>> libraries loaded.
>>
>> I have had similar messages when using Infiniband on
>> x86 systems -
>> which did not have libibverbs installed.
>>
>>
>> On 19 May 2017 at 08:41, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
>>
>> Hi Gilles, using your command:
>>
>> [openpower:88536] mca: base: components_register:
>> registering
>>  

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gilles Gouaillardet

Gabriele,


i am sorry, i really meant

mpirun --mca pml ^pami --mca btl_base_verbose 100 ...


Cheers,

Gilles

On 5/19/2017 4:28 PM, Gabriele Fatigati wrote:

Using:

mpirun --mca pml ^pami --mca pml_base_verbose 100  -n 2  ./prova_mpi

I attach the output

2017-05-19 9:16 GMT+02:00 John Hearns via users 
<users@lists.open-mpi.org>:


Gabriele,
as Gilles says if you are running within a single host system, you
do not need the pami layer.
Usually you would use the btls sm,self, though I guess 'vader'
is the more up-to-date choice.

On 19 May 2017 at 09:10, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

Gabriele,


so it seems pml/pami assumes there is an infiniband card
available (!)

i guess IBM folks will comment on that shortly.


meanwhile, you do not need pami since you are running on a
single node

mpirun --mca pml ^pami ...

should do the trick

(if it does not work, can run and post the logs)

mpirun --mca pml ^pami --mca pml_base_verbose 100 ...


Cheers,


Gilles


On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:

Hi John,
Infiniband is not used, there is a single node on this
machine.

2017-05-19 8:50 GMT+02:00 John Hearns via users
<users@lists.open-mpi.org>:

Gabriele, please run 'ibv_devinfo'
It looks to me like you may have the physical
interface cards in
these systems, but you do not have the correct drivers or
libraries loaded.

I have had similar messages when using Infiniband on
x86 systems -
which did not have libibverbs installed.


On 19 May 2017 at 08:41, Gabriele Fatigati
<g.fatig...@cineca.it> wrote:

Hi Gilles, using your command:

[openpower:88536] mca: base: components_register:
registering
framework pml components
[openpower:88536] mca: base: components_register:
found loaded
component pami
[openpower:88536] mca: base: components_register:
component
pami register function successful
[openpower:88536] mca: base: components_open:
opening pml
components
[openpower:88536] mca: base: components_open:
found loaded
component pami
[openpower:88536] mca: base: components_open:
component pami
open function successful
[openpower:88536] select: initializing pml
component pami
findActiveDevices Error
We found no active IB device ports
[openpower:88536] select: init returned failure
for component pami
[openpower:88536] PML pami cannot be selected
--

No components were able to be opened in the pml
framework.

This typically means that either no components of
this type were
installed, or none of the installed componnets can
be loaded.
Sometimes this means that shared libraries
required by these
components are unable to be found/loaded.

  Host:  openpower
  Framework: pml
--



2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
<gil...@rist.or.jp>:

Gabriele,


pml/pami is here, at least according to ompi_info


can you update your mpirun command like this

mpirun --mca pml_base_verbose 100 ..


and post the output ?


Cheers,

Gilles

On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:

Hi Gilles, attached the requested info

2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet
<gilles.gouaillar...@gmail.com>:

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Nysal Jan K A
hi Gabriele,
You can check some of the available options here -
https://www.ibm.com/support/knowledgecenter/en/SSZTET_10.1.0/smpi02/smpi02_interconnect.html
The "-pami_noib" option might be of help in this scenario. Alternatively,
on a single node, the vader BTL can also be used.
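Concretely, the two suggestions could be tried as follows (a sketch; `./a.out` stands in for your MPI program, and `-pami_noib` is the Spectrum MPI option documented on the IBM page above):

```shell
# Option 1: keep the PAMI PML but tell it not to probe for InfiniBand ports
# (Spectrum MPI specific flag, per the IBM documentation linked above)
mpirun -pami_noib -np 2 ./a.out

# Option 2: skip PAMI entirely and use the shared-memory (vader) BTL,
# which is sufficient on a single node
mpirun --mca pml ^pami --mca btl self,vader -np 2 ./a.out
```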

Regards
--Nysal

On Fri, May 19, 2017 at 12:52 PM, Gabriele Fatigati 
wrote:

> Hi GIlles,
>
> using your command with one MPI procs I get:
>
> findActiveDevices Error
> We found no active IB device ports
> Hello world from rank 0  out of 1 processors
>
> So it seems to work apart from the error message.
>
>
> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet :
>
>> Gabriele,
>>
>>
>> so it seems pml/pami assumes there is an infiniband card available (!)
>>
>> i guess IBM folks will comment on that shortly.
>>
>>
>> meanwhile, you do not need pami since you are running on a single node
>>
>> mpirun --mca pml ^pami ...
>>
>> should do the trick
>>
>> (if it does not work, can run and post the logs)
>>
>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>
>>> Hi John,
>>> Infiniband is not used, there is a single node on this machine.
>>>
>>> 2017-05-19 8:50 GMT+02:00 John Hearns via users <
>>> users@lists.open-mpi.org >:
>>>
>>> Gabriele, please run 'ibv_devinfo'
>>> It looks to me like you may have the physical interface cards in
>>> these systems, but you do not have the correct drivers or
>>> libraries loaded.
>>>
>>> I have had similar messages when using Infiniband on x86 systems -
>>> which did not have libibverbs installed.
>>>
>>>
>>> On 19 May 2017 at 08:41, Gabriele Fatigati >> > wrote:
>>>
>>> Hi Gilles, using your command:
>>>
>>> [openpower:88536] mca: base: components_register: registering
>>> framework pml components
>>> [openpower:88536] mca: base: components_register: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_register: component
>>> pami register function successful
>>> [openpower:88536] mca: base: components_open: opening pml
>>> components
>>> [openpower:88536] mca: base: components_open: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_open: component pami
>>> open function successful
>>> [openpower:88536] select: initializing pml component pami
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> [openpower:88536] select: init returned failure for component
>>> pami
>>> [openpower:88536] PML pami cannot be selected
>>> --
>>> No components were able to be opened in the pml framework.
>>>
>>> This typically means that either no components of this type were
>>> installed, or none of the installed componnets can be loaded.
>>> Sometimes this means that shared libraries required by these
>>> components are unable to be found/loaded.
>>>
>>>   Host:  openpower
>>>   Framework: pml
>>> --
>>>
>>>
>>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
>>> <gil...@rist.or.jp>:
>>>
>>> Gabriele,
>>>
>>>
>>> pml/pami is here, at least according to ompi_info
>>>
>>>
>>> can you update your mpirun command like this
>>>
>>> mpirun --mca pml_base_verbose 100 ..
>>>
>>>
>>> and post the output ?
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>>
>>> Hi Gilles, attached the requested info
>>>
>>> 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet
>>> <gilles.gouaillar...@gmail.com>:
>>>
>>> Gabriele,
>>>
>>> can you
>>> ompi_info --all | grep pml
>>>
>>> also, make sure there is nothing in your
>>> environment pointing to
>>> an other Open MPI install
>>> for example
>>> ldd a.out
>>> should only point to IBM libraries
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Thursday, May 18, 2017, Gabriele Fatigati
>>> <g.fatig...@cineca.it> wrote:
>>>
>>> Dear OpenMPI users and developer

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Nathan Hjelm
Add --mca btl self,vader

-Nathan
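Spelled out in full, this combines with the earlier PML exclusion as follows (a sketch; `./prova_mpi` is the test program from this thread):

```shell
# Disable the PAMI PML and restrict Open MPI to the loopback (self) and
# shared-memory (vader) BTLs, so no InfiniBand port is probed
mpirun --mca pml ^pami --mca btl self,vader -np 2 ./prova_mpi
```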

> On May 19, 2017, at 1:23 AM, Gabriele Fatigati  wrote:
> 
> Oh no, by using two procs:
> 
> 
> findActiveDevices Error
> We found no active IB device ports
> findActiveDevices Error
> We found no active IB device ports
> --
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> 
>   Process 1 ([[12380,1],0]) is on host: openpower
>   Process 2 ([[12380,1],1]) is on host: openpower
>   BTLs attempted: self
> 
> Your MPI job is now going to abort; sorry.
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> --
> MPI_INIT has failed because at least one MPI process is unreachable
> from another.  This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used.  Your MPI job will now abort.
> 
> You may wish to try to narrow down the problem;
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>if using MTL-based communications) to see exactly which
>communication plugins were considered and/or discarded.
> --
> [openpower:88867] 1 more process has sent help message help-mca-bml-r2.txt / 
> unreachable proc
> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
> all help / error messages
> [openpower:88867] 1 more process has sent help message help-mpi-runtime.txt / 
> mpi_init:startup:pml-add-procs-fail
> 
> 
> 
> 
> 
> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati :
> Hi GIlles,
> 
> using your command with one MPI procs I get:
> 
> findActiveDevices Error
> We found no active IB device ports
> Hello world from rank 0  out of 1 processors
> 
> So it seems to work apart from the error message.
> 
> 
> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet :
> Gabriele,
> 
> 
> so it seems pml/pami assumes there is an infiniband card available (!)
> 
> i guess IBM folks will comment on that shortly.
> 
> 
> meanwhile, you do not need pami since you are running on a single node
> 
> mpirun --mca pml ^pami ...
> 
> should do the trick
> 
> (if it does not work, can run and post the logs)
> 
> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
> Hi John,
> Infiniband is not used, there is a single node on this machine.
> 
> 2017-05-19 8:50 GMT+02:00 John Hearns via users:
> 
> Gabriele, please run 'ibv_devinfo'
> It looks to me like you may have the physical interface cards in
> these systems, but you do not have the correct drivers or
> libraries loaded.
> 
> I have had similar messages when using Infiniband on x86 systems -
> which did not have libibverbs installed.
> 
> 
> On 19 May 2017 at 08:41, Gabriele Fatigati wrote:
> 
> Hi Gilles, using your command:
> 
> [openpower:88536] mca: base: components_register: registering
> framework pml components
> [openpower:88536] mca: base: components_register: found loaded
> component pami
> [openpower:88536] mca: base: components_register: component
> pami register function successful
> [openpower:88536] mca: base: components_open: opening pml
> components
> [openpower:88536] mca: base: components_open: found loaded
> component pami
> [openpower:88536] mca: base: components_open: component pami
> open function successful
> [openpower:88536] select: initializing pml component pami
> findActiveDevices Error
> We found no active IB device ports
> [openpower:88536] select: init returned failure for component pami
> [openpower:88536] PML pami cannot be selected
> --
> No components were able to be opened in the pml framework.
> 
> This typically means

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gilles Gouaillardet

Gabriele,


can you

mpirun --mca btl_base_verbose 100 -np 2 ...


so we can figure out why neither sm nor vader is used?


Cheers,


Gilles


On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:

Oh no, by using two procs:


findActiveDevices Error
We found no active IB device ports
findActiveDevices Error
We found no active IB device ports
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[12380,1],0]) is on host: openpower
  Process 2 ([[12380,1],1]) is on host: openpower
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
--
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;
 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--
[openpower:88867] 1 more process has sent help message 
help-mca-bml-r2.txt / unreachable proc
[openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to 
see all help / error messages
[openpower:88867] 1 more process has sent help message 
help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail






2017-05-19 9:22 GMT+02:00 Gabriele Fatigati:


Hi GIlles,

using your command with one MPI procs I get:

findActiveDevices Error
We found no active IB device ports
Hello world from rank 0  out of 1 processors

So it seems to work apart from the error message.


2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:

Gabriele,


so it seems pml/pami assumes there is an infiniband card
available (!)

i guess IBM folks will comment on that shortly.


meanwhile, you do not need pami since you are running on a
single node

mpirun --mca pml ^pami ...

should do the trick

(if it does not work, can run and post the logs)

mpirun --mca pml ^pami --mca pml_base_verbose 100 ...


Cheers,


Gilles


On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:

Hi John,
Infiniband is not used, there is a single node on this
machine.

2017-05-19 8:50 GMT+02:00 John Hearns via users
<users@lists.open-mpi.org>:

Gabriele, please run 'ibv_devinfo'
It looks to me like you may have the physical
interface cards in
these systems, but you do not have the correct drivers or
libraries loaded.

I have had similar messages when using Infiniband on
x86 systems -
which did not have libibverbs installed.


On 19 May 2017 at 08:41, Gabriele Fatigati
<g.fatig...@cineca.it> wrote:

Hi Gilles, using your command:

[openpower:88536] mca: base: components_register:
registering
framework pml components
[openpower:88536] mca: base: components_register:
found loaded
component pami
[openpower:88536] mca: base: components_register:
component
pami register function successful
[openpower:88536] mca: base: components_open:
opening pml
components
[openpower:88536] mca: base: components_open:
found loaded
c

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
BTLs attempted: self

That should only allow a single process to communicate with itself.




On 19 May 2017 at 09:23, Gabriele Fatigati  wrote:

> Oh no, by using two procs:
>
>
> findActiveDevices Error
> We found no active IB device ports
> findActiveDevices Error
> We found no active IB device ports
> --
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
>   Process 1 ([[12380,1],0]) is on host: openpower
>   Process 2 ([[12380,1],1]) is on host: openpower
>   BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> --
> MPI_INIT has failed because at least one MPI process is unreachable
> from another.  This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used.  Your MPI job will now abort.
>
> You may wish to try to narrow down the problem;
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>if using MTL-based communications) to see exactly which
>communication plugins were considered and/or discarded.
> --
> [openpower:88867] 1 more process has sent help message help-mca-bml-r2.txt
> / unreachable proc
> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
> [openpower:88867] 1 more process has sent help message
> help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
>
>
>
>
>
> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati :
>
>> Hi GIlles,
>>
>> using your command with one MPI procs I get:
>>
>> findActiveDevices Error
>> We found no active IB device ports
>> Hello world from rank 0  out of 1 processors
>>
>> So it seems to work apart from the error message.
>>
>>
>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet :
>>
>>> Gabriele,
>>>
>>>
>>> so it seems pml/pami assumes there is an infiniband card available (!)
>>>
>>> i guess IBM folks will comment on that shortly.
>>>
>>>
>>> meanwhile, you do not need pami since you are running on a single node
>>>
>>> mpirun --mca pml ^pami ...
>>>
>>> should do the trick
>>>
>>> (if it does not work, can run and post the logs)
>>>
>>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>>
 Hi John,
 Infiniband is not used, there is a single node on this machine.

 2017-05-19 8:50 GMT+02:00 John Hearns via users <
 users@lists.open-mpi.org >:

 Gabriele, please run 'ibv_devinfo'
 It looks to me like you may have the physical interface cards in
 these systems, but you do not have the correct drivers or
 libraries loaded.

 I have had similar messages when using Infiniband on x86 systems -
 which did not have libibverbs installed.


 On 19 May 2017 at 08:41, Gabriele Fatigati >>> > wrote:

 Hi Gilles, using your command:

 [openpower:88536] mca: base: components_register: registering
 framework pml components
 [openpower:88536] mca: base: components_register: found loaded
 component pami
 [openpower:88536] mca: base: components_register: component
 pami register function successful
 [openpower:88536] mca: base: components_open: opening pml
 components
 [openpower:88536] mca: base: components_open: found loaded
 component pami
 [openpower:88536] mca: base: components_open: component pami
 open function successful
 [openpower:88536] select: initializing pml component pami
 findActiveDevices Error
 We found no active IB device ports
 [openpower:88536] select: init returned failure for component
 pami
 [openp

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
Using:

mpirun --mca pml ^pami --mca pml_base_verbose 100  -n 2  ./prova_mpi

I attach the output

2017-05-19 9:16 GMT+02:00 John Hearns via users :

> Gabriele,
> as Gilles says if you are running within a single host system, you do not
> need the pami layer.
> Usually you would use the btls sm,self, though I guess 'vader' is the
> more up-to-date choice.
>
> On 19 May 2017 at 09:10, Gilles Gouaillardet  wrote:
>
>> Gabriele,
>>
>>
>> so it seems pml/pami assumes there is an infiniband card available (!)
>>
>> i guess IBM folks will comment on that shortly.
>>
>>
>> meanwhile, you do not need pami since you are running on a single node
>>
>> mpirun --mca pml ^pami ...
>>
>> should do the trick
>>
>> (if it does not work, can run and post the logs)
>>
>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>
>>> Hi John,
>>> Infiniband is not used, there is a single node on this machine.
>>>
>>> 2017-05-19 8:50 GMT+02:00 John Hearns via users <
>>> users@lists.open-mpi.org >:
>>>
>>> Gabriele, please run 'ibv_devinfo'
>>> It looks to me like you may have the physical interface cards in
>>> these systems, but you do not have the correct drivers or
>>> libraries loaded.
>>>
>>> I have had similar messages when using Infiniband on x86 systems -
>>> which did not have libibverbs installed.
>>>
>>>
>>> On 19 May 2017 at 08:41, Gabriele Fatigati >> > wrote:
>>>
>>> Hi Gilles, using your command:
>>>
>>> [openpower:88536] mca: base: components_register: registering
>>> framework pml components
>>> [openpower:88536] mca: base: components_register: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_register: component
>>> pami register function successful
>>> [openpower:88536] mca: base: components_open: opening pml
>>> components
>>> [openpower:88536] mca: base: components_open: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_open: component pami
>>> open function successful
>>> [openpower:88536] select: initializing pml component pami
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> [openpower:88536] select: init returned failure for component
>>> pami
>>> [openpower:88536] PML pami cannot be selected
>>> --
>>> No components were able to be opened in the pml framework.
>>>
>>> This typically means that either no components of this type were
>>> installed, or none of the installed componnets can be loaded.
>>> Sometimes this means that shared libraries required by these
>>> components are unable to be found/loaded.
>>>
>>>   Host:  openpower
>>>   Framework: pml
>>> --
>>>
>>>
>>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
>>> <gil...@rist.or.jp>:
>>>
>>> Gabriele,
>>>
>>>
>>> pml/pami is here, at least according to ompi_info
>>>
>>>
>>> can you update your mpirun command like this
>>>
>>> mpirun --mca pml_base_verbose 100 ..
>>>
>>>
>>> and post the output ?
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>>
>>> Hi Gilles, attached the requested info
>>>
>>> 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet
>>> <gilles.gouaillar...@gmail.com>:
>>>
>>> Gabriele,
>>>
>>> can you
>>> ompi_info --all | grep pml
>>>
>>> also, make sure there is nothing in your
>>> environment pointing to
>>> an other Open MPI install
>>> for example
>>> ldd a.out
>>> should only point to IBM libraries
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Thursday, May 18, 2017, Gabriele Fatigati
>>> <g.fatig...@cineca.it> wrote:
>>>
>>> Dear OpenMPI users and developers, I'm using
>>> IBM Spectrum MPI
>>> 10.1.0 based on OpenMPI, so I hope there are
>>> some MPI expert
>>> can help me to solve the proble

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Allan,
remember that Infiniband is not Ethernet.  You don't NEED to set up IPoIB
interfaces.

Two diagnostics please for you to run:

ibnetdiscover

ibdiagnet


Let us please have the results of ibnetdiscover




On 19 May 2017 at 09:25, John Hearns  wrote:

> Gilles, Allan,
>
> if the host 'smd' is acting as a cluster head node, it does not need to
> have an Infiniband card.
> So you should be able to run jobs across the other nodes, which have
> Qlogic cards.
> I may have something mixed up here, if so I am sorry.
>
> If you also want to run jobs on the smd host, you should take note of what
> Gilles says.
> You may be out of luck in that case.
>
> On 19 May 2017 at 09:15, Gilles Gouaillardet  wrote:
>
>> Allan,
>>
>>
>> i just noted smd has a Mellanox card, while other nodes have QLogic cards.
>>
>> mtl/psm works best for QLogic while btl/openib (or mtl/mxm) work best for
>> Mellanox,
>>
>> but these are not interoperable. also, i do not think btl/openib can be
>> used with QLogic cards
>>
>> (please someone correct me if i am wrong)
>>
>>
>> from the logs, i can see that smd (Mellanox) is not even able to use the
>> infiniband port.
>>
>> if you run with 2 MPI tasks, both run on smd and hence btl/vader is used,
>> that is why it works
>>
>> if you run with more than 2 MPI tasks, then smd and other nodes are used,
>> and every MPI task falls back to btl/tcp
>>
>> for inter node communication.
>>
>> [smd][[41971,1],1][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 192.168.1.196 failed: No route to host (113)
>>
>> this usually indicates a firewall, but since both ssh and oob/tcp are
>> fine, this puzzles me.
>>
>>
>> what if you
>>
>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>> --mca btl_tcp_if_include 192.168.1.0/24 --mca pml ob1 --mca btl
>> tcp,sm,vader,self  ring
>>
>> that should work with no error messages, and then you can try with 12 MPI
>> tasks
>>
>> (note internode MPI communications will use tcp only)
>>
>>
>> if you want optimal performance, i am afraid you cannot run any MPI task
>> on smd (so mtl/psm can be used )
>>
>> (btw, make sure PSM support was built in Open MPI)
>>
>> a suboptimal option is to force MPI communications on IPoIB with
>>
>> /* make sure all nodes can ping each other via IPoIB first */
>>
>> mpirun --mca oob_tcp_if_include 192.168.1.0/24 --mca btl_tcp_if_include
>> 10.1.0.0/24 --mca pml ob1 --mca btl tcp,sm,vader,self
>>
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>> On 5/19/2017 3:50 PM, Allan Overstreet wrote:
>>
>>> Gilles,
>>>
>>> On which node is mpirun invoked ?
>>>
>>> The mpirun command was invoked on node smd.
>>>
>>> Are you running from a batch manager?
>>>
>>> No.
>>>
>>> Is there any firewall running on your nodes ?
>>>
>>> No. CentOS Minimal does not have a firewall installed, and Ubuntu
>>> Mate's firewall is disabled.
>>>
>>> All three of your commands have appeared to run successfully. The
>>> outputs of the three commands are attached.
>>>
>>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>>> --mca oob_base_verbose 100 true &> cmd1
>>>
>>> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>>> --mca oob_base_verbose 100 true &> cmd2
>>>
>>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>>> --mca oob_base_verbose 100 ring &> cmd3
>>>
>>> If I increase the number of processors in the ring program, mpirun will
>>> not succeed.
>>>
>>> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>>> --mca oob_base_verbose 100 ring &> cmd4
>>>
>>>
>>> On 05/19/2017 02:18 AM, Gilles Gouaillardet wrote:
>>>
 Allan,


 - on which node is mpirun invoked ?

 - are you running from a batch manager ?

 - is there any firewall running on your nodes ?


 the error is likely occurring when wiring-up mpirun/orted

 what if you

 mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
 --mca oob_base_verbose 100 true

 then (if the previous command worked)

 mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
 --mca oob_base_verbose 100 true

 and finally (if both previous commands worked)

 mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
 --mca oob_base_verbose 100 ring


 Cheers,

 Gilles

 On 5/19/2017 3:07 PM, Allan Overstreet wrote:

> I am experiencing many different errors with openmpi version 2.1.1. I
> have had a suspicion that this might be related to the way the servers were
> connected and configured. Regardless, below is a diagram of how the servers
> are configured.
>
> __  _
>[__]|=|
>/::/|_|
>HOST: smd
>Dua

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Gilles, Allan,

if the host 'smd' is acting as a cluster head node, it does not need to
have an Infiniband card.
So you should be able to run jobs across the other nodes, which have Qlogic
cards.
I may have something mixed up here, if so I am sorry.

If you also want to run jobs on the smd host, you should take note of what
Gilles says.
You may be out of luck in that case.

On 19 May 2017 at 09:15, Gilles Gouaillardet  wrote:

> Allan,
>
>
> i just noted smd has a Mellanox card, while other nodes have QLogic cards.
>
> mtl/psm works best for QLogic while btl/openib (or mtl/mxm) work best for
> Mellanox,
>
> but these are not interoperable. also, i do not think btl/openib can be
> used with QLogic cards
>
> (please someone correct me if i am wrong)
>
>
> from the logs, i can see that smd (Mellanox) is not even able to use the
> infiniband port.
>
> if you run with 2 MPI tasks, both run on smd and hence btl/vader is used,
> that is why it works
>
> if you run with more than 2 MPI tasks, then smd and other nodes are used,
> and every MPI task falls back to btl/tcp
>
> for inter node communication.
>
> [smd][[41971,1],1][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.1.196 failed: No route to host (113)
>
> this usually indicates a firewall, but since both ssh and oob/tcp are
> fine, this puzzles me.
>
>
> what if you
>
> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
> --mca btl_tcp_if_include 192.168.1.0/24 --mca pml ob1 --mca btl
> tcp,sm,vader,self  ring
>
> that should work with no error messages, and then you can try with 12 MPI
> tasks
>
> (note internode MPI communications will use tcp only)
>
>
> if you want optimal performance, i am afraid you cannot run any MPI task
> on smd (so mtl/psm can be used )
>
> (btw, make sure PSM support was built in Open MPI)
>
> a suboptimal option is to force MPI communications on IPoIB with
>
> /* make sure all nodes can ping each other via IPoIB first */
>
> mpirun --mca oob_tcp_if_include 192.168.1.0/24 --mca btl_tcp_if_include
> 10.1.0.0/24 --mca pml ob1 --mca btl tcp,sm,vader,self
>
>
>
> Cheers,
>
>
> Gilles
>
>
> On 5/19/2017 3:50 PM, Allan Overstreet wrote:
>
>> Gilles,
>>
>> On which node is mpirun invoked ?
>>
>> The mpirun command was invoked on node smd.
>>
>> Are you running from a batch manager?
>>
>> No.
>>
>> Is there any firewall running on your nodes ?
>>
>> No. CentOS Minimal does not have a firewall installed, and Ubuntu
>> Mate's firewall is disabled.
>>
>> All three of your commands have appeared to run successfully. The outputs
>> of the three commands are attached.
>>
>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>> --mca oob_base_verbose 100 true &> cmd1
>>
>> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>> --mca oob_base_verbose 100 true &> cmd2
>>
>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>> --mca oob_base_verbose 100 ring &> cmd3
>>
>> If I increase the number of processors in the ring program, mpirun will
>> not succeed.
>>
>> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>> --mca oob_base_verbose 100 ring &> cmd4
>>
>>
>> On 05/19/2017 02:18 AM, Gilles Gouaillardet wrote:
>>
>>> Allan,
>>>
>>>
>>> - on which node is mpirun invoked ?
>>>
>>> - are you running from a batch manager ?
>>>
>>> - is there any firewall running on your nodes ?
>>>
>>>
>>> the error is likely occurring when wiring-up mpirun/orted
>>>
>>> what if you
>>>
>>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>>> --mca oob_base_verbose 100 true
>>>
>>> then (if the previous command worked)
>>>
>>> mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>>> --mca oob_base_verbose 100 true
>>>
>>> and finally (if both previous commands worked)
>>>
>>> mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24
>>> --mca oob_base_verbose 100 ring
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 5/19/2017 3:07 PM, Allan Overstreet wrote:
>>>
 I am experiencing many different errors with openmpi version 2.1.1. I have
 had a suspicion that this might be related to the way the servers were
 connected and configured. Regardless, below is a diagram of how the servers
 are configured.

 __  _
[__]|=|
/::/|_|
HOST: smd
Dual 1Gb Ethernet Bonded
.-> Bond0 IP: 192.168.1.200
|   Infiniband Card: MHQH29B-XTR <.
|   Ib0 IP: 10.1.0.1  |
|   OS: Ubuntu Mate   |
|   __ _ |
| [__]|=||
  

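[Editor's sketch] The `--mca oob_tcp_if_include 192.168.1.0/24` and `--mca btl_tcp_if_include` options quoted above restrict Open MPI to interfaces whose addresses fall inside the given CIDR block. A quick, MPI-independent way to check which of a node's addresses such a filter would keep is Python's `ipaddress` module (a diagnostic sketch, not part of Open MPI; the addresses below are the bond0/ib0 ones from the node diagram in this thread):

```python
import ipaddress

def matching_addrs(addrs, cidr):
    """Return the addresses inside the CIDR block, i.e. the
    interfaces an *_if_include filter would keep."""
    net = ipaddress.ip_network(cidr)
    return [a for a in addrs if ipaddress.ip_address(a) in net]

# bond0 (Ethernet) and ib0 (IPoIB) addresses from the thread:
addrs = ["192.168.1.200", "192.168.1.196", "10.1.0.1", "10.1.0.2"]

print(matching_addrs(addrs, "192.168.1.0/24"))  # Ethernet side
print(matching_addrs(addrs, "10.1.0.0/24"))     # IPoIB side
```

If an address that should match is missing or unreachable on some node, a connect() between the remaining interfaces can fail with exactly the "No route to host" error quoted above.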
Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
Oh no, by using two procs:


findActiveDevices Error
We found no active IB device ports
findActiveDevices Error
We found no active IB device ports
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[12380,1],0]) is on host: openpower
  Process 2 ([[12380,1],1]) is on host: openpower
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
--
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;
 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--
[openpower:88867] 1 more process has sent help message help-mca-bml-r2.txt
/ unreachable proc
[openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
[openpower:88867] 1 more process has sent help message help-mpi-runtime.txt
/ mpi_init:startup:pml-add-procs-fail





2017-05-19 9:22 GMT+02:00 Gabriele Fatigati :

> Hi GIlles,
>
> using your command with one MPI procs I get:
>
> findActiveDevices Error
> We found no active IB device ports
> Hello world from rank 0  out of 1 processors
>
> So it seems to work apart the error message.
>
>
> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet :
>
>> Gabriele,
>>
>>
>> so it seems pml/pami assumes there is an infiniband card available (!)
>>
>> i guess IBM folks will comment on that shortly.
>>
>>
>> meanwhile, you do not need pami since you are running on a single node
>>
>> mpirun --mca pml ^pami ...
>>
>> should do the trick
>>
>> (if it does not work, can you run and post the logs?)
>>
>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>
>>> Hi John,
>>> Infiniband is not used, there is a single node on this machine.
>>>
>>> 2017-05-19 8:50 GMT+02:00 John Hearns via users <
>>> users@lists.open-mpi.org >:
>>>
>>> Gabriele, please run 'ibv_devinfo'
>>> It looks to me like you may have the physical interface cards in
>>> these systems, but you do not have the correct drivers or
>>> libraries loaded.
>>>
>>> I have had similar messages when using Infiniband on x86 systems -
>>> which did not have libibverbs installed.
>>>
>>>
>>> On 19 May 2017 at 08:41, Gabriele Fatigati >> > wrote:
>>>
>>> Hi Gilles, using your command:
>>>
>>> [openpower:88536] mca: base: components_register: registering
>>> framework pml components
>>> [openpower:88536] mca: base: components_register: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_register: component
>>> pami register function successful
>>> [openpower:88536] mca: base: components_open: opening pml
>>> components
>>> [openpower:88536] mca: base: components_open: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_open: component pami
>>> open function successful
>>> [openpower:88536] select: initializing pml component pami
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> [openpower:88536] select: init returned failure for component
>>> pami
>>> [openpower:88536] PML pami cannot be selected
>>> 
>>> --
>>> No components were able to be opened in the pml framework.
>>>
>>> This typically means that either no components of this type were
>>> installed, or none of the inst

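[Editor's sketch] The verbose log above shows the general MCA life cycle: each component of the `pml` framework is registered, opened, and asked to initialize; a component whose init fails (here pami, because no active IB port was found) is dropped, and if nothing survives the framework aborts with "No components were able to be opened". A toy sketch of that selection loop (hypothetical names, not Open MPI's actual code; excluding pami with `--mca pml ^pami` corresponds to leaving only a component that can initialize):

```python
def select_components(components):
    """Mimic MCA selection: call each component's init and keep
    the survivors; an empty result is the fatal case in the log."""
    selected = []
    for name, init in components:
        try:
            init()
            selected.append(name)
        except RuntimeError as err:
            print(f"select: init returned failure for component {name}: {err}")
    if not selected:
        raise RuntimeError("No components were able to be opened in the pml framework")
    return selected

def pami_init():
    # stands in for pami's device probe on a machine with no IB ports
    raise RuntimeError("We found no active IB device ports")

print(select_components([("pami", pami_init), ("ob1", lambda: None)]))
```

With only pami offered, the sketch raises the framework-level error, mirroring the abort seen in the single-component Spectrum MPI build above.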
Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
Hi GIlles,

using your command with one MPI procs I get:

findActiveDevices Error
We found no active IB device ports
Hello world from rank 0  out of 1 processors

So it seems to work apart the error message.


2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet :

> Gabriele,
>
>
> so it seems pml/pami assumes there is an infiniband card available (!)
>
> i guess IBM folks will comment on that shortly.
>
>
> meanwhile, you do not need pami since you are running on a single node
>
> mpirun --mca pml ^pami ...
>
> should do the trick
>
> (if it does not work, can you run and post the logs?)
>
> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>
>
> Cheers,
>
>
> Gilles
>
>
> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>
>> Hi John,
>> Infiniband is not used, there is a single node on this machine.
>>
>> 2017-05-19 8:50 GMT+02:00 John Hearns via users > >:
>>
>> Gabriele, please run 'ibv_devinfo'
>> It looks to me like you may have the physical interface cards in
>> these systems, but you do not have the correct drivers or
>> libraries loaded.
>>
>> I have had similar messages when using Infiniband on x86 systems -
>> which did not have libibverbs installed.
>>
>>
>> On 19 May 2017 at 08:41, Gabriele Fatigati > > wrote:
>>
>> Hi Gilles, using your command:
>>
>> [openpower:88536] mca: base: components_register: registering
>> framework pml components
>> [openpower:88536] mca: base: components_register: found loaded
>> component pami
>> [openpower:88536] mca: base: components_register: component
>> pami register function successful
>> [openpower:88536] mca: base: components_open: opening pml
>> components
>> [openpower:88536] mca: base: components_open: found loaded
>> component pami
>> [openpower:88536] mca: base: components_open: component pami
>> open function successful
>> [openpower:88536] select: initializing pml component pami
>> findActiveDevices Error
>> We found no active IB device ports
>> [openpower:88536] select: init returned failure for component pami
>> [openpower:88536] PML pami cannot be selected
>> 
>> --
>> No components were able to be opened in the pml framework.
>>
>> This typically means that either no components of this type were
>> installed, or none of the installed componnets can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>>
>>   Host:  openpower
>>   Framework: pml
>> 
>> --
>>
>>
>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
>> mailto:gil...@rist.or.jp>>:
>>
>> Gabriele,
>>
>>
>> pml/pami is here, at least according to ompi_info
>>
>>
>> can you update your mpirun command like this
>>
>> mpirun --mca pml_base_verbose 100 ..
>>
>>
>> and post the output ?
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>
>> Hi Gilles, attached the requested info
>>
>> 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet
>> > 
>> > >>:
>>
>> Gabriele,
>>
>> can you
>> ompi_info --all | grep pml
>>
>> also, make sure there is nothing in your
>> environment pointing to
>> an other Open MPI install
>> for example
>> ldd a.out
>> should only point to IBM libraries
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Thursday, May 18, 2017, Gabriele Fatigati
>> mailto:g.fatig...@cineca.it>
>> >
>> >> wrote:
>>
>> Dear OpenMPI users and developers, I'm using
>> IBM Spectrum MPI
>> 10.1.0 based on OpenMPI, so I hope there are
>> some MPI expert
>> can help me to solve the problem.
>>
>> When I run a simple Hello World MPI program, I
>> get the follow
>> error message:
>>
>>
>> A requested component was not found, or was
>> unable to be
>> opened.  This
>>

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
Gabriele,
as Gilles says, if you are running within a single host system, you do not
need the pami layer.
Usually you would use the btls sm,self, though I guess 'vader' is the
more up-to-date choice

On 19 May 2017 at 09:10, Gilles Gouaillardet  wrote:

> Gabriele,
>
>
> so it seems pml/pami assumes there is an infiniband card available (!)
>
> i guess IBM folks will comment on that shortly.
>
>
> meanwhile, you do not need pami since you are running on a single node
>
> mpirun --mca pml ^pami ...
>
> should do the trick
>
> (if it does not work, can you run and post the logs?)
>
> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>
>
> Cheers,
>
>
> Gilles
>
>
> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>
>> Hi John,
>> Infiniband is not used, there is a single node on this machine.
>>
>> 2017-05-19 8:50 GMT+02:00 John Hearns via users > >:
>>
>> Gabriele, please run 'ibv_devinfo'
>> It looks to me like you may have the physical interface cards in
>> these systems, but you do not have the correct drivers or
>> libraries loaded.
>>
>> I have had similar messages when using Infiniband on x86 systems -
>> which did not have libibverbs installed.
>>
>>
>> On 19 May 2017 at 08:41, Gabriele Fatigati > > wrote:
>>
>> Hi Gilles, using your command:
>>
>> [openpower:88536] mca: base: components_register: registering
>> framework pml components
>> [openpower:88536] mca: base: components_register: found loaded
>> component pami
>> [openpower:88536] mca: base: components_register: component
>> pami register function successful
>> [openpower:88536] mca: base: components_open: opening pml
>> components
>> [openpower:88536] mca: base: components_open: found loaded
>> component pami
>> [openpower:88536] mca: base: components_open: component pami
>> open function successful
>> [openpower:88536] select: initializing pml component pami
>> findActiveDevices Error
>> We found no active IB device ports
>> [openpower:88536] select: init returned failure for component pami
>> [openpower:88536] PML pami cannot be selected
>> 
>> --
>> No components were able to be opened in the pml framework.
>>
>> This typically means that either no components of this type were
>> installed, or none of the installed componnets can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>>
>>   Host:  openpower
>>   Framework: pml
>> 
>> --
>>
>>
>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
>> mailto:gil...@rist.or.jp>>:
>>
>> Gabriele,
>>
>>
>> pml/pami is here, at least according to ompi_info
>>
>>
>> can you update your mpirun command like this
>>
>> mpirun --mca pml_base_verbose 100 ..
>>
>>
>> and post the output ?
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>
>> Hi Gilles, attached the requested info
>>
>> 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet
>> > 
>> > >>:
>>
>> Gabriele,
>>
>> can you
>> ompi_info --all | grep pml
>>
>> also, make sure there is nothing in your
>> environment pointing to
>> an other Open MPI install
>> for example
>> ldd a.out
>> should only point to IBM libraries
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Thursday, May 18, 2017, Gabriele Fatigati
>> mailto:g.fatig...@cineca.it>
>> >
>> >> wrote:
>>
>> Dear OpenMPI users and developers, I'm using
>> IBM Spectrum MPI
>> 10.1.0 based on OpenMPI, so I hope there are
>> some MPI expert
>> can help me to solve the problem.
>>
>> When I run a simple Hello World MPI program, I
>> get the follow
>> error message:
>>
>>
>> A requested component was not found, or was
>> unable to be
>> opened.  This
>> m

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Gilles Gouaillardet

Allan,


i just noted smd has a Mellanox card, while other nodes have QLogic cards.

mtl/psm works best for QLogic while btl/openib (or mtl/mxm) work best 
for Mellanox,


but these are not interoperable. also, i do not think btl/openib can be 
used with QLogic cards


(please someone correct me if i am wrong)


from the logs, i can see that smd (Mellanox) is not even able to use the 
infiniband port.


if you run with 2 MPI tasks, both run on smd and hence btl/vader is 
used, that is why it works


if you run with more than 2 MPI tasks, then smd and other nodes are
used, and every MPI task falls back to btl/tcp


for inter node communication.

[smd][[41971,1],1][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect] 
connect() to 192.168.1.196 failed: No route to host (113)


this usually indicates a firewall, but since both ssh and oob/tcp are 
fine, this puzzles me.



what if you

mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca btl_tcp_if_include 192.168.1.0/24 --mca pml ob1 --mca btl 
tcp,sm,vader,self  ring


that should work with no error messages, and then you can try with 12 
MPI tasks


(note internode MPI communications will use tcp only)


if you want optimal performance, i am afraid you cannot run any MPI task 
on smd (so mtl/psm can be used )


(btw, make sure PSM support was built in Open MPI)

a suboptimal option is to force MPI communications on IPoIB with

/* make sure all nodes can ping each other via IPoIB first */

mpirun --mca oob_tcp_if_include 192.168.1.0/24 --mca btl_tcp_if_include 
10.1.0.0/24 --mca pml ob1 --mca btl tcp,sm,vader,self




Cheers,


Gilles


On 5/19/2017 3:50 PM, Allan Overstreet wrote:

Gilles,

On which node is mpirun invoked ?

The mpirun command was invoked on node smd.

Are you running from a batch manager?

No.

Is there any firewall running on your nodes ?

No. CentOS Minimal does not have a firewall installed, and Ubuntu
Mate's firewall is disabled.


All three of your commands have appeared to run successfully. The 
outputs of the three commands are attached.


mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca oob_base_verbose 100 true &> cmd1


mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca oob_base_verbose 100 true &> cmd2


mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca oob_base_verbose 100 ring &> cmd3


If I increase the number of processors in the ring program, mpirun 
will not succeed.


mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca oob_base_verbose 100 ring &> cmd4



On 05/19/2017 02:18 AM, Gilles Gouaillardet wrote:

Allan,


- on which node is mpirun invoked ?

- are you running from a batch manager ?

- is there any firewall running on your nodes ?


the error is likely occurring when wiring-up mpirun/orted

what if you

mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca oob_base_verbose 100 true


then (if the previous command worked)

mpirun -np 12 --hostfile nodes --mca oob_tcp_if_include 
192.168.1.0/24 --mca oob_base_verbose 100 true


and finally (if both previous commands worked)

mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include 192.168.1.0/24 
--mca oob_base_verbose 100 ring



Cheers,

Gilles

On 5/19/2017 3:07 PM, Allan Overstreet wrote:
I am experiencing many different errors with openmpi version 2.1.1. I
have had a suspicion that this might be related to the way the
servers were connected and configured. Regardless, below is a diagram
of how the servers are configured.


__  _
   [__]|=|
   /::/|_|
   HOST: smd
   Dual 1Gb Ethernet Bonded
   .-> Bond0 IP: 192.168.1.200
   |   Infiniband Card: MHQH29B-XTR <.
   |   Ib0 IP: 10.1.0.1  |
   |   OS: Ubuntu Mate   |
   |   __ _ |
   | [__]|=||
   | /::/|_||
   |   HOST: sm1 |
   |   Dual 1Gb Ethernet Bonded  |
   |-> Bond0 IP: 192.168.1.196   |
   |   Infiniband Card: QLOGIC QLE7340 <-|
   |   Ib0 IP: 10.1.0.2  |
   |   OS: Centos 7 Minimal  |
   |   __ _ |
   | [__]|=||
   |-. /::/|_||
   | | HOST: sm2 |
   | | Dual 1Gb Ethernet Bonded  |

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gilles Gouaillardet

Gabriele,


so it seems pml/pami assumes there is an infiniband card available (!)

i guess IBM folks will comment on that shortly.


meanwhile, you do not need pami since you are running on a single node

mpirun --mca pml ^pami ...

should do the trick

(if it does not work, can you run and post the logs?)

mpirun --mca pml ^pami --mca pml_base_verbose 100 ...


Cheers,


Gilles


On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:

Hi John,
Infiniband is not used, there is a single node on this machine.

2017-05-19 8:50 GMT+02:00 John Hearns via users 
mailto:users@lists.open-mpi.org>>:


Gabriele, please run 'ibv_devinfo'
It looks to me like you may have the physical interface cards in
these systems, but you do not have the correct drivers or
libraries loaded.

I have had similar messages when using Infiniband on x86 systems -
which did not have libibverbs installed.


On 19 May 2017 at 08:41, Gabriele Fatigati mailto:g.fatig...@cineca.it>> wrote:

Hi Gilles, using your command:

[openpower:88536] mca: base: components_register: registering
framework pml components
[openpower:88536] mca: base: components_register: found loaded
component pami
[openpower:88536] mca: base: components_register: component
pami register function successful
[openpower:88536] mca: base: components_open: opening pml
components
[openpower:88536] mca: base: components_open: found loaded
component pami
[openpower:88536] mca: base: components_open: component pami
open function successful
[openpower:88536] select: initializing pml component pami
findActiveDevices Error
We found no active IB device ports
[openpower:88536] select: init returned failure for component pami
[openpower:88536] PML pami cannot be selected

--
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed componnets can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:  openpower
  Framework: pml

--


2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
mailto:gil...@rist.or.jp>>:

Gabriele,


pml/pami is here, at least according to ompi_info


can you update your mpirun command like this

mpirun --mca pml_base_verbose 100 ..


and post the output ?


Cheers,

Gilles

On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:

Hi Gilles, attached the requested info

2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet
mailto:gilles.gouaillar...@gmail.com>
>>:

Gabriele,

can you
ompi_info --all | grep pml

also, make sure there is nothing in your
environment pointing to
an other Open MPI install
for example
ldd a.out
should only point to IBM libraries

Cheers,

Gilles


On Thursday, May 18, 2017, Gabriele Fatigati
mailto:g.fatig...@cineca.it>
>> wrote:

Dear OpenMPI users and developers, I'm using
IBM Spectrum MPI
10.1.0 based on OpenMPI, so I hope there are
some MPI expert
can help me to solve the problem.

When I run a simple Hello World MPI program, I
get the follow
error message:


A requested component was not found, or was
unable to be
opened.  This
means that this component is either not
installed or is unable
to be
used on your system (e.g., sometimes this
means that shared
libraries
that the component requires are unable to be
found/loaded). Note that
Open MPI stopped checking at the first
component that it did
not find.

Host:  openpower
Framework: pml
Component: pami

--

[OMPI users] Fwd: IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
-- Forwarded message --
From: Gabriele Fatigati 
Date: 2017-05-19 9:07 GMT+02:00
Subject: Re: [OMPI users] IBM Spectrum MPI problem
To: John Hearns 


If I understand correctly, when I launch mpirun it tries by default to use
Infiniband, but because there is no Infiniband module the run fails?


-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Gabriele Fatigati
Hi John,
Infiniband is not used; this machine is a single node.

2017-05-19 8:50 GMT+02:00 John Hearns via users :

> Gabriele, please run 'ibv_devinfo'
> It looks to me like you may have the physical interface cards in these
> systems, but you do not have the correct drivers or libraries loaded.
>
> I have had similar messages when using Infiniband on x86 systems - which
> did not have libibverbs installed.
>
>
> On 19 May 2017 at 08:41, Gabriele Fatigati  wrote:
>
>> Hi Gilles, using your command:
>>
>> [openpower:88536] mca: base: components_register: registering framework
>> pml components
>> [openpower:88536] mca: base: components_register: found loaded component
>> pami
>> [openpower:88536] mca: base: components_register: component pami register
>> function successful
>> [openpower:88536] mca: base: components_open: opening pml components
>> [openpower:88536] mca: base: components_open: found loaded component pami
>> [openpower:88536] mca: base: components_open: component pami open
>> function successful
>> [openpower:88536] select: initializing pml component pami
>> findActiveDevices Error
>> We found no active IB device ports
>> [openpower:88536] select: init returned failure for component pami
>> [openpower:88536] PML pami cannot be selected
>> 
>> --
>> No components were able to be opened in the pml framework.
>>
>> This typically means that either no components of this type were
>> installed, or none of the installed componnets can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>>
>>   Host:  openpower
>>   Framework: pml
>> 
>> --
>>
>>
>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet :
>>
>>> Gabriele,
>>>
>>>
>>> pml/pami is here, at least according to ompi_info
>>>
>>>
>>> can you update your mpirun command like this
>>>
>>> mpirun --mca pml_base_verbose 100 ..
>>>
>>>
>>> and post the output ?
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>>
 Hi Gilles, attached the requested info

 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet <
 gilles.gouaillar...@gmail.com >:

 Gabriele,

 can you
 ompi_info --all | grep pml

 also, make sure there is nothing in your environment pointing to
 an other Open MPI install
 for example
 ldd a.out
 should only point to IBM libraries

 Cheers,

 Gilles


 On Thursday, May 18, 2017, Gabriele Fatigati >>> > wrote:

 Dear OpenMPI users and developers, I'm using IBM Spectrum MPI
 10.1.0 based on OpenMPI, so I hope there are some MPI expert
 can help me to solve the problem.

 When I run a simple Hello World MPI program, I get the follow
 error message:


 A requested component was not found, or was unable to be
 opened.  This
 means that this component is either not installed or is unable
 to be
 used on your system (e.g., sometimes this means that shared
 libraries
 that the component requires are unable to be found/loaded).
  Note that
 Open MPI stopped checking at the first component that it did
 not find.

 Host:  openpower
 Framework: pml
 Component: pami
 
 --
 
 --
 It looks like MPI_INIT failed for some reason; your parallel
 process is
 likely to abort. There are many reasons that a parallel
 process can
 fail during MPI_INIT; some of which are due to configuration
 or environment
 problems.  This failure appears to be an internal failure;
 here's some
 additional information (which may only be relevant to an Open
 MPI
 developer):

 mca_pml_base_open() failed
   --> Returned "Not found" (-13) instead of "Success" (0)
 
 --
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will
 now abort,
 ***and potentially your MPI job)

 My sysadmin used official IBM Spectrum packages to install
 MPI, so It's quite strange that there are some components
 missing (pami). Any help? Than