Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen

>> You sound like our vendors: "what is your app?"
> 
> ;-) I used to be one.
> 
> Ideally OMPI should do the switch between MXM/RC/XRC internally in the
> transport layer. Unfortunately, we don't have such smart selection logic.
> Hopefully IB vendors will fix this some day.

I looked in openib-hca.ini (working from memory) to try to find what the 
default queues were, and I couldn't figure it out. The ConnectX entry doesn't 
have a default, and the 'default default' entry doesn't set one either.

I need to dig into ompi_info; I got distracted by an Intel compiler bug (ADD 
comes with admin/user support work).
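
(For reference, something like this should dump the openib BTL defaults, 
including the receive queue spec, assuming 1.6-era ompi_info syntax:

  ompi_info --param btl openib | grep receive_queues
)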

> 
>> 
>> Note that most of our users run just fine with the standard per-peer 
>> queues, OpenMPI's out-of-the-box default.
> 
> The P2P queue is fine, but most likely your users will observe better 
> performance using XRC. This is not just about scalability.

Cool, thanks for all the input. I wonder why per-peer is the default; I know 
XRC requires hardware support.

> 
> - Pasha
> 
> 




Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Shamis, Pavel
> 
> You sound like our vendors: "what is your app?"

;-) I used to be one.

Ideally OMPI should do the switch between MXM/RC/XRC internally in the 
transport layer. Unfortunately, we don't have such smart selection logic. 
Hopefully IB vendors will fix this some day.

> 
> Note that most of our users run just fine with the standard per-peer 
> queues, OpenMPI's out-of-the-box default.

The P2P queue is fine, but most likely your users will observe better 
performance using XRC. This is not just about scalability.

- Pasha




Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen

>> Our jobs range from 4 cores to 1000 cores. The FAQ page states that MXM 
>> was previously used only for >128 ranks, but in 1.6 it is used for rank 
>> counts of any size.
> 
> 
> This is a reasonable threshold if you use the openib BTL with RC (the 
> default). Since XRC provides better scalability, you may move the threshold 
> up. Bottom line: you have to experiment and see what is good for you :)

You sound like our vendors: "what is your app?" We are a generic HPC provider 
on campus, so we don't have a standard workload, unless "everything" counts 
as a workload.

We will do some testing; we are setting up a time to talk to our Mellanox SA 
to try to understand these components better.

Note that most of our users run just fine with the standard per-peer queues, 
OpenMPI's out-of-the-box default.

> 
> -Pasha
> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jan 22, 2013, at 2:58 PM, Shamis, Pavel wrote:
>> 
>>>> We just learned about MXM. Most of our cards are Mellanox ConnectX 
>>>> cards (though not all; we have islands of pre-ConnectX and QLogic 
>>>> cards supported in the same OpenMPI environment).
>>>> 
>>>> Will MXM correctly fall back to PSM on QLogic gear and fall back to 
>>>> OpenIB on pre-ConnectX cards?
>>> 
>>> Do you want to run MXM and PSM in the same MPI session? You can't do 
>>> that; MXM and PSM use different network protocols. If you want to use 
>>> MXM in your MPI job, all nodes should be configured to use MXM.
>>> 
>>> On the other hand, the OpenIB BTL should support mixed environments out 
>>> of the box.
>>> 
>>> - Pasha
>> 
>> 
> 
> 




Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Shamis, Pavel
> 
> Our jobs range from 4 cores to 1000 cores. The FAQ page states that MXM 
> was previously used only for >128 ranks, but in 1.6 it is used for rank 
> counts of any size.


This is a reasonable threshold if you use the openib BTL with RC (the 
default). Since XRC provides better scalability, you may move the threshold 
up. Bottom line: you have to experiment and see what is good for you :)
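
(To experiment, each path can be pinned explicitly; a sketch, assuming the 
1.6-era component names:

  mpirun --mca pml cm --mca mtl mxm ...              # force MXM
  mpirun --mca pml ob1 --mca btl openib,self,sm ...  # force the openib BTL
)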

-Pasha

> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Jan 22, 2013, at 2:58 PM, Shamis, Pavel wrote:
> 
>>> We just learned about MXM. Most of our cards are Mellanox ConnectX cards 
>>> (though not all; we have islands of pre-ConnectX and QLogic cards 
>>> supported in the same OpenMPI environment).
>>> 
>>> Will MXM correctly fall back to PSM on QLogic gear and fall back to 
>>> OpenIB on pre-ConnectX cards?
>> 
>> Do you want to run MXM and PSM in the same MPI session? You can't do 
>> that; MXM and PSM use different network protocols. If you want to use 
>> MXM in your MPI job, all nodes should be configured to use MXM.
>> 
>> On the other hand, the OpenIB BTL should support mixed environments out 
>> of the box.
>> 
>> - Pasha
> 
> 




Re: [OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Brock Palen
On Jan 22, 2013, at 2:53 PM, Shamis, Pavel wrote:

>> 
>> Switching to SRQ with some guessed queue values appears to let the code 
>> run:
>> S,4096,128:S,12288,128:S,65536,12
>> 
>> Two questions,
>> 
>> This is a ConnectX fabric; should I switch to XRC queues? And should I 
>> use the same queue sizes/counts? Is that a safe assumption?
>> X,4096,128:X,12288,128:X,65536,12
> 
> Yeah, I would use the same values as a starting point. 

Thanks. The user's full-resolution job got further with shared queues; we are 
going to test XRC queues with the same counts. But he keeps getting OpenMPI 
out-of-memory/registration-failure messages.

> 
>> 
>> 
>> When should I use one queue type over the other?
> 
> Generally speaking, the XRC transport has much better scalability than RC.

OK, so if we are using shared queues on ConnectX gear we should default to 
XRC; will do.

> 
> 
>> 
>> Is there a way to get stats on the use of your shared queues (SRQ or 
>> XRC)?
>> 
>> For example, using code 'not from here', we would like to know: "hey, you 
>> are always running out of your queue of size X" or "your queue of size Y 
>> is never used".
>> 
>> We are kinda blind for a lot of our applications :-)
> 
> Right now we don't have such hooks in the openib BTL.
> It is not very difficult to add some code that will report stats on QP 
> utilization.
> 
> In your other email you mentioned MXM. I would recommend trying both XRC 
> and MXM and seeing which one performs better. On a relatively small system 
> I would guess XRC will perform better; on a large system MXM should 
> demonstrate better performance. But again, it all depends on your 
> application.
> 
> - Pasha
> 
> 




Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
No, there would be no overlap.

We run a large legacy condo model, with several islands of InfiniBand of 
different ages and types. Users run within their condo/IB island, so PSM 
users only run on the PSM nodes they own, and there is no overlap.

Our jobs range from 4 cores to 1000 cores. The FAQ page states that MXM was 
previously used only for >128 ranks, but in 1.6 it is used for rank counts 
of any size.

I think we will do some testing; we had never even heard of MXM before.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Jan 22, 2013, at 2:58 PM, Shamis, Pavel wrote:

>> We just learned about MXM. Most of our cards are Mellanox ConnectX cards 
>> (though not all; we have islands of pre-ConnectX and QLogic cards 
>> supported in the same OpenMPI environment).
>> 
>> Will MXM correctly fall back to PSM on QLogic gear and fall back to 
>> OpenIB on pre-ConnectX cards?
> 
> Do you want to run MXM and PSM in the same MPI session? You can't do 
> that; MXM and PSM use different network protocols. If you want to use MXM 
> in your MPI job, all nodes should be configured to use MXM.
> 
> On the other hand, the OpenIB BTL should support mixed environments out 
> of the box.
> 
> - Pasha




[OMPI users] MPI_THREAD_FUNNELED and enable-mpi-thread-multiple

2013-01-22 Thread Roland Schulz
Hi,

compiling 1.6.1 or 1.6.2 without --enable-mpi-thread-multiple makes
MPI_Init_thread return MPI_THREAD_SINGLE as the provided level. Is
--enable-mpi-thread-multiple required even for
MPI_THREAD_FUNNELED/MPI_THREAD_SERIALIZED?

This question has been asked before:
http://www.open-mpi.org/community/lists/users/2011/05/16451.php but I
couldn't find an answer.
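
A minimal reproducer (plain MPI, nothing Open MPI specific; error handling
omitted):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;
      /* Ask for FUNNELED and print what the library actually grants. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      printf("provided = %d (SINGLE = %d, FUNNELED = %d)\n",
             provided, MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED);
      MPI_Finalize();
      return 0;
  }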

Roland

-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309


Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ada Mancuso
Thanks a lot, I will try it.
On 22 Jan 2013 21:49, "Ralph Castain"  wrote:

> Ouch - no, you'd have to take it from the developer's trunk, either via
> svn checkout or the nightly developer's snapshot
>
> On Jan 22, 2013, at 12:35 PM, Ada Mancuso  wrote:
>
> My problem is that I have to use Open MPI 1.7 rc5 because I'm using the
> Java binding mpijava... Is it present in the latest snapshot you told me
> about? If so, where can I find it?
> Thanks a lot
> Ada
> On 22 Jan 2013 21:03, "Ralph Castain"  wrote:
>
>> It seems to be working fine for me with the latest 1.7 tarball (not rc5 -
>> I didn't test that one). Could be there was a problem that has since been
>> fixed. We are getting ready to release an updated rc, so you might want to
>> try it (or use the latest nightly 1.7 snapshot).
>>
>>
>> On Jan 22, 2013, at 9:57 AM, Ada Mancuso  wrote:
>>
>> Hi,
>> I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines
>> with the command:
>> mpirun -np 4 -hostfile file a.out
>> but I get the following error messages:
>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>> contact information is unknown in file
>> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c
>> attempted to send to [[21341,0],2]: tag 15
>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>> contact information is unknown in file
>> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c
>> The /etc/hosts file consists of "ipaddress hostname" entries, I have
>> exchanged SSH keys among the machines, and SSH login works without
>> requiring a password. Surprisingly, if I run my program with at most two
>> hosts (so the hosts file contains only two hosts) it works, but with
>> more than two hosts I get this error. MPI works well on each machine,
>> and I also tried running my program with different pairs of machines to
>> be sure that no single machine was the problem.
>> Can you help me please?
>> Ada


Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ralph Castain
Ouch - no, you'd have to take it from the developer's trunk, either via svn 
checkout or the nightly developer's snapshot

On Jan 22, 2013, at 12:35 PM, Ada Mancuso  wrote:

> My problem is that I have to use Open MPI 1.7 rc5 because I'm using the
> Java binding mpijava... Is it present in the latest snapshot you told me
> about? If so, where can I find it?
> Thanks a lot
> Ada
> 
> On 22 Jan 2013 21:03, "Ralph Castain"  wrote:
> It seems to be working fine for me with the latest 1.7 tarball (not rc5 - I 
> didn't test that one). Could be there was a problem that has since been 
> fixed. We are getting ready to release an updated rc, so you might want to 
> try it (or use the latest nightly 1.7 snapshot).
> 
> 
> On Jan 22, 2013, at 9:57 AM, Ada Mancuso  wrote:
> 
>> Hi,
>> I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines
>> with the command:
>> mpirun -np 4 -hostfile file a.out
>> but I get the following error messages:
>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>> contact information is unknown in file
>> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c
>> attempted to send to [[21341,0],2]: tag 15
>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>> contact information is unknown in file
>> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c
>> The /etc/hosts file consists of "ipaddress hostname" entries, I have
>> exchanged SSH keys among the machines, and SSH login works without
>> requiring a password. Surprisingly, if I run my program with at most two
>> hosts (so the hosts file contains only two hosts) it works, but with
>> more than two hosts I get this error. MPI works well on each machine,
>> and I also tried running my program with different pairs of machines to
>> be sure that no single machine was the problem.
>> Can you help me please?
>> Ada



Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ada Mancuso
My problem is that I have to use Open MPI 1.7 rc5 because I'm using the
Java binding mpijava... Is it present in the latest snapshot you told me
about? If so, where can I find it?
Thanks a lot
Ada
Thanks a lot
Ada
On 22 Jan 2013 21:03, "Ralph Castain"  wrote:

> It seems to be working fine for me with the latest 1.7 tarball (not rc5 -
> I didn't test that one). Could be there was a problem that has since been
> fixed. We are getting ready to release an updated rc, so you might want to
> try it (or use the latest nightly 1.7 snapshot).
>
>
> On Jan 22, 2013, at 9:57 AM, Ada Mancuso  wrote:
>
> Hi,
> I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines
> with the command:
> mpirun -np 4 -hostfile file a.out
> but I get the following error messages:
> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
> contact information is unknown in file
> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c
> attempted to send to [[21341,0],2]: tag 15
> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
> contact information is unknown in file
> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c
> The /etc/hosts file consists of "ipaddress hostname" entries, I have
> exchanged SSH keys among the machines, and SSH login works without
> requiring a password. Surprisingly, if I run my program with at most two
> hosts (so the hosts file contains only two hosts) it works, but with
> more than two hosts I get this error. MPI works well on each machine,
> and I also tried running my program with different pairs of machines to
> be sure that no single machine was the problem.
> Can you help me please?
> Ada


Re: [OMPI users] help me understand these error msgs

2013-01-22 Thread Ralph Castain
I see - then the problem is that at least one node is unable to communicate via 
TCP back to where mpirun is executing. Might be a firewall, or could be that we 
are selecting the wrong network if multiple NICs are around. I assume that you 
use additional nodes when running against the larger dataset?
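
(A common way to test the wrong-network theory is to pin the TCP interfaces
explicitly; the interface name here is just an example:

  mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 ...
)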


On Jan 22, 2013, at 9:34 AM, Jure Pečar  wrote:

> On Thu, 17 Jan 2013 11:54:13 -0800
> Ralph Castain  wrote:
> 
>> Or is this happening on startup of the larger job, or during a call to 
>> MPI_Comm_spawn?
> 
> This happens at startup. mpirun spawns the processes, and when they start 
> talking to each other during the setup phase, I get this kind of error. 
> The running time in that case is less than a minute.
> 
> 
> -- 
> 
> Jure Pečar
> http://jure.pecar.org
> 




Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ralph Castain
It seems to be working fine for me with the latest 1.7 tarball (not rc5 - I 
didn't test that one). Could be there was a problem that has since been fixed. 
We are getting ready to release an updated rc, so you might want to try it (or 
use the latest nightly 1.7 snapshot).


On Jan 22, 2013, at 9:57 AM, Ada Mancuso  wrote:

> Hi,
> I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines
> with the command:
> mpirun -np 4 -hostfile file a.out
> but I get the following error messages:
> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
> contact information is unknown in file
> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c
> attempted to send to [[21341,0],2]: tag 15
> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
> contact information is unknown in file
> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c
> The /etc/hosts file consists of "ipaddress hostname" entries, I have
> exchanged SSH keys among the machines, and SSH login works without
> requiring a password. Surprisingly, if I run my program with at most two
> hosts (so the hosts file contains only two hosts) it works, but with
> more than two hosts I get this error. MPI works well on each machine,
> and I also tried running my program with different pairs of machines to
> be sure that no single machine was the problem.
> Can you help me please?
> Ada



Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Shamis, Pavel
> We just learned about MXM. Most of our cards are Mellanox ConnectX cards 
> (though not all; we have islands of pre-ConnectX and QLogic cards 
> supported in the same OpenMPI environment).
> 
> Will MXM correctly fall back to PSM on QLogic gear and fall back to 
> OpenIB on pre-ConnectX cards?

Do you want to run MXM and PSM in the same MPI session? You can't do that; 
MXM and PSM use different network protocols. If you want to use MXM in your 
MPI job, all nodes should be configured to use MXM.

On the other hand, the OpenIB BTL should support mixed environments out of 
the box.

- Pasha


Re: [OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Shamis, Pavel
> 
> Switching to SRQ with some guessed queue values appears to let the code 
> run:
> S,4096,128:S,12288,128:S,65536,12
> 
> Two questions,
> 
> This is a ConnectX fabric; should I switch to XRC queues? And should I 
> use the same queue sizes/counts? Is that a safe assumption?
> X,4096,128:X,12288,128:X,65536,12

Yeah, I would use the same values as a starting point. 

> 
> 
>  When should I use one queue type over the other?

Generally speaking, the XRC transport has much better scalability than RC.


> 
> Is there a way to get stats on the use of your shared queues (SRQ or 
> XRC)?
> 
> For example, using code 'not from here', we would like to know: "hey, you 
> are always running out of your queue of size X" or "your queue of size Y 
> is never used".
> 
> We are kinda blind for a lot of our applications :-)

Right now we don't have such hooks in the openib BTL.
It is not very difficult to add some code that will report stats on QP 
utilization.

In your other email you mentioned MXM. I would recommend trying both XRC and 
MXM and seeing which one performs better. On a relatively small system I 
would guess XRC will perform better; on a large system MXM should demonstrate 
better performance. But again, it all depends on your application.

- Pasha




[OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ada Mancuso
Hi,
I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines with
the command:
mpirun -np 4 -hostfile file a.out
but I get the following error messages:
ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
contact information is unknown in file
../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c
attempted to send to [[21341,0],2]: tag 15
ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
contact information is unknown in file
../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c
The /etc/hosts file consists of "ipaddress hostname" entries, I have
exchanged SSH keys among the machines, and SSH login works without requiring
a password. Surprisingly, if I run my program with at most two hosts (so the
hosts file contains only two hosts) it works, but with more than two hosts I
get this error. MPI works well on each machine, and I also tried running my
program with different pairs of machines to be sure that no single machine
was the problem.
Can you help me please?
Ada


Re: [OMPI users] help me understand these error msgs

2013-01-22 Thread Jure Pečar
On Thu, 17 Jan 2013 11:54:13 -0800
Ralph Castain  wrote:

> Or is this happening on startup of the larger job, or during a call to 
> MPI_Comm_spawn?

This happens at startup. mpirun spawns the processes, and when they start 
talking to each other during the setup phase, I get this kind of error. The 
running time in that case is less than a minute.


-- 

Jure Pečar
http://jure.pecar.org



Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi

2013-01-22 Thread Jeff Squyres (jsquyres)
On Jan 19, 2013, at 1:05 PM, Lee Eric  wrote:

> However, I hit another issue about fortran as configure running.
> 
> *** Fortran 90/95 compiler
> checking for armv6-rpi-linux-gnueabi-gfortran...
> armv6-rpi-linux-gnueabi-gfortran
> checking whether we are using the GNU Fortran compiler... yes
> checking whether armv6-rpi-linux-gnueabi-gfortran accepts -g... yes
> checking if Fortran 77 compiler works... links (cross compiling)
> checking armv6-rpi-linux-gnueabi-gfortran external symbol
> convention... single underscore
> checking if C and Fortran 77 are link compatible... yes
> checking to see if F77 compiler likes the C++ exception flags...
> skipped (no C++ exceptions flags)
> checking to see if mpif77/mpif90 compilers need additional linker flags... 
> none
> checking if Fortran 77 compiler supports CHARACTER... yes
> checking size of Fortran 77 CHARACTER... configure: error: Can not
> determine size of CHARACTER when cross-compiling


Just to follow up on this point -- cross compiling with Open MPI is a known 
issue.

The specific problem you're running into here is that configure is trying to 
compile *and run* some Fortran tests, which obviously doesn't work in a 
cross-compiling environment.

You can work around this either by disabling Fortran (which you did) or by 
pre-populating configure's answers to the Fortran tests (so that it doesn't 
actually have to run anything). However, we have never fully documented how 
to do this (it's not straightforward, and definitely not for the faint of 
heart).
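
(For the adventurous, the general autoconf mechanism is to seed the cache 
values those tests would have produced, e.g. on the configure command line; 
the variable name below is illustrative only, not a verified Open MPI cache 
variable:

  ./configure --host=armv6-rpi-linux-gnueabi \
      ompi_cv_fortran_sizeof_CHARACTER=1 ...
)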

If you don't need Fortran, simply disabling Fortran is probably your best bet.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi

2013-01-22 Thread Jeff Squyres (jsquyres)
Note that the original author of the ARM support chimed in on this on the devel 
list:

http://www.open-mpi.org/community/lists/devel/2013/01/11955.php


On Jan 21, 2013, at 6:50 AM, George Bosilca 
 wrote:

> Great, I pushed everything upstream:
> - trunk (r27882)
> - prepared a patch for the 1.6 
> (https://svn.open-mpi.org/trac/ompi/ticket/3469)
> - requested a CMR for the 1.7 (https://svn.open-mpi.org/trac/ompi/ticket/3470)
> 
> Thanks for your help,
>  George.
> 
> 
> On Jan 21, 2013, at 07:56 , Lee Eric  wrote:
> 
>> Thank you, mate. This patch works quite well on my Raspberry Pi without
>> any errors. Can we get it into upstream?
>> 
>> Thanks.
>> 
>> Eric
>> 
>> On Mon, Jan 21, 2013 at 12:07 AM, George Bosilca  wrote:
>>> Eric,
>>> 
>>> What do you think about the patch attached to ticket #3469 
>>> (https://svn.open-mpi.org/trac/ompi/ticket/3469). We might blend the two 
>>> patches together, and have all the different ARM versions covered.
>>> 
>>> Thanks,
>>>   George.
>>> 
>>> On Jan 20, 2013, at 05:55 , Lee Eric  wrote:
>>> 
Hi,

The above issue is fixed with this patch I used:
https://raw.github.com/sebhtml/patches/master/openmpi/Raspberry-Pi-openmpi-1.6.2.patch

Is it possible for OpenMPI to include this patch in the future?

Thanks.

On Sun, Jan 20, 2013 at 3:13 AM, Lee Eric  wrote:
> Hi,
> 
> I just used --disable-mpif77 and --disable-mpif90 to get configure to
> run. However, I know that's only a rough workaround. After configuring
> successfully, I encountered the following error when running make:
> 
> Making all in config
> make[1]: Entering directory `/home/huli/Projects/openmpi-1.6.3/config'
> make[1]: Nothing to be done for `all'.
> make[1]: Leaving directory `/home/huli/Projects/openmpi-1.6.3/config'
> Making all in contrib
> make[1]: Entering directory `/home/huli/Projects/openmpi-1.6.3/contrib'
> make[1]: Nothing to be done for `all'.
> make[1]: Leaving directory `/home/huli/Projects/openmpi-1.6.3/contrib'
> Making all in opal
> make[1]: Entering directory `/home/huli/Projects/openmpi-1.6.3/opal'
> Making all in include
> make[2]: Entering directory 
> `/home/huli/Projects/openmpi-1.6.3/opal/include'
> make  all-am
> make[3]: Entering directory 
> `/home/huli/Projects/openmpi-1.6.3/opal/include'
> make[3]: Leaving directory 
> `/home/huli/Projects/openmpi-1.6.3/opal/include'
> make[2]: Leaving directory 
> `/home/huli/Projects/openmpi-1.6.3/opal/include'
> Making all in libltdl
> make[2]: Entering directory 
> `/home/huli/Projects/openmpi-1.6.3/opal/libltdl'
> make  all-am
> make[3]: Entering directory 
> `/home/huli/Projects/openmpi-1.6.3/opal/libltdl'
> /bin/sh ./libtool  --tag=CC   --mode=compile
> armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I.
> -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl
> -I./libltdl 
> -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include
> -I/usr/include/infiniband -I/usr/include/infiniband   -Ofast
> -mfpu=vfp -mfloat-abi=hard -MT dlopen.lo -MD -MP -MF .deps/dlopen.Tpo
> -c -o dlopen.lo `test -f 'loaders/dlopen.c' || echo
> './'`loaders/dlopen.c
> /bin/sh ./libtool  --tag=CC   --mode=compile
> armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I.  -DLTDLOPEN=libltdlc
> -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl
> -I./libltdl 
> -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include
> -I/usr/include/infiniband -I/usr/include/infiniband   -Ofast
> -mfpu=vfp -mfloat-abi=hard -MT libltdlc_la-preopen.lo -MD -MP -MF
> .deps/libltdlc_la-preopen.Tpo -c -o libltdlc_la-preopen.lo `test -f
> 'loaders/preopen.c' || echo './'`loaders/preopen.c
> /bin/sh ./libtool  --tag=CC   --mode=compile
> armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I.  -DLTDLOPEN=libltdlc
> -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl
> -I./libltdl 
> -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include
> -I/usr/include/infiniband -I/usr/include/infiniband   -Ofast
> -mfpu=vfp -mfloat-abi=hard -MT libltdlc_la-lt__alloc.lo -MD -MP -MF
> .deps/libltdlc_la-lt__alloc.Tpo -c -o libltdlc_la-lt__alloc.lo `test
> -f 'lt__alloc.c' || echo './'`lt__alloc.c
> /bin/sh ./libtool  --tag=CC   --mode=compile
> armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I.  -DLTDLOPEN=libltdlc
> -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl
> -I./libltdl 
> -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include
> -I/usr/include/infiniband -I/usr/include/infiniband   -Ofast
> -mfpu=vfp -mfloat-abi=hard -MT libltdlc_la-lt_dlloader.lo -MD -MP -MF
> .deps/libltdlc_la-lt_dlloader.Tpo -c -o libltdlc_la-lt_dlloader.lo
> `test -f 'lt_dlloader.c' || echo './'`lt_dlloader.c
> /bin/sh ./libtool  --tag=CC   --mode=compile
> ar

[OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
We just learned about MXM. Most of our cards are Mellanox ConnectX cards 
(though not all; we have islands of pre-ConnectX and QLogic cards supported 
in the same OpenMPI environment).

Will MXM correctly fall back to PSM on QLogic gear and fall back to OpenIB 
on pre-ConnectX cards?

Lastly, looking at the FAQ, it appears MXM is used by default over OpenIB 
if available.

Should I take that to mean "use MXM if available and supported"? As in, only 
use openib if that is the only thing you have?

Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






[OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Brock Palen
We hit a problem recently with memory errors when scaling a code to 1000 cores. 

Switching to SRQ with some guessed queue values appears to let the code run:
S,4096,128:S,12288,128:S,65536,12

Two questions,

This is a ConnectX fabric; should I switch to XRC queues? And should I use 
the same queue sizes/counts? Is that a safe assumption?
X,4096,128:X,12288,128:X,65536,12
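
(For reference, a spec like that is passed via the usual MCA mechanism; a 
sketch, assuming the 1.6-era openib BTL parameter name:

  mpirun --mca btl_openib_receive_queues X,4096,128:X,12288,128:X,65536,12 ./a.out
)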


  When should I use one queue type over the other?

Is there a way to get stats on the use of your shared queues (SRQ or XRC)?

For example, using code 'not from here', we would like to know: "hey, you 
are always running out of your queue of size X" or "your queue of size Y is 
never used".

We are kinda blind for a lot of our applications :-)

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] [EXTERNAL] Possible memory leak(s) in OpenMPI 1.6.3?

2013-01-22 Thread Victor Vysotskiy
Dear Brian,

thank you very much for your assistance and for the bug fix.

Regards,
Victor.