Re: [OMPI users] MXM vs OpenIB
>> You sound like our vendors, "what is your app" > > ;-) I used to be one. > > Ideally OMPI should do the switch between MXM/RC/XRC internally in the > transport layer. Unfortunately, > we don't have such smart selection logic. Hopefully IB vendors will fix some > day. I actually looked in the openib-hca.ini (working from memory) to try and find what the default queues were, and I actually couldn't figure it out. The ConnectX entry doesn't have a default, and the 'default default' also doesn't have an entry. I need to dig into ompi_info, got distracted by an intel compiler bug, ADD for admin/user support folks. > >> >> Note most of our users run just fine with the standard Peer-Peer queues, >> default out the box OpenMPI. > > The P2P queue is fine, but most like using XRC your users will observe better > performance. This is not just scalability. Cool thanks for all the input, I wonder why peer-to-peer is the default, I know XRC requires hardware support, > > - Pasha > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] MXM vs OpenIB
> > You sound like our vendors, "what is your app" ;-) I used to be one. Ideally OMPI should do the switch between MXM/RC/XRC internally in the transport layer. Unfortunately, we don't have such smart selection logic. Hopefully IB vendors will fix some day. > > Note most of our users run just fine with the standard Peer-Peer queues, > default out the box OpenMPI. The P2P queue is fine, but most like using XRC your users will observe better performance. This is not just scalability. - Pasha
Re: [OMPI users] MXM vs OpenIB
>> Our jobs range from 4 cores to 1000 cores, looking at the FAQ page it states >> that MXM was used in the past only for >128 ranks, but is in 1.6 used for >> rank counts of any size. > > > This is reasonable threshold if you use openib btl with RC (default). Since > XRC provides better scalability, you may move the threshold up. Bottom line > you have to experiment and > see what is good for you :) You sound like our vendors, "what is your app" we are a generic HPC provider on campus so we don't have a standard workload, unless "everything" is a workload. We will do some testing, we are setting up a time to talk to our Mellonox SA to try to understand these components better. Note most of our users run just fine with the standard Peer-Peer queues, default out the box OpenMPI. > > -Pasha > >> >> Brock Palen >> www.umich.edu/~brockp >> CAEN Advanced Computing >> bro...@umich.edu >> (734)936-1985 >> >> >> >> On Jan 22, 2013, at 2:58 PM, Shamis, Pavel wrote: >> We just learned about MXM, and given most our cards are Mellonox ConnectX cards (though not all, have have islands of previous to ConnectX and Qlogic supported in the same OpenMPI environment), Will MXM correctly fail though to PSM if on qlogic gear and fail though to OpenIB if on previous to connectX cards? >>> >>> Do you want to run MXM and PSM in the same MPI session ? You can't do it. >>> MXM and PSM use different network protocols. >>> If you want to use MXM in your MPI job, all nodes should be configured to >>> use MXM. >>> >>> On the other hand, OpenIB btl should support mixed environments out of the >>> box. >>> >>> - Pasha >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] MXM vs OpenIB
> > Our jobs range from 4 cores to 1000 cores, looking at the FAQ page it states > that MXM was used in the past only for >128 ranks, but is in 1.6 used for > rank counts of any size. This is reasonable threshold if you use openib btl with RC (default). Since XRC provides better scalability, you may move the threshold up. Bottom line you have to experiment and see what is good for you :) -Pasha > > Brock Palen > www.umich.edu/~brockp > CAEN Advanced Computing > bro...@umich.edu > (734)936-1985 > > > > On Jan 22, 2013, at 2:58 PM, Shamis, Pavel wrote: > >>> We just learned about MXM, and given most our cards are Mellonox ConnectX >>> cards (though not all, have have islands of previous to ConnectX and Qlogic >>> supported in the same OpenMPI environment), >>> >>> Will MXM correctly fail though to PSM if on qlogic gear and fail though to >>> OpenIB if on previous to connectX cards? >> >> Do you want to run MXM and PSM in the same MPI session ? You can't do it. >> MXM and PSM use different network protocols. >> If you want to use MXM in your MPI job, all nodes should be configured to >> use MXM. >> >> On the other hand, OpenIB btl should support mixed environments out of the >> box. >> >> - Pasha >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] XRC vs SRQ vs PRQ
On Jan 22, 2013, at 2:53 PM, Shamis, Pavel wrote: >> >> Switching to SRQ and some guess of queue values selected appears to let the >> code run. >> S,4096,128:S,12288,128:S,65536,12 >> >> Two questions, >> >> This is a ConnectX fabric, should I switch them to XRC queues? And should I >> use the same queue size/count? That a safe assumption? >> X,4096,128:X,12288,128:X,65536,12 > > Yeah, I would use the same values as a starting point. Thanks, the users full resolution job got further with shared queues, we are going to do a test with XRC queues of the same count. But he keeps getting OpenMPI out of memory/reg fail messages. > >> >> >> When should I use one queue type over the other? > > Generally speaking XRC transport has much better scalability that RC. Ok so if we are useing shared queues on ConnectX gear default to XRC, will do. > > >> >> Is there a way to get stat feedback on the use of your shared queues (SRQ or >> XRC) ? >> >> Example, using code 'not from here' and would like to know, "hey you are >> always running out of your queue of size X" Or " your queue of size Y is >> never used" >> >> We are kinda blind for a lot of our applications :-) > > Right now we don't have such hooks in openib BTL. > It is not very difficult to add some code that will report stat for QP > utilization. > > In you other email you mentioned MXM. I would recommend to try both XRC and > MXM and see which one performance better. On relatively small system I would > guess > XRC will perform better, on large system MXM should demonstrate better > performance. But again, it all depends on your application. > > - Pasha > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] MXM vs OpenIB
No there would be no overlap. We run a large legacy condo, with several islands of Infiniband of different ages and types. Users run within their condo/ib island. So PSM users only run on PSM nodes they own, and there is no overlap. Our jobs range from 4 cores to 1000 cores, looking at the FAQ page it states that MXM was used in the past only for >128 ranks, but is in 1.6 used for rank counts of any size. I think we will do some testing, we never even heard of MXM before, Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985 On Jan 22, 2013, at 2:58 PM, Shamis, Pavel wrote: >> We just learned about MXM, and given most our cards are Mellonox ConnectX >> cards (though not all, have have islands of previous to ConnectX and Qlogic >> supported in the same OpenMPI environment), >> >> Will MXM correctly fail though to PSM if on qlogic gear and fail though to >> OpenIB if on previous to connectX cards? > > Do you want to run MXM and PSM in the same MPI session ? You can't do it. MXM > and PSM use different network protocols. > If you want to use MXM in your MPI job, all nodes should be configured to use > MXM. > > On the other hand, OpenIB btl should support mixed environments out of the > box. > > - Pasha > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] MPI_THREAD_FUNNELED and enable-mpi-thread-multiple
Hi, compiling 1.6.1 or 1.6.2 without enable-mpi-thread-multiple returns from MPI_Init_thread as provided level MPI_THREAD_SINGLE. Is enable-mpi-thread-multiple required even for MPI_THREAD_FUNNELED/MPI_THREAD_SERIALIZED? This question has been asked before: http://www.open-mpi.org/community/lists/users/2011/05/16451.php but I couldn't find an answer. Roland -- ORNL/UT Center for Molecular Biophysics cmb.ornl.gov 865-241-1537, ORNL PO BOX 2008 MS6309
Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR
Thanks a lot I will try it. Il giorno 22/gen/2013 21:49, "Ralph Castain" ha scritto: > Ouch - no, you'd have to take it from the developer's trunk, either via > svn checkout or the nightly developer's snapshot > > On Jan 22, 2013, at 12:35 PM, Ada Mancuso wrote: > > My problem is that I have to use openmpi 1.7 rc5 because I'm using the > Java binding mpijava... Is it present in the latest snapshot you told me? > If so where can I find it? > Thanks a lot > Ada > Il giorno 22/gen/2013 21:03, "Ralph Castain" ha > scritto: > >> It seems to be working fine for me with the latest 1.7 tarball (not rc5 - >> I didn't test that one). Could be there was a problem that has since been >> fixed. We are getting ready to release an updated rc, so you might want to >> try it (or use the latest nightly 1.7 snapshot). >> >> >> On Jan 22, 2013, at 9:57 AM, Ada Mancuso wrote: >> >> Hi, >> I'm trying to run my mpi program using open mpi 1.7 rc5 on 4 machines >> using the command: >> mpirun -np4 -hostfile file a.out >> but i get the following message errors: >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose >> contact information is unknown in file >> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c >> attempted to send to [[21341,0],2]: tag 15 >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose >> contact information is unknown in file >> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c >> The file etc/hosts is composed by ipaddress hostname, I have exchange ssh >> keys among the machines and ssh login works without requiring >> authentication password. Surprisingly if I try to run my program with at >> most 2 hosts, and so the file hosts contains only two hosts, it works but >> if i try to run my program with more than two hosts i have this error; mpi >> works well on each machine and I also tried to run my program with >> different couple of machines in order to be sure that no machine could be >> the problem. >> Can you help me please? >> Ada >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR
Ouch - no, you'd have to take it from the developer's trunk, either via svn checkout or the nightly developer's snapshot On Jan 22, 2013, at 12:35 PM, Ada Mancuso wrote: > My problem is that I have to use openmpi 1.7 rc5 because I'm using the Java > binding mpijava... Is it present in the latest snapshot you told me? If so > where can I find it? > Thanks a lot > Ada > > Il giorno 22/gen/2013 21:03, "Ralph Castain" ha scritto: > It seems to be working fine for me with the latest 1.7 tarball (not rc5 - I > didn't test that one). Could be there was a problem that has since been > fixed. We are getting ready to release an updated rc, so you might want to > try it (or use the latest nightly 1.7 snapshot). > > > On Jan 22, 2013, at 9:57 AM, Ada Mancuso wrote: > >> Hi, >> I'm trying to run my mpi program using open mpi 1.7 rc5 on 4 machines using >> the command: >> mpirun -np4 -hostfile file a.out >> but i get the following message errors: >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose >> contact information is unknown in file >> ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c >> attempted to send to [[21341,0],2]: tag 15 >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose >> contact information is unknown in file >> ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c >> The file etc/hosts is composed by ipaddress hostname, I have exchange ssh >> keys among the machines and ssh login works without requiring authentication >> password. Surprisingly if I try to run my program with at most 2 hosts, and >> so the file hosts contains only two hosts, it works but if i try to run my >> program with more than two hosts i have this error; mpi works well on each >> machine and I also tried to run my program with different couple of machines >> in order to be sure that no machine could be the problem. >> Can you help me please? >> Ada >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR
My problem is that I have to use openmpi 1.7 rc5 because I'm using the Java binding mpijava... Is it present in the latest snapshot you told me? If so where can I find it? Thanks a lot Ada Il giorno 22/gen/2013 21:03, "Ralph Castain" ha scritto: > It seems to be working fine for me with the latest 1.7 tarball (not rc5 - > I didn't test that one). Could be there was a problem that has since been > fixed. We are getting ready to release an updated rc, so you might want to > try it (or use the latest nightly 1.7 snapshot). > > > On Jan 22, 2013, at 9:57 AM, Ada Mancuso wrote: > > Hi, > I'm trying to run my mpi program using open mpi 1.7 rc5 on 4 machines > using the command: > mpirun -np4 -hostfile file a.out > but i get the following message errors: > ORTE_ERROR_LOG: A message is attempting to be sent to a process whose > contact information is unknown in file > ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c > attempted to send to [[21341,0],2]: tag 15 > ORTE_ERROR_LOG: A message is attempting to be sent to a process whose > contact information is unknown in file > ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c > The file etc/hosts is composed by ipaddress hostname, I have exchange ssh > keys among the machines and ssh login works without requiring > authentication password. Surprisingly if I try to run my program with at > most 2 hosts, and so the file hosts contains only two hosts, it works but > if i try to run my program with more than two hosts i have this error; mpi > works well on each machine and I also tried to run my program with > different couple of machines in order to be sure that no machine could be > the problem. > Can you help me please? > Ada > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] help me understand these error msgs
I see - then the problem is that at least one node is unable to communicate via TCP back to where mpirun is executing. Might be a firewall, or could be that we are selecting the wrong network if multiple NICs are around. I assume that you use additional nodes when running against the larger dataset? On Jan 22, 2013, at 9:34 AM, Jure Pečar wrote: > On Thu, 17 Jan 2013 11:54:13 -0800 > Ralph Castain wrote: > >> Or is this happening on startup of the larger job, or during a call to >> MPI_Comm_spawn? > > This happens on a startup. Mpirun spawns processes and when they start > talking to eachother during setup phase, I get this kind of error. Running > time in such case is less than a minute. > > > -- > > Jure Pečar > http://jure.pecar.org > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR
It seems to be working fine for me with the latest 1.7 tarball (not rc5 - I didn't test that one). Could be there was a problem that has since been fixed. We are getting ready to release an updated rc, so you might want to try it (or use the latest nightly 1.7 snapshot). On Jan 22, 2013, at 9:57 AM, Ada Mancuso wrote: > Hi, > I'm trying to run my mpi program using open mpi 1.7 rc5 on 4 machines using > the command: > mpirun -np4 -hostfile file a.out > but i get the following message errors: > ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact > information is unknown in file > ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c > attempted to send to [[21341,0],2]: tag 15 > ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact > information is unknown in file > ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c > The file etc/hosts is composed by ipaddress hostname, I have exchange ssh > keys among the machines and ssh login works without requiring authentication > password. Surprisingly if I try to run my program with at most 2 hosts, and > so the file hosts contains only two hosts, it works but if i try to run my > program with more than two hosts i have this error; mpi works well on each > machine and I also tried to run my program with different couple of machines > in order to be sure that no machine could be the problem. > Can you help me please? > Ada > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] MXM vs OpenIB
> We just learned about MXM, and given most our cards are Mellonox ConnectX > cards (though not all, have have islands of previous to ConnectX and Qlogic > supported in the same OpenMPI environment), > > Will MXM correctly fail though to PSM if on qlogic gear and fail though to > OpenIB if on previous to connectX cards? Do you want to run MXM and PSM in the same MPI session ? You can't do it. MXM and PSM use different network protocols. If you want to use MXM in your MPI job, all nodes should be configured to use MXM. On the other hand, OpenIB btl should support mixed environments out of the box. - Pasha
Re: [OMPI users] XRC vs SRQ vs PRQ
> > Switching to SRQ and some guess of queue values selected appears to let the > code run. > S,4096,128:S,12288,128:S,65536,12 > > Two questions, > > This is a ConnectX fabric, should I switch them to XRC queues? And should I > use the same queue size/count? That a safe assumption? > X,4096,128:X,12288,128:X,65536,12 Yeah, I would use the same values as a starting point. > > > When should I use one queue type over the other? Generally speaking XRC transport has much better scalability that RC. > > Is there a way to get stat feedback on the use of your shared queues (SRQ or > XRC) ? > > Example, using code 'not from here' and would like to know, "hey you are > always running out of your queue of size X" Or " your queue of size Y is > never used" > > We are kinda blind for a lot of our applications :-) Right now we don't have such hooks in openib BTL. It is not very difficult to add some code that will report stat for QP utilization. In you other email you mentioned MXM. I would recommend to try both XRC and MXM and see which one performance better. On relatively small system I would guess XRC will perform better, on large system MXM should demonstrate better performance. But again, it all depends on your application. - Pasha
[OMPI users] OPENMPI_ORTE_LOG_ERROR
Hi, I'm trying to run my mpi program using open mpi 1.7 rc5 on 4 machines using the command: mpirun -np4 -hostfile file a.out but i get the following message errors: ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c attempted to send to [[21341,0],2]: tag 15 ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c The file etc/hosts is composed by ipaddress hostname, I have exchange ssh keys among the machines and ssh login works without requiring authentication password. Surprisingly if I try to run my program with at most 2 hosts, and so the file hosts contains only two hosts, it works but if i try to run my program with more than two hosts i have this error; mpi works well on each machine and I also tried to run my program with different couple of machines in order to be sure that no machine could be the problem. Can you help me please? Ada
Re: [OMPI users] help me understand these error msgs
On Thu, 17 Jan 2013 11:54:13 -0800 Ralph Castain wrote: > Or is this happening on startup of the larger job, or during a call to > MPI_Comm_spawn? This happens on a startup. Mpirun spawns processes and when they start talking to eachother during setup phase, I get this kind of error. Running time in such case is less than a minute. -- Jure Pečar http://jure.pecar.org
Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi
On Jan 19, 2013, at 1:05 PM, Lee Eric wrote: > However, I hit another issue about fortran as configure running. > > *** Fortran 90/95 compiler > checking for armv6-rpi-linux-gnueabi-gfortran... > armv6-rpi-linux-gnueabi-gfortran > checking whether we are using the GNU Fortran compiler... yes > checking whether armv6-rpi-linux-gnueabi-gfortran accepts -g... yes > checking if Fortran 77 compiler works... links (cross compiling) > checking armv6-rpi-linux-gnueabi-gfortran external symbol > convention... single underscore > checking if C and Fortran 77 are link compatible... yes > checking to see if F77 compiler likes the C++ exception flags... > skipped (no C++ exceptions flags) > checking to see if mpif77/mpif90 compilers need additional linker flags... > none > checking if Fortran 77 compiler supports CHARACTER... yes > checking size of Fortran 77 CHARACTER... configure: error: Can not > determine size of CHARACTER when cross-compiling Just to follow up on this point -- cross compiling with Open MPI is a known issue. The specific problem you're running in to here is that configure is trying to compile *and run* some Fortran tests. Which obviously doesn't work in a cross-compiling environment. You can work around this, however, either by disabling Fortran (which you did), or you can pre-populate configure's answers to the Fortran tests (so that it doesn't actually have to run anything). However, we have never fully documented the procedure on how to do this (it's not straightforward, and definitely not for the weak of heart). If you don't need Fortran, simply disabling Fortran is probably your best bet. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi
Note that the original author of the ARM support chimed in on this on the devel list: http://www.open-mpi.org/community/lists/devel/2013/01/11955.php On Jan 21, 2013, at 6:50 AM, George Bosilca wrote: > Great, I pushed everything upstream: > - trunk (r27882) > - prepared a patch for the 1.6 > (https://svn.open-mpi.org/trac/ompi/ticket/3469) > - requested a CMR for the 1.7 (https://svn.open-mpi.org/trac/ompi/ticket/3470) > > Thanks for your help, > George. > > > On Jan 21, 2013, at 07:56 , Lee Eric wrote: > >> Thank you mate. This patch works quite well on my Raspberry Pi w/o any >> error. Can we put them in the upstream? >> >> Thanks. >> >> Eric >> >> On Mon, Jan 21, 2013 at 12:07 AM, George Bosilca wrote: >>> Eric, >>> >>> What do you think about the patch attached to ticket #3469 >>> (https://svn.open-mpi.org/trac/ompi/ticket/3469). We might blend the two >>> patches together, and have all the different ARM versions covered. >>> >>> Thanks, >>> George. >>> >>> On Jan 20, 2013, at 05:55 , Lee Eric wrote: >>> Hi, The above issue fixed w/ this patch I used: https://raw.github.com/sebhtml/patches/master/openmpi/Raspberry-Pi-openmpi-1.6.2.patch Is that possible OpenMPI can contain this patch in the future? Thanks. On Sun, Jan 20, 2013 at 3:13 AM, Lee Eric wrote: > Hi, > > I just use --disable-mpif77 and --disable-mpif90 to let configure run > well. However, I know it's only tough workround. After configured > well, I encounter following error when run make: > > Making all in config > make[1]: Entering directory `/home/huli/Projects/openmpi-1.6.3/config' > make[1]: Nothing to be done for `all'. > make[1]: Leaving directory `/home/huli/Projects/openmpi-1.6.3/config' > Making all in contrib > make[1]: Entering directory `/home/huli/Projects/openmpi-1.6.3/contrib' > make[1]: Nothing to be done for `all'. > make[1]: Leaving directory `/home/huli/Projects/openmpi-1.6.3/contrib' > Making all in opal > make[1]: Entering directory `/home/huli/Projects/openmpi-1.6.3/opal' > Making all in include > make[2]: Entering directory > `/home/huli/Projects/openmpi-1.6.3/opal/include' > make all-am > make[3]: Entering directory > `/home/huli/Projects/openmpi-1.6.3/opal/include' > make[3]: Leaving directory > `/home/huli/Projects/openmpi-1.6.3/opal/include' > make[2]: Leaving directory > `/home/huli/Projects/openmpi-1.6.3/opal/include' > Making all in libltdl > make[2]: Entering directory > `/home/huli/Projects/openmpi-1.6.3/opal/libltdl' > make all-am > make[3]: Entering directory > `/home/huli/Projects/openmpi-1.6.3/opal/libltdl' > /bin/sh ./libtool --tag=CC --mode=compile > armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I. > -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl > -I./libltdl > -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include > -I/usr/include/infiniband -I/usr/include/infiniband -Ofast > -mfpu=vfp -mfloat-abi=hard -MT dlopen.lo -MD -MP -MF .deps/dlopen.Tpo > -c -o dlopen.lo `test -f 'loaders/dlopen.c' || echo > './'`loaders/dlopen.c > /bin/sh ./libtool --tag=CC --mode=compile > armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I. -DLTDLOPEN=libltdlc > -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl > -I./libltdl > -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include > -I/usr/include/infiniband -I/usr/include/infiniband -Ofast > -mfpu=vfp -mfloat-abi=hard -MT libltdlc_la-preopen.lo -MD -MP -MF > .deps/libltdlc_la-preopen.Tpo -c -o libltdlc_la-preopen.lo `test -f > 'loaders/preopen.c' || echo './'`loaders/preopen.c > /bin/sh ./libtool --tag=CC --mode=compile > armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I. -DLTDLOPEN=libltdlc > -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl > -I./libltdl > -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include > -I/usr/include/infiniband -I/usr/include/infiniband -Ofast > -mfpu=vfp -mfloat-abi=hard -MT libltdlc_la-lt__alloc.lo -MD -MP -MF > .deps/libltdlc_la-lt__alloc.Tpo -c -o libltdlc_la-lt__alloc.lo `test > -f 'lt__alloc.c' || echo './'`lt__alloc.c > /bin/sh ./libtool --tag=CC --mode=compile > armv6-rpi-linux-gnueabi-gcc -DHAVE_CONFIG_H -I. -DLTDLOPEN=libltdlc > -DLT_CONFIG_H='' -DLTDL -I. -I. -Ilibltdl -I./libltdl > -I./libltdl > -I/home/huli/Projects/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include > -I/usr/include/infiniband -I/usr/include/infiniband -Ofast > -mfpu=vfp -mfloat-abi=hard -MT libltdlc_la-lt_dlloader.lo -MD -MP -MF > .deps/libltdlc_la-lt_dlloader.Tpo -c -o libltdlc_la-lt_dlloader.lo > `test -f 'lt_dlloader.c' || echo './'`lt_dlloader.c > /bin/sh ./libtool --tag=CC --mode=compile > ar
[OMPI users] MXM vs OpenIB
We just learned about MXM, and given most our cards are Mellonox ConnectX cards (though not all, have have islands of previous to ConnectX and Qlogic supported in the same OpenMPI environment), Will MXM correctly fail though to PSM if on qlogic gear and fail though to OpenIB if on previous to connectX cards? Lastly looking at the faq looks like MXM is used by default if available over openIB Should I take that to mean "use MXM if available and supported" ? As in only use openib if that is the only thing you have? Thanks! Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985
[OMPI users] XRC vs SRQ vs PRQ
We hit a problem recently with memory errors when scaling a code to 1000 cores. Switching to SRQ and some guess of queue values selected appears to let the code run. S,4096,128:S,12288,128:S,65536,12 Two questions, This is a ConnectX fabric, should I switch them to XRC queues? And should I use the same queue size/count? That a safe assumption? X,4096,128:X,12288,128:X,65536,12 When should I use one queue type over the other? Is there a way to get stat feedback on the use of your shared queues (SRQ or XRC) ? Example, using code 'not from here' and would like to know, "hey you are always running out of your queue of size X" Or " your queue of size Y is never used" We are kinda blind for a lot of our applications :-) Brock Palen www.umich.edu/~brockp CAEN Advanced Computing bro...@umich.edu (734)936-1985
Re: [OMPI users] [EXTERNAL] Possible memory leak(s) in OpenMPI 1.6.3?
Dear Brian, thank you very much for your assistance and for the bug fix. Regards, Victor.