Re: [OMPI users] mixing MX and TCP
Well, as expected this call is not documented ... and I got to it only with some help from Loic. On the version of MX I have tested, it is not required to add the 3rd and 4th arguments, as I don't want to set anything. The NIC is already specified through mx_btl->mx_endpoint, isn't it?

george.

On Mon, 11 Jun 2007, Reese Faucette wrote:

> > !         if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
> > !                                    &nic_id, sizeof(nic_id), &value, sizeof(int))) != MX_SUCCESS ) {
>
> yes, a NIC ID is required for this call because a host may have multiple NICs with different linespeeds, e.g. a 2G card and a 10G card.
> -reese
Re: [OMPI users] mixing MX and TCP
It's about using multiple network interfaces to exchange messages between a pair of hosts. The networks can be identical or not.

george.

On Jun 9, 2007, at 8:19 PM, Alex Tumanov wrote:

> forgive a trivial question, but what's a multi-rail?
>
> On 6/8/07, George Bosilca wrote:
> > A fix for this problem is now available on the trunk. Please use any revision after 14963 and your problem will vanish [I hope!]. There are now some additional parameters which allow you to select which Myrinet network you want to use in the case there are several available (--mca btl_mx_if_include and --mca btl_mx_if_exclude). Even multi-rails should now work over MX.
Re: [OMPI users] mixing MX and TCP
George Bosilca wrote:

> A fix for this problem is now available on the trunk. Please use any revision after 14963 and your problem will vanish [I hope!]. There are now some additional parameters which allow you to select which Myrinet network you want to use in the case there are several available (--mca btl_mx_if_include and --mca btl_mx_if_exclude). Even multi-rails should now work over MX.

I have tried nightly snapshot openmpi-1.3a1r14981 and it (almost) seems to work. The version as is, when run in combination with MX-1.2.0j and the FMA mapper, currently results in the following error on each node:

    mx_get_info(MX_LINE_SPEED) failed with status 35 (Bad info length)

However, with the small patch below, multi-cluster jobs indeed seem to be running fine (using MX locally). I'll do some more testing later this week.

Thanks a lot for the fix!

Kees

*** ./ompi/mca/btl/mx/btl_mx_component.c.orig   2007-06-11 17:12:11.0 +0200
--- ./ompi/mca/btl/mx/btl_mx_component.c        2007-06-11 17:13:34.0 +0200
***************
*** 310,316 ****
  #if defined(MX_HAS_NET_TYPE)
      {
          int value;
!         if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED, NULL, 0,
                                     &value, sizeof(int))) != MX_SUCCESS ) {
              opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                           status, mx_strerror(status) );
--- 310,317 ----
  #if defined(MX_HAS_NET_TYPE)
      {
          int value;
!         if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
!                                    &nic_id, sizeof(nic_id),
                                     &value, sizeof(int))) != MX_SUCCESS ) {
              opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                           status, mx_strerror(status) );
Re: [OMPI users] mixing MX and TCP
A fix for this problem is now available on the trunk. Please use any revision after 14963 and your problem will vanish [I hope!]. There are now some additional parameters which allow you to select which Myrinet network you want to use in the case there are several available (--mca btl_mx_if_include and --mca btl_mx_if_exclude). Even multi-rails should now work over MX.

george.

On May 31, 2007, at 12:09 PM, Kees Verstoep wrote:

> Hi,
>
> I am currently experimenting with OpenMPI in a multi-cluster setting where each cluster has its private Myri-10G/MX network besides TCP. Somehow I was under the assumption that OpenMPI would dynamically find out the details of this configuration, and use MX where possible (i.e., intra cluster), and TCP elsewhere. But from some initial testing it appears OpenMPI-1.2.1 assumes global connectivity over MX when every participating host supports MX. I see MX rather than tcp-level connections between clusters being tried, which across clusters fails in mx_connect/mx_isend (at the moment there is no inter-cluster support in MX itself). Besides "mx", I do include "tcp" in the network option lists of course.
>
> Is this just something that is not yet supported in the current release, or does it work by providing some extra parameters? I have not started digging in the code yet.
>
> Thanks!
> Kees Verstoep
Re: [OMPI users] mixing MX and TCP
Just to brainstorm on this a little - the two different clusters will have different "mapper IDs", and this can be learned via the attached code snippet. As long as fma is the mapper (as opposed to the older, deprecated "gm_mapper" or "mx_mapper"), then Myrinet topology rules ensure that NIC 0, port 0 is all you need to examine. All nodes with the same mapper can then be considered "on the same fabric".

Except, of course, when you have two fabrics A and B with many nodes each but only one node in common - then, all will have the same mapper ID, but are effectively two disjoint fabrics. This is rare, but I have seen it once.

Perhaps a more general solution is for the MX MTL to look in the MX peer table for a requested peer (or simply try mx_connect() and notice it fails?) and report "cannot reach" back up the chain and have higher level code retry with a different medium on a per-peer basis? This would be independent of IB or MX or ...

===
#include <stdio.h>
#include <stdlib.h>
#include "myriexpress.h"
#include "mx_io.h"

int main(void)
{
  mx_return_t ret;
  mx_endpt_handle_t h;
  mx_mapper_state_t ms;
  int board = 0;        /* whichever board you want */

  mx_init();

  /* Open a handle on the board so its mapper state can be queried. */
  ret = mx_open_board(board, &h);
  if (ret != MX_SUCCESS) {
    fprintf(stderr, "Unable to open board %d\n", board);
    exit(1);
  }

  /* With fma as the mapper, NIC 0 / port 0 is all that needs examining. */
  ms.board_number = board;
  ms.iport = 0;
  ret = mx__get_mapper_state(h, &ms);
  if (ret != MX_SUCCESS) {
    fprintf(stderr, "get_mapper_state failed for board %d: %s\n",
            board, mx_strerror(ret));
    exit(1);
  }

  /* The mapper ID is the 6-byte MAC address of the mapper. */
  printf("mapper = %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x\n",
         ms.mapper_mac[0] & 0xff, ms.mapper_mac[1] & 0xff,
         ms.mapper_mac[2] & 0xff, ms.mapper_mac[3] & 0xff,
         ms.mapper_mac[4] & 0xff, ms.mapper_mac[5] & 0xff);
  exit(0);
}
Re: [OMPI users] mixing MX and TCP
Well, we are aware of this problem, but to be honest I was ready to bet that nobody would use a cluster of clusters with Myrinet and TCP ... so it was on a low-priority TODO list.

The problem is the routing table of the MX device. The MX BTL is unable to identify in a unique manner that there is not one but multiple Myrinet networks. As long as a node reports an MX handle back at the end of MPI_Init, everybody else will try to use it if they need to set up an MX connection. Of course this will fail for multiple Myrinet networks. However, this is not supposed to stop your MPI application. Open MPI will deselect the MX BTL for this particular connection and switch to TCP (if available).

george.

On Jun 1, 2007, at 4:25 AM, Christian Kauhaus wrote:

> Kees Verstoep:
> > I am currently experimenting with OpenMPI in a multi-cluster setting where each cluster has its private Myri-10G/MX network besides TCP.
>
> Very interesting topic. :)
>
> > I see MX rather than tcp-level connections between clusters being tried, which across clusters fails in mx_connect/mx_isend (at the moment there is no inter-cluster support in MX itself). Besides "mx", I do include "tcp" in the network option lists of course.
>
> It seems that the BTL does not realize that the two Myrinets are not connected. We are currently working on getting the handling of all cases with different TCP/IP networks right (public IPv4, private IPv4, IPv6), but to my knowledge nobody has done a detailed evaluation on Open MPI in multi-domain clusters with mixed networks (TCP+MX, TCP+IB, ...) yet.
>
> -Christian
Re: [OMPI users] mixing MX and TCP
Kees Verstoep:
>I am currently experimenting with OpenMPI in a multi-cluster setting
>where each cluster has its private Myri-10G/MX network besides TCP.

Very interesting topic. :)

>I see MX rather than tcp-level connections between clusters being
>tried, which across clusters fails in mx_connect/mx_isend (at the
>moment there is no inter-cluster support in MX itself). Besides "mx",
>I do include "tcp" in the network option lists of course.

It seems that the BTL does not realize that the two Myrinets are not connected. We are currently working on getting the handling of all cases with different TCP/IP networks right (public IPv4, private IPv4, IPv6), but to my knowledge nobody has done a detailed evaluation on Open MPI in multi-domain clusters with mixed networks (TCP+MX, TCP+IB, ...) yet.

-Christian

--
Dipl.-Inf. Christian Kauhaus <><
Lehrstuhl fuer Rechnerarchitektur und -kommunikation
Institut fuer Informatik * Ernst-Abbe-Platz 1-2 * D-07743 Jena
Tel: +49 3641 9 46376 * Fax: +49 3641 9 46372 * Raum 3217
[OMPI users] mixing MX and TCP
Hi,

I am currently experimenting with OpenMPI in a multi-cluster setting where each cluster has its private Myri-10G/MX network besides TCP. Somehow I was under the assumption that OpenMPI would dynamically find out the details of this configuration, and use MX where possible (i.e., intra cluster), and TCP elsewhere. But from some initial testing it appears OpenMPI-1.2.1 assumes global connectivity over MX when every participating host supports MX. I see MX rather than tcp-level connections between clusters being tried, which across clusters fails in mx_connect/mx_isend (at the moment there is no inter-cluster support in MX itself). Besides "mx", I do include "tcp" in the network option lists of course.

Is this just something that is not yet supported in the current release, or does it work by providing some extra parameters? I have not started digging in the code yet.

Thanks!
Kees Verstoep