FWIW, we do something similar in the openib BTL -- we use the subnet
ID to determine if two IB ports are connected (we have the rule in
OMPI that physically disconnected subnets must have different IDs --
this is more stringent than the IB spec calls for). See:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
http://www.open-mpi.org/faq/?category=openfabrics#ofa-which-subnet-id
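To make the subnet ID comparison concrete, here is a rough sketch (not the actual openib BTL code). It assumes each port's GID is available as 16 raw bytes, with the first 8 bytes holding the subnet prefix; the name same_subnet() and the constants are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A GID is 16 bytes: the first 8 are the subnet prefix (the
 * "subnet ID"), the last 8 identify the port itself. */
#define GID_LEN 16
#define SUBNET_PREFIX_LEN 8

/* Hypothetical helper: returns 1 if both GIDs carry the same
 * subnet prefix, 0 otherwise. */
static int same_subnet(const uint8_t a[GID_LEN], const uint8_t b[GID_LEN])
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, SUBNET_PREFIX_LEN) == 0;
}
```

Note that this test is only meaningful if physically separate fabrics really have been given different prefixes; two fabrics both left at the default prefix would compare as equal here.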
On Jun 1, 2007, at 12:46 PM, Reese Faucette wrote:
Just to brainstorm on this a little - the two different clusters will have
different "mapper IDs", and this can be learned via the attached code
snippet. As long as fma is the mapper (as opposed to the older, deprecated
"gm_mapper" or "mx_mapper"), then Myrinet topology rules ensure that NIC 0,
port 0 is all you need to examine. All nodes with the same mapper ID can
then be considered "on the same fabric".
Except, of course, when you have two fabrics A and B with many nodes each
but only one node in common - then, all will have the same mapper ID, but
are effectively two disjoint fabrics. This is rare, but I have seen it once.
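As a sketch of how the mapper IDs might be used once every node has reported its own, something like the following could count the fabrics. The struct node_info record and both function names are hypothetical, and this deliberately does nothing about the shared-node case just described, where one mapper ID covers two effectively disjoint fabrics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAPPER_MAC_LEN 6

/* Hypothetical per-node record: hostname plus the mapper MAC
 * that node reported for NIC 0, port 0. */
struct node_info {
    const char *hostname;
    unsigned char mapper_mac[MAPPER_MAC_LEN];
};

/* Returns 1 if both nodes report the same mapper ID. */
static int same_mapper(const struct node_info *a, const struct node_info *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->mapper_mac, b->mapper_mac, MAPPER_MAC_LEN) == 0;
}

/* Counts distinct mapper IDs in the list. A node starts a new
 * fabric only if no earlier node shares its mapper ID. O(n^2),
 * which is fine for a sketch. */
static size_t count_fabrics(const struct node_info *nodes, size_t n)
{
    size_t fabrics = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (same_mapper(&nodes[i], &nodes[j])) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            fabrics++;
    }
    return fabrics;
}
```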
Perhaps a more general solution is for the MX MTL to look in the MX peer
table for a requested peer (or simply try mx_connect() and notice it
fails?) and report "cannot reach" back up the chain, and have higher-level
code retry with a different medium on a per-peer basis? This would be
independent of IB or MX or ...
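The per-peer retry idea could look roughly like the sketch below. Nothing in it is MX or Open MPI API: struct transport, connect_with_fallback() and the two stub connect functions are invented for illustration, with the stubs standing in for "try the fast fabric" and "fall back to something that always works".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connect hook: 0 on success, -1 for "cannot reach". */
typedef int (*connect_fn)(int peer);

struct transport {
    const char *name;
    connect_fn try_connect;
};

/* Stub: pretend only even-numbered peers are on the fast fabric. */
static int stub_fast_connect(int peer)
{
    return (peer % 2 == 0) ? 0 : -1;
}

/* Stub: pretend this medium reaches every peer. */
static int stub_tcp_connect(int peer)
{
    (void)peer;
    return 0;
}

/* Try each transport in preference order for this one peer and
 * return the first that connects, or NULL if none can reach it. */
static const struct transport *
connect_with_fallback(const struct transport *list, size_t n, int peer)
{
    assert(list != NULL);
    for (size_t i = 0; i < n; i++) {
        if (list[i].try_connect(peer) == 0)
            return &list[i];
    }
    return NULL;
}
```

Because the decision is made per peer, a job spanning two fabrics would use the fast medium within each fabric and the fallback only across them.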
===================================
#include <stdio.h>
#include <stdlib.h>
#include "myriexpress.h"
#include "mx_io.h"

int main(void)
{
    mx_return_t ret;
    mx_endpt_handle_t h;
    mx_mapper_state_t ms;
    int board = 0; /* whichever board you want */

    mx_init();

    /* Open the board so we can query its mapper state. */
    ret = mx_open_board(board, &h);
    if (ret != MX_SUCCESS) {
        fprintf(stderr, "Unable to open board %d\n", board);
        exit(1);
    }

    /* Port 0 is sufficient when fma is the mapper. */
    ms.board_number = board;
    ms.iport = 0;
    ret = mx__get_mapper_state(h, &ms);
    if (ret != MX_SUCCESS) {
        fprintf(stderr, "get_mapper_state failed for board %d: %s\n",
                board, mx_strerror(ret));
        exit(1);
    }

    /* The mapper ID is the mapper's MAC address. */
    printf("mapper = %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x\n",
           ms.mapper_mac[0] & 0xff, ms.mapper_mac[1] & 0xff,
           ms.mapper_mac[2] & 0xff, ms.mapper_mac[3] & 0xff,
           ms.mapper_mac[4] & 0xff, ms.mapper_mac[5] & 0xff);
    exit(0);
}
--
Jeff Squyres
Cisco Systems