FWIW, we do something similar in the openib BTL -- we use the subnet ID to determine whether two IB ports are connected. (OMPI has the rule that physically disconnected subnets must have different IDs, which is more stringent than the IB spec calls for.) See:

http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
http://www.open-mpi.org/faq/?category=openfabrics#ofa-which-subnet-id
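
To make the subnet ID rule concrete, here is a minimal sketch in plain C. It is not the openib BTL code. It assumes a port GID is available as 16 raw bytes in network byte order, with the upper 64 bits holding the subnet prefix (the subnet ID); the helper names gid_subnet_prefix and same_subnet are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: extract the 64-bit subnet prefix from a 16-byte
 * GID stored in network (big-endian) byte order. */
static uint64_t gid_subnet_prefix(const uint8_t gid[16])
{
    uint64_t prefix = 0;

    assert(gid != NULL);
    for (int i = 0; i < 8; i++) {
        prefix = (prefix << 8) | gid[i];
    }
    return prefix;
}

/* Hypothetical helper: two ports are treated as connected only if their
 * subnet prefixes match. Returns 1 on a match, 0 otherwise. */
static int same_subnet(const uint8_t a[16], const uint8_t b[16])
{
    return gid_subnet_prefix(a) == gid_subnet_prefix(b);
}
```

A check like this is only as good as the IDs it compares: if two physically separate fabrics are both left on the same prefix, it would report them as connected, which is why the rule above asks for distinct IDs.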


On Jun 1, 2007, at 12:46 PM, Reese Faucette wrote:

Just to brainstorm on this a little - the two different clusters will have
different "mapper IDs", and this can be learned via the attached code
snippet. As long as fma is the mapper (as opposed to the older, deprecated "gm_mapper" or "mx_mapper"), then Myrinet topology rules ensure that NIC 0, port 0 is all you need to examine. All nodes with the same mapper can then
be considered "on the same fabric".

Except, of course, when you have two fabrics A and B with many nodes each but only one node in common - then, all will have the same mapper ID, but are effectively two disjoint fabrics. This is rare, but I have seen it
once.
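
As a rough illustration of the "same mapper means same fabric" test, here is a sketch in plain C that compares two 6-byte mapper IDs. It assumes each node has already obtained its mapper ID and exchanged it with its peers; the name same_mapper and the MAPPER_MAC_LEN constant are hypothetical and not part of MX.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAPPER_MAC_LEN 6   /* assumed: mapper ID is a 6-byte MAC address */

/* Hypothetical helper: returns 1 if both nodes report the same mapper,
 * 0 otherwise. */
static int same_mapper(const unsigned char a[MAPPER_MAC_LEN],
                       const unsigned char b[MAPPER_MAC_LEN])
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, MAPPER_MAC_LEN) == 0;
}
```

Given the corner case just described, a match should be read as "probably on the same fabric" rather than as proof that the two nodes can reach each other.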

Perhaps a more general solution is for the MX MTL to look in the MX peer
table for a requested peer  (or simply try mx_connect() and notice it
fails?) and report "cannot reach" back up the chain and have higher level
code retry with a different medium on a per-peer basis?  This would be
independent of IB or MX or ...
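
The per-peer retry idea could look roughly like the sketch below. Everything in it is hypothetical: struct transport, pick_transport, and the two fake connect functions are stand-ins invented for illustration, not Open MPI or MX interfaces. A real version would call something like mx_connect() in place of fake_mx_connect().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: returns 0 if the peer is reachable over this medium. */
typedef int (*connect_fn)(int peer);

struct transport {
    const char *name;
    connect_fn  try_connect;
};

/* Try each transport in priority order and return the first one that
 * reaches the peer, or NULL if none does ("cannot reach"). */
static const struct transport *
pick_transport(const struct transport *list, size_t n, int peer)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].try_connect(peer) == 0) {
            return &list[i];
        }
    }
    return NULL;
}

/* Stand-ins for real connection attempts: the fake "MX" only reaches
 * even-numbered peers, the fake TCP reaches every peer. */
static int fake_mx_connect(int peer)  { return (peer % 2 == 0) ? 0 : -1; }
static int fake_tcp_connect(int peer) { (void)peer; return 0; }
```

The decision is made per peer, so a job spanning two fabrics can use the fast medium wherever it works and fall back only for the peers it cannot reach.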

===================================
#include <stdio.h>
#include <stdlib.h>
#include "myriexpress.h"
#include "mx_io.h"

int main(void)
{
  mx_return_t ret;
  mx_endpt_handle_t h;
  mx_mapper_state_t ms;
  int board = 0;                /* whichever board you want */

  /* Initialize MX, then open the board so the driver can be queried. */
  mx_init();
  ret = mx_open_board(board, &h);
  if (ret != MX_SUCCESS) {
    fprintf(stderr, "Unable to open board %d\n", board);
    exit(1);
  }

  /* Port 0 of the board is all we need to examine when fma is the mapper. */
  ms.board_number = board;
  ms.iport = 0;
  ret = mx__get_mapper_state(h, &ms);
  if (ret != MX_SUCCESS) {
    fprintf(stderr, "get_mapper_state failed for board %d: %s\n",
        board, mx_strerror(ret));
    exit(1);
  }

  /* Print the mapper's MAC address; this is the "mapper ID". */
  printf("mapper = %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x\n",
         ms.mapper_mac[0] & 0xff, ms.mapper_mac[1] & 0xff,
         ms.mapper_mac[2] & 0xff, ms.mapper_mac[3] & 0xff,
         ms.mapper_mac[4] & 0xff, ms.mapper_mac[5] & 0xff);
  exit(0);
}




--
Jeff Squyres
Cisco Systems
