Date: Tue, 29 Jul 2008 14:19:14 -0400
From: "Alexander Shabarshin" <ashabars...@developonbox.com>
Subject: Re: [OMPI users] Communitcation between OpenMPI and
ClusterTools
To: <us...@open-mpi.org>
Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7@Shabarshin>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=response
Hello
>>> > One idea that comes to mind is whether the two nodes are on the same
>>> > subnet. If they are not on the same subnet, I think there is a bug in
>>> > which the TCP BTL will recuse itself from communications between the
>>> > two nodes.
>> You are right - the subnets are different, but the routes are set up
>> correctly and everything like ping, ssh etc. works OK between them
> It isn't a routing problem, though; it is how the TCP BTL in Open MPI
> decides which interfaces the nodes can communicate over (completely out
> of the hands of the TCP stack and lower).
Do you know when it can be fixed in the official Open MPI?
Is a patch available or something?
Well, this problem is captured in ticket 972
(https://svn.open-mpi.org/trac/ompi/ticket/972). There is a question as
to whether this ticket has been fixed or not (that is, was the code
actually putback). Sun's experience with the trunk, the 1.3 branch and the
CT8 EA2 release seems to be that you can now run jobs across subnets, but we
(Sun) are not completely
FWIW, it looks like that code has had a lot of changes in it between 1.2
and 1.3.
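
For illustration only, the kind of same-subnet test discussed above can be sketched as follows. This is not the Open MPI TCP BTL code; the function name same_subnet, the addresses and the prefix length are all hypothetical, chosen just to show why two hosts that can ping each other through a router may still fail a check like this.

```python
import ipaddress

def same_subnet(addr_a, addr_b, prefix_len):
    """Return True if both addresses fall in the same network.

    Hypothetical helper for illustration; it is not taken from Open MPI.
    """
    # strict=False lets us pass a host address and get its enclosing network
    net_a = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{prefix_len}", strict=False)
    return net_a == net_b

# Made-up addresses: the first pair shares a /24, the second pair does not,
# even though a router could still forward traffic between them.
print(same_subnet("192.168.1.10", "192.168.1.20", 24))
print(same_subnet("192.168.1.10", "192.168.2.20", 24))
```

A reachability decision based purely on a comparison like this would reject the second pair regardless of the routing tables, which matches the symptom described in this thread.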
--td