[EMAIL PROTECTED] wrote on Tue, 26 Sep 2006 07:20 -0500:
> Thanks for the explanation of why we don't handle the case of two,
> say, bmi-tcp interfaces well.  We actually came up with a reason we
> wanted to do that a while back, though. 
> 
> On BGL (yeah, all our weird cases seem to start that way), the servers
> have two interfaces: one faces the BGL io nodes (172.whatever) and one
> faces the MCS internal network (140.whatever).  
> 
> We were trying to access the PVFS volume from jazz, chiba, and BGL,
> but when we did a getconfig, chiba would try to contact the server at
> 172.whatever and time out.
> 
> I think now the topology between the three clusters isn't very
> conducive to such a mounting scheme, but if we wanted to try again one
> day, is there a good way?  Would something like configuring the servers
> with a bmi-tcp and a bmi-tcp-alternate address work?

I don't understand why you can't just fix this with appropriate
routing at the IP layer.  If you have dual-homed hosts, you generally
pick the "public" side as the official name, then set up routes
from the private side so things can find the public IPs.

E.g. the local host names on one IO server machine with two ethernet
interfaces may be:

    140.221.x.1 bgl-pvfs-server1.mcs.anl.gov bgl-pvfs-server1
    172.16.x.1  bgl-pvfs-server1.private.mcs.anl.gov

Then your internal hosts have a pvfs2tab that uses the public name,
as do your external hosts.  Works fine for the external hosts.
To get the internal hosts to see the public addresses, you can
add routes on them:

    ip route add 140.221.0.0/16 via <my-default-gw>

And your gateway switch will either have a route up to the next
higher switch, or will arp for the public address and get a response
through the private interface of the server machine.  (We do all
this in production.)

Although perhaps you really don't want to do that, for some reason.
In which case there are a couple of ways I could see to address this
inside PVFS, neither of them pretty.

1.  All servers listen on all addresses.  Have multiple Aliases for
each server.  Have the getconf server look at the incoming peer
address and consult a mapping to determine which server Aliases to
return.  You'd essentially be managing your own routing, which is
normally IP's job.  It's also network-specific, so it would go inside
bmi_tcp rather than at a generic layer.

2.  Add the infrastructure to have two bmi_tcp instances, much like
we can have a bmi_ib + bmi_tcp today.  Each instance listens on its
own IP address.  The getconf server needs to know which set of
aliases correspond to its own IP listening address, but that's not
too difficult, just a bit ad-hoc (e.g., "take the first address").
Ugly because we would keep twice the amount of TCP state per server,
essentially:  two polling loops, two listening sockets, etc.
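
If option 2 were ever built, the server config might grow something
like the following (entirely hypothetical syntax: there is no second
bmi_tcp instance or "-alt" alias convention in PVFS today; only the
Alias lines resemble the real fs.conf format):

```
# Hypothetical fs.conf fragment for two bmi_tcp instances.
<Defaults>
    BMIModules bmi_tcp,bmi_tcp_alt
</Defaults>

<Aliases>
    Alias server1     tcp://bgl-pvfs-server1.mcs.anl.gov:3334
    Alias server1-alt tcp://bgl-pvfs-server1.private.mcs.anl.gov:3334
</Aliases>
```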

If anybody's hugely interested, neither of these would be too tough;
they just aren't very appealing solutions to me.  Fix your IP
configuration first if it's feasible.

                -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers