[EMAIL PROTECTED] wrote on Mon, 08 Oct 2007 11:13 -0500:
> The attached patch is the proposed fix for this problem. When the
> tcp method receives a disconnect from a peer, it invokes a callback
> (bmi_method_addr_forget_callback) into the bmi control layer to
> remove the address reference from the list. Maybe I should also add
> a counter and limit on how big the list can get, although that would
> involve potentially forcing long-lived connections to reconnect
> periodically, and all methods would have to implement BMI_set_info
> (DROP_ADDR).
>
> With tcp, new connections are registered, even if they are from the
> same host/port on the peer, whereas the other methods seem to only
> register new host/port endpoints that haven't been seen before. So
> it's not completely clear to me when the other methods need to call
> this callback, if at all. There needs to be a matching
> bmi_method_addr_forget_callback for each
> bmi_method_addr_reg_callback, but if the method only registers a
> single address per client, the list won't keep growing, unless we
> ever plan to support millions of clients.
>
> With gm, the address is registered with the control layer, and
> managed internally as well (gm_addr_add). The address is never
> removed from the internal list though (gm_addr_del is never called).
> Again, only new host/port pairs that haven't been seen are added to
> the list, and registered with the control layer. gm doesn't
> implement the BMI_set_info(DROP_ADDR) call, so addresses are not
> expunged even if requested explicitly. My guess is we should
> probably add a gm_addr_del for DROP_ADDR?
>
> With ib, it looks like the server receives new connections and
> registers them with the control layer, but the connections never get
> closed, or the ib layer doesn't handle them? The only place I can
> find where connections are dropped is on an explicit BMI_set_info
> (DROP_ADDR), which doesn't get called from the server.
>
> With mx, it looks like there's a limit on the number of connections
> from a peer (BMX_PEER_RX_NUM == 20). As new connections are received
> the idle connections are closed? Should
> bmi_method_addr_forget_callback be called from there?
I read your patch. And read this mail twice, no thrice. I'm
totally confused now.
Tell me which of these things are true/false, and perhaps add more
so Scott and I can understand.
1. Address types
- core PVFS refers to peers using a 64-bit opaque BMI_addr_t
- BMI has a list that maps BMI_addr_t to a method (e.g. mx or tcp)
and a struct method_addr *.
- BMI methods do not see BMI_addr_t. They see struct method_addr *.
That struct has a void * that is filled with method-specific
items, like a connection structure.
2. Address creation
- apps ask for peers by name: tcp://myhost:1223/pvfs
- these names are parsed by the appropriate method to return
an existing or new struct method_addr *. Methods are expected
to keep a list of known peers? IB does.
- servers can also receive new connections, in which case they
must build a struct method_addr, and register it with core
BMI, through bmi_method_addr_reg_callback()
3. Address destruction
- currently never happens. In theory, core BMI can request
that a method drop an address, perhaps when out of resources.
- BMI_addr_t and struct method_addr and method-internal state are
kept synchronized how? Through which calls? Who frees which
things when?
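To pin down what I mean in items 1 and 2, here is a toy C sketch of the
layering as I understand it. It is not the PVFS code: every toy_* name,
the fixed-size table, and the use of 0 as an invalid handle are
assumptions made up for illustration. It only shows an opaque 64-bit
handle mapping to a method id plus a method-owned struct with a void *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the opaque 64-bit BMI_addr_t. */
typedef uint64_t toy_addr_t;

/* Hypothetical stand-in for struct method_addr: what a method sees. */
struct toy_method_addr {
    int method_id;      /* which method owns this address (e.g. tcp, mx) */
    void *method_data;  /* method-specific state, e.g. a connection struct */
};

#define TOY_MAX_REFS 8  /* arbitrary size for the sketch */

/* One entry in the control layer's list: handle -> method address. */
struct toy_ref {
    toy_addr_t addr;              /* 0 marks a free slot (assumption) */
    struct toy_method_addr *map;
};

struct toy_ref_list {
    struct toy_ref slots[TOY_MAX_REFS];
    toy_addr_t next_addr;         /* next handle to hand out */
};

void toy_list_init(struct toy_ref_list *l)
{
    for (size_t i = 0; i < TOY_MAX_REFS; i++) {
        l->slots[i].addr = 0;
        l->slots[i].map = NULL;
    }
    l->next_addr = 1;
}

/* Models registration of a method address with the control layer.
 * Returns the new handle, or 0 if the table is full. */
toy_addr_t toy_addr_reg(struct toy_ref_list *l, struct toy_method_addr *map)
{
    assert(map != NULL);
    for (size_t i = 0; i < TOY_MAX_REFS; i++) {
        if (l->slots[i].addr == 0) {
            l->slots[i].addr = l->next_addr++;
            l->slots[i].map = map;
            return l->slots[i].addr;
        }
    }
    return 0;
}

/* Core code holds only the handle; this resolves it for the method. */
struct toy_method_addr *toy_addr_lookup(const struct toy_ref_list *l,
                                        toy_addr_t addr)
{
    if (addr == 0)
        return NULL;
    for (size_t i = 0; i < TOY_MAX_REFS; i++) {
        if (l->slots[i].addr == addr)
            return l->slots[i].map;
    }
    return NULL;
}
```

Note that nothing in this sketch ever frees a slot, which is the gap
item 3 is asking about.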
I'm obviously wrong and confused about #3. And maybe 2b makes
a bad assumption.
Then how does your new forget callback fit in with all this?
Somehow I thought the endless list growth was purely a TCP problem,
not something that needed core plumbing.
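For reference, this is how I read the growth problem and the proposed
reg/forget pairing, reduced to a counter. The names toy_ctl,
toy_reg_callback, toy_forget_callback and toy_reconnect are invented
for this sketch and do not match the real signatures; the only thing
taken from the mail above is that each registration needs a matching
forget, and that tcp registers on every new connection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the control layer: just counts live list entries. */
struct toy_ctl {
    size_t live;
};

/* Models a method registering a newly accepted connection. */
void toy_reg_callback(struct toy_ctl *c)
{
    c->live++;
}

/* Models the proposed forget callback: drops one entry.
 * A forget without a matching registration is treated as a bug. */
void toy_forget_callback(struct toy_ctl *c)
{
    assert(c->live > 0);
    c->live--;
}

/* One peer connects and disconnects n times, registering each time as
 * tcp is described to do. Returns the list size afterwards. */
size_t toy_reconnect(size_t n, int forget_on_disconnect)
{
    struct toy_ctl c = { 0 };
    for (size_t i = 0; i < n; i++) {
        toy_reg_callback(&c);            /* new connection accepted */
        if (forget_on_disconnect)
            toy_forget_callback(&c);     /* peer disconnected */
    }
    return c.live;
}
```

Without the forget, n reconnects leave n entries behind; with it, the
list is empty again. A method that registers only once per host/port
endpoint would stay at one entry per client in this model either way.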
-- Pete