I don't know if IB or MX need to use the
bmi_method_addr_forget_callback() function. That function makes a
little more sense in the context of a particular tcp problem:
- each time a client opens a new tcp socket to a server, the server
creates a bmi_addr corresponding to that socket (so it can send
responses, etc.).
- if that client exits, then reconnects, the server just thinks of that
as an entirely new bmi_addr; it doesn't have any way to realize that it
is the same client connecting again using a different socket.
bmi_tcp therefore has to garbage collect old bmi_addr's when sockets
close, otherwise the number of addresses can grow indefinitely for a
long running tcp pvfs2-server (a problem Sam found a while back).
Ideally, when bmi_tcp figures out that a socket is closed, it would
garbage collect immediately and get rid of the addr. However, the
server could still have pending operations for that bmi_addr. So... we
mark that addr as in an error state and try to hang onto it until the
reference count hits zero before garbage collecting. That lets us
report a more meaningful error on the server side for pending operations
than "addr doesn't exist".
The bmi_method_addr_forget_callback() in this case is a way to poke the
upper level bmi code to say "keep an eye on this addr, and when the
refcount hits zero clean it up for me".
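To make the deferred cleanup described above concrete, here is a minimal sketch in C. It is not the actual bmi or bmi_tcp code: every name in it (sketch_addr, sketch_socket_closed(), sketch_addr_unref(), and so on) is hypothetical, and it assumes a plain integer reference count with no locking. It only shows the idea of marking an address as in error when its socket closes and freeing it once the last pending operation lets go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an address record; NOT the real bmi struct. */
struct sketch_addr {
    int  refcount;        /* pending operations still using this address */
    bool in_error;        /* set once the underlying socket has closed */
    bool forget_pending;  /* cleanup was requested for when refcount hits 0 */
};

/* Number of addresses currently allocated, to make leaks visible. */
int sketch_live_addrs = 0;

struct sketch_addr *sketch_addr_new(void)
{
    struct sketch_addr *a = calloc(1, sizeof(*a));
    if (a != NULL)
        sketch_live_addrs++;
    return a;
}

void sketch_addr_free(struct sketch_addr *a)
{
    free(a);
    sketch_live_addrs--;
}

/* A pending operation takes a reference on the address. */
void sketch_addr_ref(struct sketch_addr *a)
{
    a->refcount++;
}

/*
 * A pending operation finishes. Returns true if this call freed the
 * address, which only happens when cleanup had been requested earlier.
 */
bool sketch_addr_unref(struct sketch_addr *a)
{
    assert(a->refcount > 0);
    a->refcount--;
    if (a->refcount == 0 && a->forget_pending) {
        sketch_addr_free(a);
        return true;
    }
    return false;
}

/*
 * The method noticed that the socket closed. Mark the address as being
 * in error so pending operations can report something meaningful, and
 * ask for cleanup at refcount zero (the role the forget callback plays
 * in the description above). Returns true if it was freed right away.
 */
bool sketch_socket_closed(struct sketch_addr *a)
{
    a->in_error = true;
    a->forget_pending = true;
    if (a->refcount == 0) {
        sketch_addr_free(a);
        return true;
    }
    return false;
}
```

With one operation still pending, sketch_socket_closed() returns false and leaves the address around in its error state; the later sketch_addr_unref() is what frees it.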
I don't know if that description helps any, but that's my interpretation
of what it does :) The address management (in general) in bmi has ended
up being pretty wacky.
The DROP_ADDR function is how the bmi.c layer explicitly tells a bmi
method to get rid of an address (if that action makes any sense for the
method in question). So that part needs to really get rid of the
address if necessary rather than handing it back to bmi.c with the
bmi_method_addr_forget_callback().
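As a rough illustration of that last point, here is a sketch of a set_info-style handler whose drop case releases the address itself instead of handing it back to the upper layer. The names (SKETCH_DROP_ADDR, sketch_method_addr, sketch_set_info()) and the -1 error value are made up for the example and are not the real bmi interface.

```c
#include <stdlib.h>

/* Hypothetical option code; the real bmi option values are not shown here. */
enum sketch_option { SKETCH_DROP_ADDR = 1 };

/* Hypothetical method-level address with private per-method data. */
struct sketch_method_addr {
    void *method_data;
};

/*
 * Handle an option from the upper layer. For the drop case the method
 * frees the address and its private data directly. It does not call
 * back into the upper layer, since the upper layer is the one asking.
 * Returns 0 on success, -1 for an option this sketch does not handle.
 */
int sketch_set_info(int option, void *param)
{
    switch (option) {
    case SKETCH_DROP_ADDR: {
        struct sketch_method_addr *m = param;
        if (m != NULL) {
            free(m->method_data);
            free(m);
        }
        return 0;
    }
    default:
        return -1;
    }
}
```

After a successful drop the caller must not touch the pointer again, since the method owns the teardown in this path.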
-Phil
Scott Atchley wrote:
On Mar 4, 2008, at 6:58 PM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Tue, 04 Mar 2008 17:35 -0600:
It looks like the IB BMI layer is ending up double-freeing the
method_addr structure on the BMI_ib_set_info function, but it only
happens when the Metadata server is also a data server.
If you look at the following GDB output, the last two entries have the
same method_addr, and I can't figure out a good way to tell in
BMI_set_info if the method_address has already been freed. It also
looks like the id_string has been mangled or freed somewhere earlier as
well.
All your deadref were different values there, so I'm not seeing the
double-free aspect. But I have no doubt that you're on to something
in here. Also, at this location, the id_string and method_addr have
already been freed, so we shouldn't count on them having reasonable
values in them.
I've always had a hard time keeping these references straight. Can
you verify that you're getting to these spots via dealloc_ref_st(),
and maybe a couple steps up from there, for sanity?
Trying to figure out what other devices do in the DROP_ADDR handler.
MX goes and calls bmi_method_addr_forget_callback() in there, but
that doesn't seem right, as it will just wind around through
dealloc_ref_st() again. It looks like TCP is doing more or less
what IB is doing.
Pete,
I do not see where bmi_ib uses bmi_method_addr_forget_callback() at all.
I am looking at the tcp code and I do need to fix how/where I use the
above.
Scott
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers