I don't know if IB or MX need to use the bmi_method_addr_forget_callback() function. That function makes a little more sense in the context of a particular tcp problem:

- each time a client opens a new tcp socket to a server, the server creates a bmi_addr corresponding to that socket (so it can send responses, etc.).
- if that client exits, then reconnects, the server just thinks of that as an entirely new bmi_addr; it doesn't have any way to realize that it is the same client connecting again using a different socket.

bmi_tcp therefore has to garbage collect old bmi_addr's when sockets close, otherwise the number of addresses can grow indefinitely for a long running tcp pvfs2-server (a problem Sam found a while back).

Ideally, when bmi_tcp figures out that a socket is closed, it would garbage collect immediately and get rid of the addr. However, the server could still have pending operations for that bmi_addr. So... we mark that addr as in an error state and try to hang onto it until the reference count hits zero before garbage collecting. That lets us report a more meaningful error on the server side for pending operations than "addr doesn't exist".

The bmi_method_addr_forget_callback() in this case is a way to poke the upper level bmi code to say "keep an eye on this addr, and when the refcount hits zero clean it up for me".
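
To make the "mark it in error, clean it up when the refcount hits zero" idea concrete, here is a rough sketch in C. It is not the actual bmi.c or bmi_tcp code: the struct, its fields, and every sketch_* function are made up for illustration, and the "freed" flag only stands in for really releasing the address so the behavior can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an address entry; NOT the real PVFS2 structure. */
struct sketch_addr {
    int  ref_count;      /* pending operations still using this address */
    bool in_error;       /* set once the method notices the socket closed */
    bool forget_pending; /* method asked the upper layer to clean up later */
    bool freed;          /* stands in for actually releasing the address */
};

/* Upper layer: collect only if cleanup was requested and nothing is pending. */
static void sketch_try_collect(struct sketch_addr *a)
{
    if (a->forget_pending && a->ref_count == 0 && !a->freed)
        a->freed = true;
}

/* Upper layer: an operation starts using the address. */
static void sketch_op_start(struct sketch_addr *a)
{
    assert(!a->freed);
    a->ref_count++;
}

/* Method side: the socket closed. Mark the address as being in error and
 * ask the upper layer to watch it (the role the forget callback plays in
 * the description above). */
static void sketch_socket_closed(struct sketch_addr *a)
{
    a->in_error = true;
    a->forget_pending = true;
    sketch_try_collect(a); /* collects right away only if nothing is pending */
}

/* Upper layer: an operation finishes. Returns -1 if the address was in
 * error, so the caller gets a meaningful failure instead of finding that
 * the address no longer exists. */
static int sketch_op_finish(struct sketch_addr *a)
{
    assert(a->ref_count > 0);
    int rc = a->in_error ? -1 : 0;
    a->ref_count--;
    sketch_try_collect(a); /* the last reference dropping triggers cleanup */
    return rc;
}
```

In this sketch, an address with one pending operation survives sketch_socket_closed(), the pending operation then completes with an error, and only at that point is the address collected.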

I don't know if that description helps any, but that's my interpretation of what it does :) The address management (in general) in bmi has ended up being pretty wacky.

The DROP_ADDR function is how the bmi.c layer explicitly tells a bmi method to get rid of an address (if that action makes any sense for the method in question). So that part needs to really get rid of the address if necessary rather than handing it back to bmi.c with the bmi_method_addr_forget_callback().

-Phil

Scott Atchley wrote:
On Mar 4, 2008, at 6:58 PM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Tue, 04 Mar 2008 17:35 -0600:
It looks like the IB BMI layer is ending up double-freeing the method_addr
structure on the BMI_ib_set_info function, but it only happens when the
Metadata server is also a data server.

If you look at the following GDB output, the last two entries have the same method_addr, and I can't figure out a good way to tell in BMI_set_info if the method_address has already been freed. It also looks like the id_string
has been mangled or freed somewhere earlier as well.

All your deadref were different values there, so I'm not seeing the
double-free aspect.  But I have no doubt that you're on to something
in here.  Also, at this location, the id_string and method_addr have
already been freed, so we shouldn't count on them having reasonable
values in them.

I've always had a hard time keeping these references straight.  Can
you verify that you're getting to these spots via dealloc_ref_st(),
and maybe a couple steps up from there, for sanity?

Trying to figure out what other devices do in the DROP_ADDR handler.
MX goes and calls bmi_method_addr_forget_callback() in there, but
that doesn't seem right, as it will just wind around through
dealloc_ref_st() again.  It looks like TCP is doing more or less
what IB is doing.

Pete,

I do not see where bmi_ib uses bmi_method_addr_forget_callback() at all. I am looking at the tcp code and I do need to fix how/where I use the above.

Scott
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

