On Fri, 9 Aug 2013, chas williams - CONTRACTOR wrote:

On Thu, 8 Aug 2013 19:54:26 -0400 (EDT)
Benjamin Kaduk <ka...@mit.edu> wrote:

For the restart cases ('restart -all' and 'restart -bosserver') the user
(or bosserver itself) should probably be prevented from creating new
bnodes since the bnode list shouldn't grow while the various shutdown
procs are being applied.

That would be easy enough to fix with a flag/lock.

Well, I think your tempBnodes_lock is performing this function, or nearly
this function.  I am not convinced that the extra tempBnodes queue is
necessary, though; I think we can just have a flag and skip the extra
queue and lock.

You need it.  You shouldn't hold the bnode list lock while running
the proc passed into bnode_ApplyInstance() the proc's might need to
lock the bnode list in order to change the reference counts on the
bnodes.

You definitely can't hold the bnode list lock while running the procs in ApplyInstance, yes. I'm not immediately convinced that dropping the bnode list lock implies that a temp list is necessary; I think an additional flag or write lock similar to your next paragraph may be sufficient. Maybe, I should just try to implement my idea and we will see if it works or not.

However, fixing Jeffery's issue with restarts is pretty easy though.  We
just need another lock (essentially a modification/write lock) to block
changes (add and deletes but not reference count changes) to the list of
current bnodes while restart's are happening. Things like
SBOZO_RestartAll() shold hold this lock to ensure that the multiple uses
of bnode_ApplyInstance() apply to the same list of bnodes. Any racing
bnode_CreateBnode() would eventually succeed (except in the cases were
the bosserver is shutting down completely).

Interesting, that's not quite as I would have expected.  The reference
count is serving only to keep the bnode from being deleted, I thought, not
as a write barrier.  So I would not have thought the locking would add a
substantial overhead.  And yes, "totally bogus" was a bit overblown, it
was mostly okay but have a few places that were quite wrong.

It does but you need to take the bnode list lock in order to change the
reference count on the bnodes.  I considered a read/write lock strategy

Okay, yes.

that would let you just have a single bnode list and avoid the temp
queues.  But that seemed overly complicated for bosserver.  Now
considering restartAll perhaps that would be a better choice.

This is more akin to my thinking, yes.

I'll keep looking at the windows stuff (maybe I just need to #include
<winsock2.h> and it will just work?) and per-bnode locks a bit more, but
at the moment, my leaning is that the easiest way forward would be to take
your code and replace the tempBnodes queue with the global flag mentioned
above.  Would you be okay if I went and did that?  This would leave us
with a tbozo, and we could get rid of the LWP copy after a bit of time.

I have no way to test windows.  I can only compile test. You can try to

I only have a 1.7 build environment, not a master one. Maybe it would just be a matter of installing the Secure Endpoints Kerberos development thingy that is required, I don't know.

eliminate the temp list of bnodes if you like but I think there are
easier ways to solve this problem.  Yes, it does look complicated but
it really isn't -- It is a temporary list of nodes.

I think the reason I am having such a hard time accepting the temp list is that the word I use in my head to think about it is not "complicated" but rather "ugly". :)
We'll see if that still holds in a week...

-Ben
_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-devel

Reply via email to