On Thu, Aug 8, 2013 at 4:30 PM, Andrew Deason <adea...@sinenomine.net> wrote:

> On Thu, 8 Aug 2013 15:40:08 -0400 (EDT)
> Benjamin Kaduk <ka...@mit.edu> wrote:
>
> > However, the bosserver is currently using LWP for parallelism, and
> > GSSAPI libraries which are compatible with LWP are hard to come by;
> > the obvious solution is to convert the bosserver to pthreads.
>
> Just mentioning... you can have a pthread process with lwp emulation
> that implements all of the lwp primitives in terms of pthreads. This is
> like having just a giant anti-preemption lock around everything, and I
> thought this already existed somewhere. I don't think that helps with
> signalling, but it can help with locking issues.
>

Code to do this is in Arla; I donated it a jillion years ago, and it's
probably not drop-in for OpenAFS at this point.

> Not that that's a good general pthread-ifying solution, but since
> bosserver doesn't need to run very fast, consistency seems more
> important than actual parallelism.
>
> > First off: do we need to keep an LWP version of the bosserver around
> > as well as a pthreaded one? I don't think so, and I believe Simon
> > agrees, but it would be good to get consensus.
>
> It does not need to stay very long; lwp fileserver has already been
> removed. If you're asking if you can get rid of the LWP bosserver at the
> same time as introducing a pthreaded bosserver, I think that depends on
> how sure you are that it functions correctly. I would vote for a tbozo
> directory, but if the changes are not complex and you're very
> confident, it may not be necessary. But I think it's easier to implement
> a tbozo, and then remove bozo (and move tbozo into it) when it's just as
> good.
>
> > Second, how strong of an integrity guarantee do we need for the bos
> > config? My understanding is that configuration changes (adding or
> > removing or en/disabling bnodes) are rare events, and it is highly
> > unlikely that multiple administrator connections changing things will
> > be made concurrently.
>
> We can assume they are infrequent, but we must assume that they will
> happen. That is, there needs to be locking, but it doesn't need to be
> very granular. That is, it can be slow, but it cannot cause something to
> break or behave weirdly.
>
> > If this is true, then we can rely on time-domain "locking" for
> > synchronization and eliminate some aspects of code-level locking. For
> > example, a per-bnode lock acquired before writing any bnode state
> > would not be needed, and a single global lock would be sufficient.
>
> I don't really see how one of these is offering integrity but the other
> isn't, but... A single lock is fine, if I understand this correctly.
> You've never been able to do certain bozo things in parallel, but I
> haven't heard complaining about it. In any case, rxgk is more important
> than improving that.
>
> > Relatedly, is it okay to assume that shutdown/restart/etc. will not be
> > issued concurrently with config changes? A "fully correct"
> > implementation would seem to need to only shutdown/restart the bnodes
> > which were configured when the command was issued, and ignore any new
> > nodes created since then. Because the implementation of
> > shutdown/restart must drop locks, making this guarantee seems to
> > require additional synchronization effort, whether via a temporary
> > queue to store the bnodes being acted upon, or a higher-level lock.
>
> Are you talking about a 'bos create' racing with a 'bos restart -all'? I
> would think you'd block out all modifications during a restart. While
> the ordering may not matter for 'bos restart -all', it may matter for
> 'bos restart -bosserver', just so it doesn't leave behind a running
> process and then re-exec itself or something.
>
> > I haven't been able to convince myself that the additional complexity
> > of the extra watcher threads is necessary, but if someone else could
> > convince me, that would be good.
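
For concreteness, the "single global lock, and block out modifications
during a restart" idea discussed above could look roughly like the
sketch below. This is only an illustration under stated assumptions, not
bosserver code: every name in it (bnode_sketch, sketch_create,
sketch_restart_all, cfg_lock, MAX_BNODES) is hypothetical, and the
"restart" is a counter bump standing in for stopping and starting real
processes. It combines both options from the thread: a snapshot of the
bnodes configured when the command was issued (the temporary queue) and
a flag that makes config changes wait until the restart finishes (the
higher-level lock).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BNODES 16   /* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in; not OpenAFS's real struct bnode. */
struct bnode_sketch {
    const char *name;
    int restarts;       /* counts simulated restarts */
};

/* One global lock protects the whole config, as proposed in the thread. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cfg_idle = PTHREAD_COND_INITIALIZER;
static bool restart_in_progress = false;
static struct bnode_sketch *bnodes[MAX_BNODES];
static size_t nbnodes = 0;

/* Config change: wait out any running restart, then modify under the lock. */
int sketch_create(struct bnode_sketch *b)
{
    int rc = -1;

    pthread_mutex_lock(&cfg_lock);
    while (restart_in_progress)
        pthread_cond_wait(&cfg_idle, &cfg_lock);
    if (nbnodes < MAX_BNODES) {
        bnodes[nbnodes++] = b;
        rc = 0;
    }
    pthread_mutex_unlock(&cfg_lock);
    return rc;
}

/* Restart only what was configured when the call was made; returns count. */
size_t sketch_restart_all(void)
{
    struct bnode_sketch *snap[MAX_BNODES];
    size_t n, i;

    pthread_mutex_lock(&cfg_lock);
    while (restart_in_progress)
        pthread_cond_wait(&cfg_idle, &cfg_lock);
    restart_in_progress = true;         /* blocks sketch_create() callers */
    n = nbnodes;
    assert(n <= MAX_BNODES);
    for (i = 0; i < n; i++)
        snap[i] = bnodes[i];            /* snapshot taken under the lock */
    pthread_mutex_unlock(&cfg_lock);

    /* Lock dropped here: the slow stop/start work would happen now. */
    for (i = 0; i < n; i++)
        snap[i]->restarts++;

    pthread_mutex_lock(&cfg_lock);
    restart_in_progress = false;
    pthread_cond_broadcast(&cfg_idle);  /* wake any waiting config changes */
    pthread_mutex_unlock(&cfg_lock);
    return n;
}
```

With the flag in place the snapshot is strictly redundant, since nothing
can be added while the restart runs; it is kept to show what the
queue-only variant would need. A 'bos create' arriving mid-restart
simply waits, which is slow but should not behave weirdly.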
>
> My opinion is that we should explicitly drop LINUX24 support on servers
> (or at least tbozo, if we eventually provide both tbozo and bozo). I
> have never heard of demand for LINUX24 servers, and it's easy to migrate
> off of them. The thing I have heard demand for and is not easy to
> migrate off of is LINUX24 clients, which we could still keep.
>
> I mean, regardless of what solution we end up with, how much testing is
> anyone really going to do for bozo on LINUX24? We're just going to end
> up with something that theoretically works but we're not very confident
> has solved various possible race conditions or whatnot. If we want to
> keep LINUX24 for this, we should at least put a big warning on it that
> mentions something involving the relevant issues.
>

That would be my theory. At this point a lot of stuff is inadequately
tested on 2.4.

> That doesn't deal with any signalling specifics, but keep in mind our
> current bozo signal handling is not always great, and does not
> necessarily need to be fixed at the same time. I've always seen bozo
> misidentify core dumps, which I thought was due to this, but I've never
> really cared.
>
> --
> Andrew Deason
> adea...@sinenomine.net
>
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel
>

--
Derrick