On Tue, Apr 24, 2018 at 01:24:30PM -0400, Jonathan T. Looney wrote: > On Mon, Apr 23, 2018 at 6:04 PM, John Baldwin <j...@freebsd.org> wrote: > > > > I think this is actually a key question. In my experience to date I have > not > > encountered a large number of post-panic assertion failures. Given that > > we already break all locks and disable assertions for locks I'd be curious > > which assertions are actually failing. My inclination given my > experiences > > to date would be to explicitly ignore those as we do for locking if it is > > constrained set rather than blacklisting all of them. However, I would be > > most interested in seeing some examples of assertions that are failing. > > The latest example (the one that prompted me to finally commit this) is in > lockmgr_sunlock_try(): 'panic: Assertion (*xp & ~LK_EXCLUSIVE_SPINNERS) == > LK_SHARERS_LOCK(1) failed at /usr/src/sys/kern/kern_lock.c:541' > > I don't see any obvious recent changes that would have caused this, so this > is probably a case where a change to another file suddenly made us trip > over this assert. > > And, that really illustrates my overall point:
Mine too. :) Why is anything trying to acquire a lockmgr lock after a panic? What is the stack? I suspect that CAM is completing non-dump CCBs after a panic, which can cause deadlocks if the completion handler needs to perform a TLB shootdown after destroying a mapping, for example. In fact, I had forgotten that Isilon has some CAM patches which attempt to address this because of the problems that such deadlocks had caused. I will work on getting these reviewed and upstreamed. > most assertions in > general-use code have limited value after a panic. > > We expect developers to write high-quality assertions so we can catch bugs. > This requires that they understand how their code will be used. However, > once we've panic'd, many assumptions behind code change and the assertions > are no longer valid. (And, sometimes, it is difficult for a developer to > predict how these things will change in a panic situation.) We can either > play whack-a-mole to modify assertions as we trip over them in our > post-panic work, or we can switch to an opt-in model where we only check > assertions which the developer actually intends to run post-panic. > > Playing whack-a-mole seems like a Sisyphean task which will burn out > developers and/or frustrate people who run INVARIANTS kernels. Switching to > an opt-in model seems like the better long-term strategy. > > Having said all of that, I am cognizant of at least two things: > 1) Mark Johnston has done a lot of work in coredumps and thinks there are > post-panic assertions that have value. > 2) Until we have both agreement to switch our post-panic assertion paradigm > and also infrastructure to allow developers to opt in, it probably is not > wise to disable all assertions by default. > > So, I will follow Mark's suggestions: I will change the default. I will > also change the code so we print a limited number of failed assertions. Thanks. > However, I think that changing the post-panic assertion paradigm is an > important conversation to have. We want people to run our INVARIANTS > kernels. And, we want to get high-quality reports from those. I think we > could better serve those goals by changing the post-panic assertion > paradigm. > > Jonathan _______________________________________________ svn-src-all@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"