Re: thread suspension when dumping core
On Mon, Jun 13, 2016 at 12:24:52AM +0200, Jilles Tjoelker wrote: > On Thu, Jun 09, 2016 at 07:34:55AM +0300, Konstantin Belousov wrote: > > On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote: > > > On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote: > > > > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote: > > > > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote: > > > > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > > > > > > > I also wonder whether we may be overengineering things here. > > > > > > > Perhaps > > > > > > > the advlock sleep can simply turn off TDF_SBDRY. > > > > > > Well, this was the very first patch suggested. I would be > > > > > > fine with that, but again, out-of-tree code seems to be not > > > > > > quite fine with that local solution. > > > > > > In our particular case, we could possibly use a similar approach. In > > > > > general, it seems incorrect to clear TDF_SBDRY if the thread calling > > > > > sx_sleep() has any locks held. It is easy to verify that all callers > > > > > of > > > > > lf_advlock() are safe in this respect, but this kind of auditing is > > > > > generally hard. In fact, I believe the sx_sleep that led to the > > > > > problem > > > > > described in D2612 is the same as the one in my case. That is, the > > > > > sleeping thread may or may not hold a vnode lock depending on context. > > > > > I do not think that in-tree code sleeps with a vnode lock held in > > > > the lf_advlock(). Otherwise, system would hang in lock cascade by > > > > an attempt to obtain an advisory lock. I think we can even assert > > > > this with witness. > > > > > There is another sleep, which Jilles mentioned, in lf_purgelocks(), > > > > called from vgone(). This sleep indeed occurs under the vnode lock, and > > > > as such must be non-suspendable. The sleep waits until other threads > > > > leave the lf_advlock() for the reclaimed vnode, and they should leave in > > > > deterministic time due to issued wakeups. So this sleep is exempt from > > > > the considerations, and TDF_SBDRY there is correct. > > > > > I am fine with either the braces around sx_sleep() in lf_advlock() to > > > > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent, > > > > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My > > > > understanding is that you prefer the later. If I do not mis-represent > > > > your position, I understand why you do prefer that. > > > > The TDF_SRESTART change does fix some more problems such as umount -f > > > getting stuck in lf_purgelocks(). > > > > However, it introduces some subtle issues that may not necessarily be a > > > sufficient objection. > > > I did not see an objection in the notes below, but rather I read them > > as an argument to return EINTR from the stop attempts from lf_advlock(). > > With these changes, the question is which bugs you want to fix and which > bugs you want to leave unfixed or introduce. > > * The status quo sometimes deadlocks or almost deadlocks with thread > suspension. > > * Restarting the locking syscall after thread suspension using ERESTART > causes the suspended thread to be moved to the end of the queue and > the locked area to be re-evaluated which violates POSIX but probably > does not break applications. > > * Interrupting the locking syscall after thread suspension with EINTR > breaks applications with the reasonable assumption that [EINTR] can > only occur because of signals, whenever a locking attempt is > suspended. This requires a change to the man page and probably some > patches to ports. > > This could be avoided by allowing a thread to return to the user > boundary (but not beyond) while staying in the queue but that is a > fairly heavy API change. The thread would not be allowed to execute user > code (a signal handler) since it may take indefinitely long before the > thread would continue waiting for the lock, which matches exactly > POSIX's specification of F_SETLKW not restarting after signal handlers. What do you mean by this ? Are you proposing to leave our lockf_entry in the tree of the lockf graph, while allowing thread to proceed to user boundary, and making ast handler to re-enter the lf_advlock() ? At least, I cannot make any other sense of the text. But IMO that would not solve the problem. We could declare the sleep in lf_setlock() as stoppable (by clearing TDF_SBDRY around) and that gives the same effect without complications. But I do not think that this is acceptable, since our lockf_entry is in graph, and could block arbitrary other processes participating in the adv locking while holding kernel resources. Of course, user thread can be suspended on boundary while already fully owning a lock, and then we get a similar situation, at the outer look. But it is quite different from the view on internal kernel structures, which are in consistent state (as opposed
Re: thread suspension when dumping core
On Thu, Jun 09, 2016 at 07:34:55AM +0300, Konstantin Belousov wrote: > On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote: > > On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote: > > > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote: > > > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote: > > > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > > > > > > I also wonder whether we may be overengineering things here. Perhaps > > > > > > the advlock sleep can simply turn off TDF_SBDRY. > > > > > Well, this was the very first patch suggested. I would be > > > > > fine with that, but again, out-of-tree code seems to be not > > > > > quite fine with that local solution. > > > > In our particular case, we could possibly use a similar approach. In > > > > general, it seems incorrect to clear TDF_SBDRY if the thread calling > > > > sx_sleep() has any locks held. It is easy to verify that all callers of > > > > lf_advlock() are safe in this respect, but this kind of auditing is > > > > generally hard. In fact, I believe the sx_sleep that led to the problem > > > > described in D2612 is the same as the one in my case. That is, the > > > > sleeping thread may or may not hold a vnode lock depending on context. > > > I do not think that in-tree code sleeps with a vnode lock held in > > > the lf_advlock(). Otherwise, system would hang in lock cascade by > > > an attempt to obtain an advisory lock. I think we can even assert > > > this with witness. > > > There is another sleep, which Jilles mentioned, in lf_purgelocks(), > > > called from vgone(). This sleep indeed occurs under the vnode lock, and > > > as such must be non-suspendable. The sleep waits until other threads > > > leave the lf_advlock() for the reclaimed vnode, and they should leave in > > > deterministic time due to issued wakeups. So this sleep is exempt from > > > the considerations, and TDF_SBDRY there is correct. > > > I am fine with either the braces around sx_sleep() in lf_advlock() to > > > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent, > > > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My > > > understanding is that you prefer the later. If I do not mis-represent > > > your position, I understand why you do prefer that. > > The TDF_SRESTART change does fix some more problems such as umount -f > > getting stuck in lf_purgelocks(). > > However, it introduces some subtle issues that may not necessarily be a > > sufficient objection. > I did not see an objection in the notes below, but rather I read them > as an argument to return EINTR from the stop attempts from lf_advlock(). With these changes, the question is which bugs you want to fix and which bugs you want to leave unfixed or introduce. * The status quo sometimes deadlocks or almost deadlocks with thread suspension. * Restarting the locking syscall after thread suspension using ERESTART causes the suspended thread to be moved to the end of the queue and the locked area to be re-evaluated which violates POSIX but probably does not break applications. * Interrupting the locking syscall after thread suspension with EINTR breaks applications with the reasonable assumption that [EINTR] can only occur because of signals, whenever a locking attempt is suspended. This requires a change to the man page and probably some patches to ports. This could be avoided by allowing a thread to return to the user boundary (but not beyond) while staying in the queue but that is a fairly heavy API change. The thread would not be allowed to execute user code (a signal handler) since it may take indefinitely long before the thread would continue waiting for the lock, which matches exactly POSIX's specification of F_SETLKW not restarting after signal handlers. > > Firstly, adding this closes the door on fixing signal handling for > > fcntl(F_SETLKW). Per POSIX, any caught signal interrupts > > fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux > > man page documents the same. Our man page has documented that SA_RESTART > > behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This > > could normally be fixed via if (error == ERESTART) error = EINTR; but > > that is no longer possible if there are [ERESTART] errors that should > > still restart. > I added the translation to fcntl handler. > > Secondly, fcntl(F_SETLKW) restarting after a stop may actually be > > observable, contrary to what I wrote before. This is due to the fair > > queuing. Suppose thread A has locked byte 1 a while ago and thread B is > > trying to lock byte 1 and 2 right now. Then thread C will be able to > > lock byte 2 iff thread B has not blocked yet. If thread C will not be > > allowed to lock byte 2 and will block on it, the TDF_SRESTART change > > will cause it to be awakened if thread B is stopped. When thread B > > resumes, the region to be locked will be
Re: thread suspension when dumping core
On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote: > On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote: > > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote: > > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote: > > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > > > > > I also wonder whether we may be overengineering things here. Perhaps > > > > > the advlock sleep can simply turn off TDF_SBDRY. > > > > Well, this was the very first patch suggested. I would be fine with > > > > that, > > > > but again, out-of-tree code seems to be not quite fine with that local > > > > solution. > > > > In our particular case, we could possibly use a similar approach. In > > > general, it seems incorrect to clear TDF_SBDRY if the thread calling > > > sx_sleep() has any locks held. It is easy to verify that all callers of > > > lf_advlock() are safe in this respect, but this kind of auditing is > > > generally hard. In fact, I believe the sx_sleep that led to the problem > > > described in D2612 is the same as the one in my case. That is, the > > > sleeping thread may or may not hold a vnode lock depending on context. > > > I do not think that in-tree code sleeps with a vnode lock held in > > the lf_advlock(). Otherwise, system would hang in lock cascade by > > an attempt to obtain an advisory lock. I think we can even assert > > this with witness. > > > There is another sleep, which Jilles mentioned, in lf_purgelocks(), > > called from vgone(). This sleep indeed occurs under the vnode lock, and > > as such must be non-suspendable. The sleep waits until other threads > > leave the lf_advlock() for the reclaimed vnode, and they should leave in > > deterministic time due to issued wakeups. So this sleep is exempt from > > the considerations, and TDF_SBDRY there is correct. > > > I am fine with either the braces around sx_sleep() in lf_advlock() to > > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent, > > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My > > understanding is that you prefer the later. If I do not mis-represent > > your position, I understand why you do prefer that. > > The TDF_SRESTART change does fix some more problems such as umount -f > getting stuck in lf_purgelocks(). > > However, it introduces some subtle issues that may not necessarily be a > sufficient objection. I did not see an objection in the notes below, but rather I read them as an argument to return EINTR from the stop attempts from lf_advlock(). > > Firstly, adding this closes the door on fixing signal handling for > fcntl(F_SETLKW). Per POSIX, any caught signal interrupts > fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux > man page documents the same. Our man page has documented that SA_RESTART > behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This > could normally be fixed via if (error == ERESTART) error = EINTR; but > that is no longer possible if there are [ERESTART] errors that should > still restart. I added the translation to fcntl handler. > > Secondly, fcntl(F_SETLKW) restarting after a stop may actually be > observable, contrary to what I wrote before. This is due to the fair > queuing. Suppose thread A has locked byte 1 a while ago and thread B is > trying to lock byte 1 and 2 right now. Then thread C will be able to > lock byte 2 iff thread B has not blocked yet. If thread C will not be > allowed to lock byte 2 and will block on it, the TDF_SRESTART change > will cause it to be awakened if thread B is stopped. When thread B > resumes, the region to be locked will be recomputed. This scenario > unambiguously violates the POSIX requirement but I don't know how bad it > is. Indeed. > > Note that all these threads must be in separate processes because of > fcntl locks' strange semantics. So below is the next level of over-engineering the stuff. I made it unified on sigdeferstop() to avoid blowing up the KPI. I am not sure what modes are needed by onefs, so I added SIGDEFERSTOP_ERESTART despite it is not used by the kernel. lf_advlock() is not stoppable at all, with EINTR return. In the previous patches, only if the caller of lf_advlock() already set TDF_SBDRY, the function modified the behaviour. I considered adding td_sbdry member to struct thread for managing sbdry stops, but did not for now. If one more flag appear to be used, I will change that. diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c index 716faa3..fb3eb90 100644 --- a/sys/fs/fifofs/fifo_vnops.c +++ b/sys/fs/fifofs/fifo_vnops.c @@ -194,11 +194,10 @@ fifo_open(ap) if ((ap->a_mode & FREAD) && fip->fi_writers == 0) { gen = fip->fi_wgen; VOP_UNLOCK(vp, 0); - stops_deferred = sigallowstop(); + stops_deferred = sigdeferstop(SIGDEFERSTOP_OFF);
Re: thread suspension when dumping core
On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote: > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote: > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote: > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > > > > I also wonder whether we may be overengineering things here. Perhaps > > > > the advlock sleep can simply turn off TDF_SBDRY. > > > Well, this was the very first patch suggested. I would be fine with that, > > > but again, out-of-tree code seems to be not quite fine with that local > > > solution. > > In our particular case, we could possibly use a similar approach. In > > general, it seems incorrect to clear TDF_SBDRY if the thread calling > > sx_sleep() has any locks held. It is easy to verify that all callers of > > lf_advlock() are safe in this respect, but this kind of auditing is > > generally hard. In fact, I believe the sx_sleep that led to the problem > > described in D2612 is the same as the one in my case. That is, the > > sleeping thread may or may not hold a vnode lock depending on context. > I do not think that in-tree code sleeps with a vnode lock held in > the lf_advlock(). Otherwise, system would hang in lock cascade by > an attempt to obtain an advisory lock. I think we can even assert > this with witness. > There is another sleep, which Jilles mentioned, in lf_purgelocks(), > called from vgone(). This sleep indeed occurs under the vnode lock, and > as such must be non-suspendable. The sleep waits until other threads > leave the lf_advlock() for the reclaimed vnode, and they should leave in > deterministic time due to issued wakeups. So this sleep is exempt from > the considerations, and TDF_SBDRY there is correct. > I am fine with either the braces around sx_sleep() in lf_advlock() to > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent, > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My > understanding is that you prefer the later. If I do not mis-represent > your position, I understand why you do prefer that. The TDF_SRESTART change does fix some more problems such as umount -f getting stuck in lf_purgelocks(). However, it introduces some subtle issues that may not necessarily be a sufficient objection. Firstly, adding this closes the door on fixing signal handling for fcntl(F_SETLKW). Per POSIX, any caught signal interrupts fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux man page documents the same. Our man page has documented that SA_RESTART behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This could normally be fixed via if (error == ERESTART) error = EINTR; but that is no longer possible if there are [ERESTART] errors that should still restart. Secondly, fcntl(F_SETLKW) restarting after a stop may actually be observable, contrary to what I wrote before. This is due to the fair queuing. Suppose thread A has locked byte 1 a while ago and thread B is trying to lock byte 1 and 2 right now. Then thread C will be able to lock byte 2 iff thread B has not blocked yet. If thread C will not be allowed to lock byte 2 and will block on it, the TDF_SRESTART change will cause it to be awakened if thread B is stopped. When thread B resumes, the region to be locked will be recomputed. This scenario unambiguously violates the POSIX requirement but I don't know how bad it is. Note that all these threads must be in separate processes because of fcntl locks' strange semantics. -- Jilles Tjoelker ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote: > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote: > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote: > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > > > > I also wonder whether we may be overengineering things here. Perhaps > > > > the advlock sleep can simply turn off TDF_SBDRY. > > > Well, this was the very first patch suggested. I would be fine with that, > > > but again, out-of-tree code seems to be not quite fine with that local > > > solution. > > > > In our particular case, we could possibly use a similar approach. In > > general, it seems incorrect to clear TDF_SBDRY if the thread calling > > sx_sleep() has any locks held. It is easy to verify that all callers of > > lf_advlock() are safe in this respect, but this kind of auditing is > > generally hard. In fact, I believe the sx_sleep that led to the problem > > described in D2612 is the same as the one in my case. That is, the > > sleeping thread may or may not hold a vnode lock depending on context. > > I do not think that in-tree code sleeps with a vnode lock held in > the lf_advlock(). Otherwise, system would hang in lock cascade by > an attempt to obtain an advisory lock. I think we can even assert > this with witness. Indeed. I just meant that this appears to not be true of Isilon's FS locking code, which is parameterized heavily by the lock type. > > There is another sleep, which Jilles mentioned, in lf_purgelocks(), > called from vgone(). This sleep indeed occurs under the vnode lock, and > as such must be non-suspendable. The sleep waits until other threads > leave the lf_advlock() for the reclaimed vnode, and they should leave in > deterministic time due to issued wakeups. So this sleep is exempt from > the considerations, and TDF_SBDRY there is correct. > > I am fine with either the braces around sx_sleep() in lf_advlock() to > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent, > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My > understanding is that you prefer the later. If I do not mis-represent > your position, I understand why you do prefer that. Right, I think it would be hard for me to adopt a solution based on the proposed lf_advlock() change. So I prefer the latter, more general approach, additional complexity notwithstanding. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote: > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote: > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > > > I also wonder whether we may be overengineering things here. Perhaps > > > the advlock sleep can simply turn off TDF_SBDRY. > > Well, this was the very first patch suggested. I would be fine with that, > > but again, out-of-tree code seems to be not quite fine with that local > > solution. > > In our particular case, we could possibly use a similar approach. In > general, it seems incorrect to clear TDF_SBDRY if the thread calling > sx_sleep() has any locks held. It is easy to verify that all callers of > lf_advlock() are safe in this respect, but this kind of auditing is > generally hard. In fact, I believe the sx_sleep that led to the problem > described in D2612 is the same as the one in my case. That is, the > sleeping thread may or may not hold a vnode lock depending on context. I do not think that in-tree code sleeps with a vnode lock held in the lf_advlock(). Otherwise, system would hang in lock cascade by an attempt to obtain an advisory lock. I think we can even assert this with witness. There is another sleep, which Jilles mentioned, in lf_purgelocks(), called from vgone(). This sleep indeed occurs under the vnode lock, and as such must be non-suspendable. The sleep waits until other threads leave the lf_advlock() for the reclaimed vnode, and they should leave in deterministic time due to issued wakeups. So this sleep is exempt from the considerations, and TDF_SBDRY there is correct. I am fine with either the braces around sx_sleep() in lf_advlock() to clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent, which adds temporal override for TDF_SBDRY with TDF_SRESTART. My understanding is that you prefer the later. If I do not mis-represent your position, I understand why you do prefer that. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote: > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > > In this case it is clear which sleep(9) calls should be affected so it > > may be better to avoid more hidden state. > In this case yes, but apparently some out-of-tree users exist. And, > the marking of the single sx_sleep() call depends on knowing our > implementation. I remember that as the arguments to change from PBDRY to > the current state setter, in NFS it is not too hard to try to enumerate > interruptible sleeps. > > > > > I also wonder whether we may be overengineering things here. Perhaps > > the advlock sleep can simply turn off TDF_SBDRY. > Well, this was the very first patch suggested. I would be fine with that, > but again, out-of-tree code seems to be not quite fine with that local > solution. In our particular case, we could possibly use a similar approach. In general, it seems incorrect to clear TDF_SBDRY if the thread calling sx_sleep() has any locks held. It is easy to verify that all callers of lf_advlock() are safe in this respect, but this kind of auditing is generally hard. In fact, I believe the sx_sleep that led to the problem described in D2612 is the same as the one in my case. That is, the sleeping thread may or may not hold a vnode lock depending on context. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote: > On Tue, Jun 07, 2016 at 07:01:55PM +0300, Konstantin Belousov wrote: > > On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote: > > > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote: > > > > This looks as if we should not ignore suspension requests in > > > > thread_suspend_check() completely in TDF_SBDRY case, but return either > > > > EINTR or ERESTART (most likely ERESTART). Note that the goal of > > > > TDF_SBDRY is to avoid suspending in the protected region, not to make an > > > > impression that the suspension does not occur at all. > > > > This looks like it would revert r246417 and re-introduce the bug fixed > > > by it (unexpected [EINTR] and short reads/writes after stop signals). > > > Well, the patch returns ERESTART and not EINTR, so the syscall should > > be retried after all the unwinding. > > That fixes the [EINTR] part of the problem but not short reads and > writes. Which are IMO fine, but I understand that there is too many programs which are not aware. > > > > After r246417, TDF_SBDRY is intended for sleeps that occur while holding > > > resources such as vnode locks and are normally short but should be > > > interruptible by fatal signals because they may occasionally be > > > indefinitely long (such as a non-responsive NFS server). > > > > It looks like yet another kind of sleep may be required, since advisory > > > locks still hold some filesystem resources across the sleep (though not > > > vnode locks). > > > I do not think that adv locks enter sleep with any resource held which > > would block other threads. But I agree with the statement because the > > lock might be granted and then the stopped thread would appear to own > > the blocking resource. > > It does not hold any resource used by normal operations, but it blocks a > forced unmount (umount -f hangs in [purgelocks] with tmpfs in a recent > stable/10). What I mean, is that when the lock is granted during sleep, other threads might (or might not) block on the lock edge coming from the suspended thread. Even if they do not block, it is too fine implementation detail. > > If queuing is supposed to be fair, then granting the lock to the stopped > thread is correct and aborting the sleep with [ERESTART] would break it. > The kern_lockf.c code seems to go to some lengths to make queuing fair. > This does not seem very important, though. > > Also, restarting a locking call violates some text in POSIX XSH's fcntl > page that the range of bytes to be locked shall be determined before the > thread blocks (this may be affected by the current seek offset and the > file size). I don't know whether violating this will break any > applications. The text has the problem that there is no way to > distinguish between a thread that is in fcntl() and has not yet blocked > and a thread that has blocked, even though it seems intuitively clear. Such race is self-inflicted IMO. The app assumes that it can call blockable fcntl() and then change file offset, in other thread. How would other thread determine that the blockable thread is blocked ? > > > > We then have four kinds: > > > > * uninterruptible by signals, ignores stops (default) > > > * interruptible by signals, ignores stops (current TDF_SBDRY with > > > PCATCH) > > > * interruptible by signals, freezes in place on stops (avoids > > > unexpected short I/O) (current PCATCH, otherwise) > > > * interruptible by signals, fails with [ERESTART] on stops (avoids > > > holding resources across a stop) (new) > > > > The new kind of sleep would fail with [ERESTART] only for stops, since > > > [EINTR] should only be returned if a signal handler was called. There > > > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a > > > handler does not stop the process. > > > And where would this new kind of sleep used ? The advlock sleep is the one > > place. Does fifo sleep for reader or writer on open require this kind > > of handling (IMO no) ? > > > I think this can be relatively easily implemented with either a flag > > for XXXsleep(9) (my older style of PBDRY) or using only the thread flag > > (jhb' newer TDF_SBDRY approach). Probably the later should be used, for > > consistency and easier marking of larger blocks of code. > > In this case it is clear which sleep(9) calls should be affected so it > may be better to avoid more hidden state. In this case yes, but apparently some out-of-tree users exist. And, the marking of the single sx_sleep() call depends on knowing our implementation. I remember that as the arguments to change from PBDRY to the current state setter, in NFS it is not too hard to try to enumerate interruptible sleeps. > > I also wonder whether we may be overengineering things here. Perhaps > the advlock sleep can simply turn off TDF_SBDRY. Well, this was the very first patch suggested. I would be fine with that, but again, out-of-tree co
Re: thread suspension when dumping core
On Tue, Jun 07, 2016 at 07:01:55PM +0300, Konstantin Belousov wrote: > On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote: > > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote: > > > This looks as if we should not ignore suspension requests in > > > thread_suspend_check() completely in TDF_SBDRY case, but return either > > > EINTR or ERESTART (most likely ERESTART). Note that the goal of > > > TDF_SBDRY is to avoid suspending in the protected region, not to make an > > > impression that the suspension does not occur at all. > > This looks like it would revert r246417 and re-introduce the bug fixed > > by it (unexpected [EINTR] and short reads/writes after stop signals). > Well, the patch returns ERESTART and not EINTR, so the syscall should > be retried after all the unwinding. That fixes the [EINTR] part of the problem but not short reads and writes. > > After r246417, TDF_SBDRY is intended for sleeps that occur while holding > > resources such as vnode locks and are normally short but should be > > interruptible by fatal signals because they may occasionally be > > indefinitely long (such as a non-responsive NFS server). > > It looks like yet another kind of sleep may be required, since advisory > > locks still hold some filesystem resources across the sleep (though not > > vnode locks). > I do not think that adv locks enter sleep with any resource held which > would block other threads. But I agree with the statement because the > lock might be granted and then the stopped thread would appear to own > the blocking resource. It does not hold any resource used by normal operations, but it blocks a forced unmount (umount -f hangs in [purgelocks] with tmpfs in a recent stable/10). If queuing is supposed to be fair, then granting the lock to the stopped thread is correct and aborting the sleep with [ERESTART] would break it. The kern_lockf.c code seems to go to some lengths to make queuing fair. This does not seem very important, though. Also, restarting a locking call violates some text in POSIX XSH's fcntl page that the range of bytes to be locked shall be determined before the thread blocks (this may be affected by the current seek offset and the file size). I don't know whether violating this will break any applications. The text has the problem that there is no way to distinguish between a thread that is in fcntl() and has not yet blocked and a thread that has blocked, even though it seems intuitively clear. > > We then have four kinds: > > * uninterruptible by signals, ignores stops (default) > > * interruptible by signals, ignores stops (current TDF_SBDRY with > > PCATCH) > > * interruptible by signals, freezes in place on stops (avoids > > unexpected short I/O) (current PCATCH, otherwise) > > * interruptible by signals, fails with [ERESTART] on stops (avoids > > holding resources across a stop) (new) > > The new kind of sleep would fail with [ERESTART] only for stops, since > > [EINTR] should only be returned if a signal handler was called. There > > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a > > handler does not stop the process. > And where would this new kind of sleep used ? The advlock sleep is the one > place. Does fifo sleep for reader or writer on open require this kind > of handling (IMO no) ? > I think this can be relatively easily implemented with either a flag > for XXXsleep(9) (my older style of PBDRY) or using only the thread flag > (jhb' newer TDF_SBDRY approach). Probably the later should be used, for > consistency and easier marking of larger blocks of code. In this case it is clear which sleep(9) calls should be affected so it may be better to avoid more hidden state. I also wonder whether we may be overengineering things here. Perhaps the advlock sleep can simply turn off TDF_SBDRY. -- Jilles Tjoelker ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Tue, Jun 07, 2016 at 07:01:55PM +0300, Konstantin Belousov wrote: > On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote: > > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote: > > > This looks as if we should not ignore suspension requests in > > > thread_suspend_check() completely in TDF_SBDRY case, but return either > > > EINTR or ERESTART (most likely ERESTART). Note that the goal of > > > TDF_SBDRY is to avoid suspending in the protected region, not to make an > > > impression that the suspension does not occur at all. > > > > This looks like it would revert r246417 and re-introduce the bug fixed > > by it (unexpected [EINTR] and short reads/writes after stop signals). > Well, the patch returns ERESTART and not EINTR, so the syscall should > be retried after all the unwinding. > > > > > After r246417, TDF_SBDRY is intended for sleeps that occur while holding > > resources such as vnode locks and are normally short but should be > > interruptible by fatal signals because they may occasionally be > > indefinitely long (such as a non-responsive NFS server). > > > > It looks like yet another kind of sleep may be required, since advisory > > locks still hold some filesystem resources across the sleep (though not > > vnode locks). > I do not think that adv locks enter sleep with any resource held which > would block other threads. But I agree with the statement because the > lock might be granted and then the stopped thread would appear to own > the blocking resource. > > > > > We then have four kinds: > > > > * uninterruptible by signals, ignores stops (default) > > * interruptible by signals, ignores stops (current TDF_SBDRY with > > PCATCH) > > * interruptible by signals, freezes in place on stops (avoids > > unexpected short I/O) (current PCATCH, otherwise) > > * interruptible by signals, fails with [ERESTART] on stops (avoids > > holding resources across a stop) (new) > > > > The new kind of sleep would fail with [ERESTART] only for stops, since > > [EINTR] should only be returned if a signal handler was called. There > > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a > > handler does not stop the process. > > > And where would this new kind of sleep used ? The advlock sleep is the one > place. Does fifo sleep for reader or writer on open require this kind > of handling (IMO no) ? > > I think this can be relatively easily implemented with either a flag > for XXXsleep(9) (my older style of PBDRY) or using only the thread flag > (jhb' newer TDF_SBDRY approach). Probably the later should be used, for > consistency and easier marking of larger blocks of code. Like this. diff --git a/sys/kern/kern_lockf.c b/sys/kern/kern_lockf.c index a0a3789..ee26596 100644 --- a/sys/kern/kern_lockf.c +++ b/sys/kern/kern_lockf.c @@ -1378,7 +1378,7 @@ lf_setlock(struct lockf *state, struct lockf_entry *lock, struct vnode *vp, void **cookiep) { static char lockstr[] = "lockf"; - int priority, error; + int error, priority, stoprestart; #ifdef LOCKF_DEBUG if (lockf_debug & 1) @@ -1466,7 +1466,10 @@ lf_setlock(struct lockf *state, struct lockf_entry *lock, struct vnode *vp, } lock->lf_refs++; + stoprestart = sigstoprestart(); error = sx_sleep(lock, &state->ls_lock, priority, lockstr, 0); + if (stoprestart) + sigstopnormal(); if (lf_free_lock(lock)) { error = EDOOFUS; goto out; diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c index 75a1259..1d7036d 100644 --- a/sys/kern/kern_sig.c +++ b/sys/kern/kern_sig.c @@ -2633,6 +2633,35 @@ sigallowstop(void) return (prev); } +int +sigstoprestart(void) +{ + struct thread *td; + + td = curthread; + if ((td->td_flags & TDF_SBDRY) == 0 || + (td->td_flags & TDF_SRESTART) != 0) + return (0); + thread_lock(td); + td->td_flags |= TDF_SRESTART; + thread_unlock(td); + return (1); +} + +int +sigstopnormal(void) +{ + struct thread *td; + int prev; + + td = curthread; + thread_lock(td); + prev = (td->td_flags & TDF_SRESTART) != 0; + td->td_flags &= ~TDF_SRESTART; + thread_unlock(td); + return (prev); +} + /* * If the current process has received a signal (should be caught or cause * termination, should interrupt current syscall), return the signal number. diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c index 9af377e..6460ae9 100644 --- a/sys/kern/kern_thread.c +++ b/sys/kern/kern_thread.c @@ -932,7 +932,8 @@ thread_suspend_check(int return_instead) if ((td->td_flags & TDF_SBDRY) != 0) { KASSERT(return_instead, ("TDF_SBDRY set for unsafe thread_suspend_check")); - return (0); +
Re: thread suspension when dumping core
On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote: > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote: > > This looks as if we should not ignore suspension requests in > > thread_suspend_check() completely in TDF_SBDRY case, but return either > > EINTR or ERESTART (most likely ERESTART). Note that the goal of > > TDF_SBDRY is to avoid suspending in the protected region, not to make an > > impression that the suspension does not occur at all. > > This looks like it would revert r246417 and re-introduce the bug fixed > by it (unexpected [EINTR] and short reads/writes after stop signals). Well, the patch returns ERESTART and not EINTR, so the syscall should be retried after all the unwinding. > > After r246417, TDF_SBDRY is intended for sleeps that occur while holding > resources such as vnode locks and are normally short but should be > interruptible by fatal signals because they may occasionally be > indefinitely long (such as a non-responsive NFS server). > > It looks like yet another kind of sleep may be required, since advisory > locks still hold some filesystem resources across the sleep (though not > vnode locks). I do not think that adv locks enter sleep with any resource held which would block other threads. But I agree with the statement because the lock might be granted and then the stopped thread would appear to own the blocking resource. > > We then have four kinds: > > * uninterruptible by signals, ignores stops (default) > * interruptible by signals, ignores stops (current TDF_SBDRY with > PCATCH) > * interruptible by signals, freezes in place on stops (avoids > unexpected short I/O) (current PCATCH, otherwise) > * interruptible by signals, fails with [ERESTART] on stops (avoids > holding resources across a stop) (new) > > The new kind of sleep would fail with [ERESTART] only for stops, since > [EINTR] should only be returned if a signal handler was called. There > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a > handler does not stop the process. > And where would this new kind of sleep used ? The advlock sleep is the one place. Does fifo sleep for reader or writer on open require this kind of handling (IMO no) ? I think this can be relatively easily implemented with either a flag for XXXsleep(9) (my older style of PBDRY) or using only the thread flag (jhb' newer TDF_SBDRY approach). Probably the later should be used, for consistency and easier marking of larger blocks of code. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote: > On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote: > > Sure, see below. For reference: > > td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | > > NEEDSUSPCHK > > td_pflags = 0 > > td_inhibitors = 0x2 = SLEEPING > > td_locks = 0 > > stack: > > mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d > > ... > > p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS > > p_flag2 = 0 > > The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the > > SLEEPABORT flag is not, so the thread can not have been sleeping when > > thread_single() was called - else sleepq_abort() would have been > > invoked and set SLEEPABORT. We are at the second sleepq_switch() call in > > sleepq_catch_signals(), and no signal was pending, so we called > > thread_suspend_check(), which returned 0 because of SBDRY. So we went to > > sleep. > > I note that this couldn't have happened prior to r283320. That change > > was apparently motivated by a similar hang, but in that case the thread > > was suspended (with a vnode lock held) rather than asleep. It looks like > > our internal fix also added a change to set TDF_SBDRY around > > filesystem-specific syscalls, which often sleep interruptibly while > > holding vnode locks. But I don't think that's the problem here, as you > > noted with lf_advlock(). > > With r283320 reverted, P_STOPPED_SIG would not have been set, so > > thread_suspend_check() would have suspended us, allowing the core dump > > to proceed. I had thought that using SINGLE_BOUNDRY beforing coredumping > > would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so > > this is probably the wrong place to be solving the problem. > This looks as if we should not ignore suspension requests in > thread_suspend_check() completely in TDF_SBDRY case, but return either > EINTR or ERESTART (most likely ERESTART). Note that the goal of > TDF_SBDRY is to avoid suspending in the protected region, not to make an > impression that the suspension does not occur at all. This looks like it would revert r246417 and re-introduce the bug fixed by it (unexpected [EINTR] and short reads/writes after stop signals). After r246417, TDF_SBDRY is intended for sleeps that occur while holding resources such as vnode locks and are normally short but should be interruptible by fatal signals because they may occasionally be indefinitely long (such as a non-responsive NFS server). It looks like yet another kind of sleep may be required, since advisory locks still hold some filesystem resources across the sleep (though not vnode locks). We then have four kinds: * uninterruptible by signals, ignores stops (default) * interruptible by signals, ignores stops (current TDF_SBDRY with PCATCH) * interruptible by signals, freezes in place on stops (avoids unexpected short I/O) (current PCATCH, otherwise) * interruptible by signals, fails with [ERESTART] on stops (avoids holding resources across a stop) (new) The new kind of sleep would fail with [ERESTART] only for stops, since [EINTR] should only be returned if a signal handler was called. There cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a handler does not stop the process. -- Jilles Tjoelker ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote: > On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote: > > Sure, see below. For reference: > > > > td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | > > NEEDSUSPCHK > > td_pflags = 0 > > td_inhibitors = 0x2 = SLEEPING > > td_locks = 0 > > > > stack: > > mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d > > ... > > > > p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS > > p_flag2 = 0 > > > > The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the > > SLEEPABORT flag is not, so the thread can not have been sleeping when > > thread_single() was called - else sleepq_abort() would have been > > invoked and set SLEEPABORT. We are at the second sleepq_switch() call in > > sleepq_catch_signals(), and no signal was pending, so we called > > thread_suspend_check(), which returned 0 because of SBDRY. So we went to > > sleep. > > This looks as if we should not ignore suspension requests in > thread_suspend_check() completely in TDF_SBDRY case, but return either > EINTR or ERESTART (most likely ERESTART). Note that the goal of > TDF_SBDRY is to avoid suspending in the protected region, not to make an > impression that the suspension does not occur at all. Thanks, that sounds right to me. It results in unified behaviour for TDF_SBDRY regardless of whether the suspension attempt took place before or after the thread went to sleep, and seems like it does the right thing in the single-threaded case as well. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote: > Sure, see below. For reference: > > td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK > td_pflags = 0 > td_inhibitors = 0x2 = SLEEPING > td_locks = 0 > > stack: > mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d > ... > > p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS > p_flag2 = 0 > > The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the > SLEEPABORT flag is not, so the thread can not have been sleeping when > thread_single() was called - else sleepq_abort() would have been > invoked and set SLEEPABORT. We are at the second sleepq_switch() call in > sleepq_catch_signals(), and no signal was pending, so we called > thread_suspend_check(), which returned 0 because of SBDRY. So we went to > sleep. > > I note that this couldn't have happened prior to r283320. That change > was apparently motivated by a similar hang, but in that case the thread > was suspended (with a vnode lock held) rather than asleep. It looks like > our internal fix also added a change to set TDF_SBDRY around > filesystem-specific syscalls, which often sleep interruptibly while > holding vnode locks. But I don't think that's the problem here, as you > noted with lf_advlock(). > > With r283320 reverted, P_STOPPED_SIG would not have been set, so > thread_suspend_check() would have suspended us, allowing the core dump > to proceed. I had thought that using SINGLE_BOUNDRY beforing coredumping > would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so > this is probably the wrong place to be solving the problem. This looks as if we should not ignore suspension requests in thread_suspend_check() completely in TDF_SBDRY case, but return either EINTR or ERESTART (most likely ERESTART). Note that the goal of TDF_SBDRY is to avoid suspending in the protected region, not to make an impression that the suspension does not occur at all. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Tue, Jun 07, 2016 at 05:46:10AM +0300, Konstantin Belousov wrote: > On Mon, Jun 06, 2016 at 10:13:11AM -0700, Mark Johnston wrote: > > On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote: > What statement was not true: that your code sets TDF_SBDRY, or that > the lf_advlock() function was called ? The latter. It turns out that we've modified some of our filesystem-specific syscalls to set TDF_SBDRY as well; see below. > > I'm a bit confused by this. How does TDF_SBDRY prevent thread_single() > > from waking up the thread? The sleepq_abort() call is only elided in the > > SINGLE_ALLPROC case, so in other cases, I think we will still interrupt > > the sleep. Thus, since thread_suspend_check() is only invoked prior to > > going to sleep, the application I referred to must have attempted to > > single-thread the process before the thread in question went to sleep. > It does not prevent the wakeup, sorry. > > What I should have said, more precisely, is that thread_suspend_check() > call before the thread is goes to sleep, is nop in case of TDF_SBDRY > flag was set. Ok, that's my understanding too. > > > From what I see in the code, our NFS client has similar issue of calling > > > lf_advlock() with TDF_SBDRY set. Below is the patch to fix that. > > > Similar bug existed in our fifofs, see r277321. > > > > Thanks. It may be that a similar fix is appropriate in our locking code, > > but I'll have to spend more time reading it. > > Still, I am confused now as well. If you can catch the process in that > state, where a thread is sleeping while single-threading request cannot > make the progress, I would like to see the struct thread and struct proc > printouts. Esp. the thread flags are interesting. Sure, see below. For reference: td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK td_pflags = 0 td_inhibitors = 0x2 = SLEEPING td_locks = 0 stack: mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d ... p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS p_flag2 = 0 The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the SLEEPABORT flag is not, so the thread can not have been sleeping when thread_single() was called - else sleepq_abort() would have been invoked and set SLEEPABORT. We are at the second sleepq_switch() call in sleepq_catch_signals(), and no signal was pending, so we called thread_suspend_check(), which returned 0 because of SBDRY. So we went to sleep. I note that this couldn't have happened prior to r283320. That change was apparently motivated by a similar hang, but in that case the thread was suspended (with a vnode lock held) rather than asleep. It looks like our internal fix also added a change to set TDF_SBDRY around filesystem-specific syscalls, which often sleep interruptibly while holding vnode locks. But I don't think that's the problem here, as you noted with lf_advlock(). With r283320 reverted, P_STOPPED_SIG would not have been set, so thread_suspend_check() would have suspended us, allowing the core dump to proceed. I had thought that using SINGLE_BOUNDRY beforing coredumping would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so this is probably the wrong place to be solving the problem. CPU IDFUNCTION:NAME 1 1 :BEGIN struct thread { struct mtx *volatile td_lock = 0x814a5148 struct proc *td_proc = 0xf80445c694d8 struct td_plist = { struct thread *tqe_next = 0xf81a44186750 struct thread **tqe_prev = 0xf8047a027760 } struct td_runq = { struct thread *tqe_next = 0 struct thread **tqe_prev = 0x8147cd40 } struct td_slpq = { struct thread *tqe_next = 0 struct thread **tqe_prev = 0xf81c992b1d80 } struct td_lockq = { struct thread *tqe_next = 0 struct thread **tqe_prev = 0xfe2a1c727be8 } struct td_hash = { struct thread *le_next = 0 struct thread **le_prev = 0xfe0001076f98 } struct cpuset *td_cpuset = 0xf80699b9b510 struct seltd *td_sel = 0xf80258307b80 struct sleepqueue *td_sleepqueue = 0 struct turnstile *td_turnstile = 0xf80d50a3fcc0 struct rl_q_entry *td_rlqe = 0xf8025dcf5988 struct umtx_q *td_umtxq = 0xf807c9adfd80 lwpid_t td_tid = 0x18df3 sigqueue_t td_sigqueue = { sigset_t sq_signals = { __uint32_t [4] __bits = [ 0, 0, 0, 0 ] } sigset_t sq_kill = { __uint32_t [4] __bits = [ 0, 0, 0, 0 ] } struct sq_list = { struct ksiginfo *tqh_first = 0 struct ksiginfo **tqh_last = 0xf814be2df808 } struct proc *sq_proc = 0xf80445c694d8 int sq_flags = 0x1 } u_char td_lend_user_pri = 0xff int td_flags = 0xa84c int td_inhibitors = 0x2 int td_pflags = 0 int td_dupfd = 0
Re: thread suspension when dumping core
On Mon, Jun 6, 2016 at 7:46 PM, Konstantin Belousov wrote: > On Mon, Jun 06, 2016 at 10:13:11AM -0700, Mark Johnston wrote: >> On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote: >> > Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ? >> >> It doesn't. This code belongs to a general framework for distributed FS >> locks; in this particular case, the application was using it to acquire >> a custom advisory lock. > What statement was not true: that your code sets TDF_SBDRY, or that > the lf_advlock() function was called ? We set TDF_SBDRY; we don't call lf_advlock(). I don't have the core yet, so I can't get the thread information, sorry. Best, Conrad ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Mon, Jun 06, 2016 at 10:13:11AM -0700, Mark Johnston wrote: > On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote: > > Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ? > > It doesn't. This code belongs to a general framework for distributed FS > locks; in this particular case, the application was using it to acquire > a custom advisory lock. What statement was not true: that your code sets TDF_SBDRY, or that the lf_advlock() function was called ? > > > This cannot work, regardless of the mode of single-threading. TDF_SBDRY > > makes such sleep non-interruptible by any single-threading request, on > > the promise that the thread owns some resources (typically vnode locks). > > I.e. changing the mode would not help. > > I'm a bit confused by this. How does TDF_SBDRY prevent thread_single() > from waking up the thread? The sleepq_abort() call is only elided in the > SINGLE_ALLPROC case, so in other cases, I think we will still interrupt > the sleep. Thus, since thread_suspend_check() is only invoked prior to > going to sleep, the application I referred to must have attempted to > single-thread the process before the thread in question went to sleep. It does not prevent the wakeup, sorry. What I should have said, more precisely, is that thread_suspend_check() call before the thread is goes to sleep, is nop in case of TDF_SBDRY flag was set. > > > > > I see two reasons to use SINGLE_NO_EXIT for coredumping. It allows > > coredump writer to record more exact state of the process into the notes. > > > > Another one is that SINGLE_NO_EXIT is generally faster and more > > reliable than SINGLE_BOUNDARY. Some states are already good enough for > > SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In > > other words, core dump write starts earlier. > > > > It might be not very significant reasons. > > > > From what I see in the code, our NFS client has similar issue of calling > > lf_advlock() with TDF_SBDRY set. Below is the patch to fix that. > > Similar bug existed in our fifofs, see r277321. > > Thanks. It may be that a similar fix is appropriate in our locking code, > but I'll have to spend more time reading it. Still, I am confused now as well. If you can catch the process in that state, where a thread is sleeping while single-threading request cannot make the progress, I would like to see the struct thread and struct proc printouts. Esp. the thread flags are interesting. Thanks. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote: > On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote: > > Hi, > > > > I've recently observed a hang in a multi-threaded process that had hit > > an assertion failure and was attempting to dump core. One thread was > > sleeping interruptibly on an advisory lock with TDF_SBDRY set (our > > filesystem sets VFCF_SBDRY). SIGABRT caused the receipient thread to > > suspend other threads with thread_single(SINGLE_NO_EXIT), which fails > > to interrupt the sleeping thread, resulting in the hang. > > > > My question is, why does the SA_CORE handler not force all threads to > > the user boundary before attempting to dump core? It must do so later > > anyway in order to exit. As I understand it, TDF_SBDRY is intended to > > avoid deadlocks that can occur when stopping a process, but in this > > case we don't stop the process with the intention of resuming it, so it > > seems erroneous to apply this flag. > > Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ? It doesn't. This code belongs to a general framework for distributed FS locks; in this particular case, the application was using it to acquire a custom advisory lock. > This cannot work, regardless of the mode of single-threading. TDF_SBDRY > makes such sleep non-interruptible by any single-threading request, on > the promise that the thread owns some resources (typically vnode locks). > I.e. changing the mode would not help. I'm a bit confused by this. How does TDF_SBDRY prevent thread_single() from waking up the thread? The sleepq_abort() call is only elided in the SINGLE_ALLPROC case, so in other cases, I think we will still interrupt the sleep. Thus, since thread_suspend_check() is only invoked prior to going to sleep, the application I referred to must have attempted to single-thread the process before the thread in question went to sleep. > > I see two reasons to use SINGLE_NO_EXIT for coredumping. It allows > coredump writer to record more exact state of the process into the notes. > > Another one is that SINGLE_NO_EXIT is generally faster and more > reliable than SINGLE_BOUNDARY. Some states are already good enough for > SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In > other words, core dump write starts earlier. > > It might be not very significant reasons. > > From what I see in the code, our NFS client has similar issue of calling > lf_advlock() with TDF_SBDRY set. Below is the patch to fix that. > Similar bug existed in our fifofs, see r277321. Thanks. It may be that a similar fix is appropriate in our locking code, but I'll have to spend more time reading it. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: thread suspension when dumping core
On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote: > Hi, > > I've recently observed a hang in a multi-threaded process that had hit > an assertion failure and was attempting to dump core. One thread was > sleeping interruptibly on an advisory lock with TDF_SBDRY set (our > filesystem sets VFCF_SBDRY). SIGABRT caused the receipient thread to > suspend other threads with thread_single(SINGLE_NO_EXIT), which fails > to interrupt the sleeping thread, resulting in the hang. > > My question is, why does the SA_CORE handler not force all threads to > the user boundary before attempting to dump core? It must do so later > anyway in order to exit. As I understand it, TDF_SBDRY is intended to > avoid deadlocks that can occur when stopping a process, but in this > case we don't stop the process with the intention of resuming it, so it > seems erroneous to apply this flag. Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ? This cannot work, regardless of the mode of single-threading. TDF_SBDRY makes such sleep non-interruptible by any single-threading request, on the promise that the thread owns some resources (typically vnode locks). I.e. changing the mode would not help. I see two reasons to use SINGLE_NO_EXIT for coredumping. It allows coredump writer to record more exact state of the process into the notes. Another one is that SINGLE_NO_EXIT is generally faster and more reliable than SINGLE_BOUNDARY. Some states are already good enough for SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In other words, core dump write starts earlier. It might be not very significant reasons. >From what I see in the code, our NFS client has similar issue of calling lf_advlock() with TDF_SBDRY set. Below is the patch to fix that. Similar bug existed in our fifofs, see r277321. diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c index 2a8afa9..98625ee 100644 --- a/sys/fs/nfsclient/nfs_clvnops.c +++ b/sys/fs/nfsclient/nfs_clvnops.c @@ -2992,7 +2992,7 @@ nfs_advlock(struct vop_advlock_args *ap) struct proc *p = (struct proc *)ap->a_id; struct thread *td = curthread; /* XXX */ struct vattr va; - int ret, error = EOPNOTSUPP; + int sbdry, ret, error = EOPNOTSUPP; u_quad_t size; if (NFS_ISV4(vp) && (ap->a_flags & (F_POSIX | F_FLOCK)) != 0) { @@ -3087,7 +3087,10 @@ nfs_advlock(struct vop_advlock_args *ap) if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) { size = VTONFS(vp)->n_size; NFSVOPUNLOCK(vp, 0); + sbdry = sigallowstop(); error = lf_advlock(ap, &(vp->v_lockf), size); + if (sbdry) + sigdeferstop(); } else { if (nfs_advlock_p != NULL) error = nfs_advlock_p(ap); @@ -3114,7 +3117,7 @@ nfs_advlockasync(struct vop_advlockasync_args *ap) { struct vnode *vp = ap->a_vp; u_quad_t size; - int error; + int error, sbdry; if (NFS_ISV4(vp)) return (EOPNOTSUPP); @@ -3124,7 +3127,10 @@ nfs_advlockasync(struct vop_advlockasync_args *ap) if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) { size = VTONFS(vp)->n_size; NFSVOPUNLOCK(vp, 0); + sbdry = sigallowstop(); error = lf_advlockasync(ap, &(vp->v_lockf), size); + if (sbdry) + sigdeferstop(); } else { NFSVOPUNLOCK(vp, 0); error = EOPNOTSUPP; ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
thread suspension when dumping core
Hi, I've recently observed a hang in a multi-threaded process that had hit an assertion failure and was attempting to dump core. One thread was sleeping interruptibly on an advisory lock with TDF_SBDRY set (our filesystem sets VFCF_SBDRY). SIGABRT caused the receipient thread to suspend other threads with thread_single(SINGLE_NO_EXIT), which fails to interrupt the sleeping thread, resulting in the hang. My question is, why does the SA_CORE handler not force all threads to the user boundary before attempting to dump core? It must do so later anyway in order to exit. As I understand it, TDF_SBDRY is intended to avoid deadlocks that can occur when stopping a process, but in this case we don't stop the process with the intention of resuming it, so it seems erroneous to apply this flag. Thanks, -Mark ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"