Re: thread suspension when dumping core

2016-06-12 Thread Konstantin Belousov
On Mon, Jun 13, 2016 at 12:24:52AM +0200, Jilles Tjoelker wrote:
> On Thu, Jun 09, 2016 at 07:34:55AM +0300, Konstantin Belousov wrote:
> > On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote:
> > > On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> > > > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > > > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > > > > I also wonder whether we may be overengineering things here. 
> > > > > > > Perhaps
> > > > > > > the advlock sleep can simply turn off TDF_SBDRY.
> > > > > > Well, this was the very first patch suggested.  I would be
> > > > > > fine with that, but again, out-of-tree code seems to be not
> > > > > > quite fine with that local solution.
> 
> > > > > In our particular case, we could possibly use a similar approach. In
> > > > > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > > > > sx_sleep() has any locks held. It is easy to verify that all callers 
> > > > > of
> > > > > lf_advlock() are safe in this respect, but this kind of auditing is
> > > > > generally hard. In fact, I believe the sx_sleep that led to the 
> > > > > problem
> > > > > described in D2612 is the same as the one in my case. That is, the
> > > > > sleeping thread may or may not hold a vnode lock depending on context.
> 
> > > > I do not think that in-tree code sleeps with a vnode lock held in
> > > > the lf_advlock().  Otherwise, system would hang in lock cascade by
> > > > an attempt to obtain an advisory lock.  I think we can even assert
> > > > this with witness.
> 
> > > > There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> > > > called from vgone(). This sleep indeed occurs under the vnode lock, and
> > > > as such must be non-suspendable. The sleep waits until other threads
> > > > leave the lf_advlock() for the reclaimed vnode, and they should leave in
> > > > deterministic time due to issued wakeups.  So this sleep is exempt from
> > > > the considerations, and TDF_SBDRY there is correct.
> 
> > > > I am fine with either the braces around sx_sleep() in lf_advlock() to
> > > > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent,
> > > > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My
> > > > understanding is that you prefer the later. If I do not mis-represent
> > > > your position, I understand why you do prefer that.
> 
> > > The TDF_SRESTART change does fix some more problems such as umount -f
> > > getting stuck in lf_purgelocks().
> 
> > > However, it introduces some subtle issues that may not necessarily be a
> > > sufficient objection.
> 
> > I did not see an objection in the notes below, but rather I read them
> > as an argument to return EINTR from the stop attempts from lf_advlock().
> 
> With these changes, the question is which bugs you want to fix and which
> bugs you want to leave unfixed or introduce.
> 
> * The status quo sometimes deadlocks or almost deadlocks with thread
>   suspension.
> 
> * Restarting the locking syscall after thread suspension using ERESTART
>   causes the suspended thread to be moved to the end of the queue and
>   the locked area to be re-evaluated which violates POSIX but probably
>   does not break applications.
> 
> * Interrupting the locking syscall after thread suspension with EINTR
>   breaks applications with the reasonable assumption that [EINTR] can
>   only occur because of signals, whenever a locking attempt is
>   suspended. This requires a change to the man page and probably some
>   patches to ports.
> 
> This could be avoided by allowing a thread to return to the user
> boundary (but not beyond) while staying in the queue but that is a
> fairly heavy API change. The thread would not be allowed to execute user
> code (a signal handler) since it may take indefinitely long before the
> thread would continue waiting for the lock, which matches exactly
> POSIX's specification of F_SETLKW not restarting after signal handlers.
What do you mean by this ?  Are you proposing to leave our lockf_entry
in the tree of the lockf graph, while allowing thread to proceed to
user boundary, and making ast handler to re-enter the lf_advlock() ?
At least, I cannot make any other sense of the text.

But IMO that would not solve the problem.  We could declare the sleep
in lf_setlock() as stoppable (by clearing TDF_SBDRY around) and that
gives the same effect without complications.  But I do not think that
this is acceptable, since our lockf_entry is in graph, and could block
arbitrary other processes participating in the adv locking while holding
kernel resources.

Of course, user thread can be suspended on boundary while already fully
owning a lock, and then we get a similar situation, at the outer look.
But it is quite different from the view on internal kernel structures,
which are in consistent state (as opposed 

Re: thread suspension when dumping core

2016-06-12 Thread Jilles Tjoelker
On Thu, Jun 09, 2016 at 07:34:55AM +0300, Konstantin Belousov wrote:
> On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote:
> > On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> > > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > > > I also wonder whether we may be overengineering things here. Perhaps
> > > > > > the advlock sleep can simply turn off TDF_SBDRY.
> > > > > Well, this was the very first patch suggested.  I would be
> > > > > fine with that, but again, out-of-tree code seems to be not
> > > > > quite fine with that local solution.

> > > > In our particular case, we could possibly use a similar approach. In
> > > > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > > > sx_sleep() has any locks held. It is easy to verify that all callers of
> > > > lf_advlock() are safe in this respect, but this kind of auditing is
> > > > generally hard. In fact, I believe the sx_sleep that led to the problem
> > > > described in D2612 is the same as the one in my case. That is, the
> > > > sleeping thread may or may not hold a vnode lock depending on context.

> > > I do not think that in-tree code sleeps with a vnode lock held in
> > > the lf_advlock().  Otherwise, system would hang in lock cascade by
> > > an attempt to obtain an advisory lock.  I think we can even assert
> > > this with witness.

> > > There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> > > called from vgone(). This sleep indeed occurs under the vnode lock, and
> > > as such must be non-suspendable. The sleep waits until other threads
> > > leave the lf_advlock() for the reclaimed vnode, and they should leave in
> > > deterministic time due to issued wakeups.  So this sleep is exempt from
> > > the considerations, and TDF_SBDRY there is correct.

> > > I am fine with either the braces around sx_sleep() in lf_advlock() to
> > > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent,
> > > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My
> > > understanding is that you prefer the later. If I do not mis-represent
> > > your position, I understand why you do prefer that.

> > The TDF_SRESTART change does fix some more problems such as umount -f
> > getting stuck in lf_purgelocks().

> > However, it introduces some subtle issues that may not necessarily be a
> > sufficient objection.

> I did not see an objection in the notes below, but rather I read them
> as an argument to return EINTR from the stop attempts from lf_advlock().

With these changes, the question is which bugs you want to fix and which
bugs you want to leave unfixed or introduce.

* The status quo sometimes deadlocks or almost deadlocks with thread
  suspension.

* Restarting the locking syscall after thread suspension using ERESTART
  causes the suspended thread to be moved to the end of the queue and
  the locked area to be re-evaluated which violates POSIX but probably
  does not break applications.

* Interrupting the locking syscall after thread suspension with EINTR
  breaks applications with the reasonable assumption that [EINTR] can
  only occur because of signals, whenever a locking attempt is
  suspended. This requires a change to the man page and probably some
  patches to ports.

This could be avoided by allowing a thread to return to the user
boundary (but not beyond) while staying in the queue but that is a
fairly heavy API change. The thread would not be allowed to execute user
code (a signal handler) since it may take indefinitely long before the
thread would continue waiting for the lock, which matches exactly
POSIX's specification of F_SETLKW not restarting after signal handlers.

> > Firstly, adding this closes the door on fixing signal handling for
> > fcntl(F_SETLKW). Per POSIX, any caught signal interrupts
> > fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux
> > man page documents the same. Our man page has documented that SA_RESTART
> > behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This
> > could normally be fixed via  if (error == ERESTART) error = EINTR;  but
> > that is no longer possible if there are [ERESTART] errors that should
> > still restart.

> I added the translation to fcntl handler.

> > Secondly, fcntl(F_SETLKW) restarting after a stop may actually be
> > observable, contrary to what I wrote before. This is due to the fair
> > queuing. Suppose thread A has locked byte 1 a while ago and thread B is
> > trying to lock byte 1 and 2 right now. Then thread C will be able to
> > lock byte 2 iff thread B has not blocked yet. If thread C will not be
> > allowed to lock byte 2 and will block on it, the TDF_SRESTART change
> > will cause it to be awakened if thread B is stopped. When thread B
> > resumes, the region to be locked will be 

Re: thread suspension when dumping core

2016-06-08 Thread Konstantin Belousov
On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote:
> On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > > I also wonder whether we may be overengineering things here. Perhaps
> > > > > the advlock sleep can simply turn off TDF_SBDRY.
> > > > Well, this was the very first patch suggested.  I would be fine with 
> > > > that,
> > > > but again, out-of-tree code seems to be not quite fine with that local
> > > > solution.
> 
> > > In our particular case, we could possibly use a similar approach. In
> > > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > > sx_sleep() has any locks held. It is easy to verify that all callers of
> > > lf_advlock() are safe in this respect, but this kind of auditing is
> > > generally hard. In fact, I believe the sx_sleep that led to the problem
> > > described in D2612 is the same as the one in my case. That is, the
> > > sleeping thread may or may not hold a vnode lock depending on context.
> 
> > I do not think that in-tree code sleeps with a vnode lock held in
> > the lf_advlock().  Otherwise, system would hang in lock cascade by
> > an attempt to obtain an advisory lock.  I think we can even assert
> > this with witness.
> 
> > There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> > called from vgone(). This sleep indeed occurs under the vnode lock, and
> > as such must be non-suspendable. The sleep waits until other threads
> > leave the lf_advlock() for the reclaimed vnode, and they should leave in
> > deterministic time due to issued wakeups.  So this sleep is exempt from
> > the considerations, and TDF_SBDRY there is correct.
> 
> > I am fine with either the braces around sx_sleep() in lf_advlock() to
> > clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent,
> > which adds temporal override for TDF_SBDRY with TDF_SRESTART. My
> > understanding is that you prefer the later. If I do not mis-represent
> > your position, I understand why you do prefer that.
> 
> The TDF_SRESTART change does fix some more problems such as umount -f
> getting stuck in lf_purgelocks().
> 
> However, it introduces some subtle issues that may not necessarily be a
> sufficient objection.
I did not see an objection in the notes below, but rather I read them
as an argument to return EINTR from the stop attempts from lf_advlock().

> 
> Firstly, adding this closes the door on fixing signal handling for
> fcntl(F_SETLKW). Per POSIX, any caught signal interrupts
> fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux
> man page documents the same. Our man page has documented that SA_RESTART
> behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This
> could normally be fixed via  if (error == ERESTART) error = EINTR;  but
> that is no longer possible if there are [ERESTART] errors that should
> still restart.
I added the translation to fcntl handler.

> 
> Secondly, fcntl(F_SETLKW) restarting after a stop may actually be
> observable, contrary to what I wrote before. This is due to the fair
> queuing. Suppose thread A has locked byte 1 a while ago and thread B is
> trying to lock byte 1 and 2 right now. Then thread C will be able to
> lock byte 2 iff thread B has not blocked yet. If thread C will not be
> allowed to lock byte 2 and will block on it, the TDF_SRESTART change
> will cause it to be awakened if thread B is stopped. When thread B
> resumes, the region to be locked will be recomputed. This scenario
> unambiguously violates the POSIX requirement but I don't know how bad it
> is.
Indeed.

> 
> Note that all these threads must be in separate processes because of
> fcntl locks' strange semantics.

So below is the next level of over-engineering the stuff. I made it
unified on sigdeferstop() to avoid blowing up the KPI. I am not sure what
modes are needed by onefs, so I added SIGDEFERSTOP_ERESTART despite it
is not used by the kernel.

lf_advlock() is not stoppable at all, with EINTR return.  In the previous
patches, only if the caller of lf_advlock() already set TDF_SBDRY, the
function modified the behaviour.

I considered adding td_sbdry member to struct thread for managing sbdry
stops, but did not for now.  If one more flag appear to be used, I will
change that.

diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c
index 716faa3..fb3eb90 100644
--- a/sys/fs/fifofs/fifo_vnops.c
+++ b/sys/fs/fifofs/fifo_vnops.c
@@ -194,11 +194,10 @@ fifo_open(ap)
if ((ap->a_mode & FREAD) && fip->fi_writers == 0) {
gen = fip->fi_wgen;
VOP_UNLOCK(vp, 0);
-   stops_deferred = sigallowstop();
+   stops_deferred = sigdeferstop(SIGDEFERSTOP_OFF);
  

Re: thread suspension when dumping core

2016-06-08 Thread Jilles Tjoelker
On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > I also wonder whether we may be overengineering things here. Perhaps
> > > > the advlock sleep can simply turn off TDF_SBDRY.
> > > Well, this was the very first patch suggested.  I would be fine with that,
> > > but again, out-of-tree code seems to be not quite fine with that local
> > > solution.

> > In our particular case, we could possibly use a similar approach. In
> > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > sx_sleep() has any locks held. It is easy to verify that all callers of
> > lf_advlock() are safe in this respect, but this kind of auditing is
> > generally hard. In fact, I believe the sx_sleep that led to the problem
> > described in D2612 is the same as the one in my case. That is, the
> > sleeping thread may or may not hold a vnode lock depending on context.

> I do not think that in-tree code sleeps with a vnode lock held in
> the lf_advlock().  Otherwise, system would hang in lock cascade by
> an attempt to obtain an advisory lock.  I think we can even assert
> this with witness.

> There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> called from vgone(). This sleep indeed occurs under the vnode lock, and
> as such must be non-suspendable. The sleep waits until other threads
> leave the lf_advlock() for the reclaimed vnode, and they should leave in
> deterministic time due to issued wakeups.  So this sleep is exempt from
> the considerations, and TDF_SBDRY there is correct.

> I am fine with either the braces around sx_sleep() in lf_advlock() to
> clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent,
> which adds temporal override for TDF_SBDRY with TDF_SRESTART. My
> understanding is that you prefer the later. If I do not mis-represent
> your position, I understand why you do prefer that.

The TDF_SRESTART change does fix some more problems such as umount -f
getting stuck in lf_purgelocks().

However, it introduces some subtle issues that may not necessarily be a
sufficient objection.

Firstly, adding this closes the door on fixing signal handling for
fcntl(F_SETLKW). Per POSIX, any caught signal interrupts
fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux
man page documents the same. Our man page has documented that SA_RESTART
behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This
could normally be fixed via  if (error == ERESTART) error = EINTR;  but
that is no longer possible if there are [ERESTART] errors that should
still restart.

Secondly, fcntl(F_SETLKW) restarting after a stop may actually be
observable, contrary to what I wrote before. This is due to the fair
queuing. Suppose thread A has locked byte 1 a while ago and thread B is
trying to lock byte 1 and 2 right now. Then thread C will be able to
lock byte 2 iff thread B has not blocked yet. If thread C will not be
allowed to lock byte 2 and will block on it, the TDF_SRESTART change
will cause it to be awakened if thread B is stopped. When thread B
resumes, the region to be locked will be recomputed. This scenario
unambiguously violates the POSIX requirement but I don't know how bad it
is.

Note that all these threads must be in separate processes because of
fcntl locks' strange semantics.

-- 
Jilles Tjoelker
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-08 Thread Mark Johnston
On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > I also wonder whether we may be overengineering things here. Perhaps
> > > > the advlock sleep can simply turn off TDF_SBDRY.
> > > Well, this was the very first patch suggested.  I would be fine with that,
> > > but again, out-of-tree code seems to be not quite fine with that local
> > > solution.
> > 
> > In our particular case, we could possibly use a similar approach. In
> > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > sx_sleep() has any locks held. It is easy to verify that all callers of
> > lf_advlock() are safe in this respect, but this kind of auditing is
> > generally hard. In fact, I believe the sx_sleep that led to the problem
> > described in D2612 is the same as the one in my case. That is, the
> > sleeping thread may or may not hold a vnode lock depending on context.
> 
> I do not think that in-tree code sleeps with a vnode lock held in
> the lf_advlock().  Otherwise, system would hang in lock cascade by
> an attempt to obtain an advisory lock.  I think we can even assert
> this with witness.

Indeed. I just meant that this appears to not be true of Isilon's
FS locking code, which is parameterized heavily by the lock type.

> 
> There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> called from vgone(). This sleep indeed occurs under the vnode lock, and
> as such must be non-suspendable. The sleep waits until other threads
> leave the lf_advlock() for the reclaimed vnode, and they should leave in
> deterministic time due to issued wakeups.  So this sleep is exempt from
> the considerations, and TDF_SBDRY there is correct.
> 
> I am fine with either the braces around sx_sleep() in lf_advlock() to
> clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent,
> which adds temporal override for TDF_SBDRY with TDF_SRESTART. My
> understanding is that you prefer the later. If I do not mis-represent
> your position, I understand why you do prefer that.

Right, I think it would be hard for me to adopt a solution based on the
proposed lf_advlock() change. So I prefer the latter, more general
approach, additional complexity notwithstanding.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-08 Thread Konstantin Belousov
On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > I also wonder whether we may be overengineering things here. Perhaps
> > > the advlock sleep can simply turn off TDF_SBDRY.
> > Well, this was the very first patch suggested.  I would be fine with that,
> > but again, out-of-tree code seems to be not quite fine with that local
> > solution.
> 
> In our particular case, we could possibly use a similar approach. In
> general, it seems incorrect to clear TDF_SBDRY if the thread calling
> sx_sleep() has any locks held. It is easy to verify that all callers of
> lf_advlock() are safe in this respect, but this kind of auditing is
> generally hard. In fact, I believe the sx_sleep that led to the problem
> described in D2612 is the same as the one in my case. That is, the
> sleeping thread may or may not hold a vnode lock depending on context.

I do not think that in-tree code sleeps with a vnode lock held in
the lf_advlock().  Otherwise, system would hang in lock cascade by
an attempt to obtain an advisory lock.  I think we can even assert
this with witness.

There is another sleep, which Jilles mentioned, in lf_purgelocks(),
called from vgone(). This sleep indeed occurs under the vnode lock, and
as such must be non-suspendable. The sleep waits until other threads
leave the lf_advlock() for the reclaimed vnode, and they should leave in
deterministic time due to issued wakeups.  So this sleep is exempt from
the considerations, and TDF_SBDRY there is correct.

I am fine with either the braces around sx_sleep() in lf_advlock() to
clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent,
which adds temporal override for TDF_SBDRY with TDF_SRESTART. My
understanding is that you prefer the later. If I do not mis-represent
your position, I understand why you do prefer that.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-08 Thread Mark Johnston
On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > In this case it is clear which sleep(9) calls should be affected so it
> > may be better to avoid more hidden state.
> In this case yes, but apparently some out-of-tree users exist. And,
> the marking of the single sx_sleep() call depends on knowing our
> implementation. I remember that as the arguments to change from PBDRY to
> the current state setter, in NFS it is not too hard to try to enumerate
> interruptible sleeps. 
> 
> > 
> > I also wonder whether we may be overengineering things here. Perhaps
> > the advlock sleep can simply turn off TDF_SBDRY.
> Well, this was the very first patch suggested.  I would be fine with that,
> but again, out-of-tree code seems to be not quite fine with that local
> solution.

In our particular case, we could possibly use a similar approach. In
general, it seems incorrect to clear TDF_SBDRY if the thread calling
sx_sleep() has any locks held. It is easy to verify that all callers of
lf_advlock() are safe in this respect, but this kind of auditing is
generally hard. In fact, I believe the sx_sleep that led to the problem
described in D2612 is the same as the one in my case. That is, the
sleeping thread may or may not hold a vnode lock depending on context.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-07 Thread Konstantin Belousov
On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> On Tue, Jun 07, 2016 at 07:01:55PM +0300, Konstantin Belousov wrote:
> > On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote:
> > > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote:
> > > > This looks as if we should not ignore suspension requests in
> > > > thread_suspend_check() completely in TDF_SBDRY case, but return either
> > > > EINTR or ERESTART (most likely ERESTART). Note that the goal of
> > > > TDF_SBDRY is to avoid suspending in the protected region, not to make an
> > > > impression that the suspension does not occur at all.
> 
> > > This looks like it would revert r246417 and re-introduce the bug fixed
> > > by it (unexpected [EINTR] and short reads/writes after stop signals).
> 
> > Well, the patch returns ERESTART and not EINTR, so the syscall should
> > be retried after all the unwinding.
> 
> That fixes the [EINTR] part of the problem but not short reads and
> writes.
Which are IMO fine, but I understand that there is too many programs which
are not aware.

> 
> > > After r246417, TDF_SBDRY is intended for sleeps that occur while holding
> > > resources such as vnode locks and are normally short but should be
> > > interruptible by fatal signals because they may occasionally be
> > > indefinitely long (such as a non-responsive NFS server).
> 
> > > It looks like yet another kind of sleep may be required, since advisory
> > > locks still hold some filesystem resources across the sleep (though not
> > > vnode locks).
> 
> > I do not think that adv locks enter sleep with any resource held which
> > would block other threads.  But I agree with the statement because the
> > lock might be granted and then the stopped thread would appear to own
> > the blocking resource.
> 
> It does not hold any resource used by normal operations, but it blocks a
> forced unmount (umount -f hangs in [purgelocks] with tmpfs in a recent
> stable/10).
What I mean, is that when the lock is granted during sleep, other threads
might (or might not) block on the lock edge coming from the suspended
thread.  Even if they do not block, it is too fine implementation detail.

> 
> If queuing is supposed to be fair, then granting the lock to the stopped
> thread is correct and aborting the sleep with [ERESTART] would break it.
> The kern_lockf.c code seems to go to some lengths to make queuing fair.
> This does not seem very important, though.
> 
> Also, restarting a locking call violates some text in POSIX XSH's fcntl
> page that the range of bytes to be locked shall be determined before the
> thread blocks (this may be affected by the current seek offset and the
> file size). I don't know whether violating this will break any
> applications. The text has the problem that there is no way to
> distinguish between a thread that is in fcntl() and has not yet blocked
> and a thread that has blocked, even though it seems intuitively clear.
Such race is self-inflicted IMO. The app assumes that it can call
blockable fcntl() and then change file offset, in other thread. How
would other thread determine that the blockable thread is blocked ?

> 
> > > We then have four kinds:
> 
> > > * uninterruptible by signals, ignores stops (default)
> > > * interruptible by signals, ignores stops (current TDF_SBDRY with
> > >   PCATCH)
> > > * interruptible by signals, freezes in place on stops (avoids
> > >   unexpected short I/O) (current PCATCH, otherwise)
> > > * interruptible by signals, fails with [ERESTART] on stops (avoids
> > >   holding resources across a stop) (new)
> 
> > > The new kind of sleep would fail with [ERESTART] only for stops, since
> > > [EINTR] should only be returned if a signal handler was called. There
> > > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a
> > > handler does not stop the process.
> 
> > And where would this new kind of sleep used ?  The advlock sleep is the one
> > place.  Does fifo sleep for reader or writer on open require this kind
> > of handling (IMO no) ?
> 
> > I think this can be relatively easily implemented with either a flag
> > for XXXsleep(9) (my older style of PBDRY) or using only the thread flag
> > (jhb' newer TDF_SBDRY approach). Probably the later should be used, for
> > consistency and easier marking of larger blocks of code.
> 
> In this case it is clear which sleep(9) calls should be affected so it
> may be better to avoid more hidden state.
In this case yes, but apparently some out-of-tree users exist. And,
the marking of the single sx_sleep() call depends on knowing our
implementation. I remember that as the arguments to change from PBDRY to
the current state setter, in NFS it is not too hard to try to enumerate
interruptible sleeps. 

> 
> I also wonder whether we may be overengineering things here. Perhaps
> the advlock sleep can simply turn off TDF_SBDRY.
Well, this was the very first patch suggested.  I would be fine with that,
but again, out-of-tree co

Re: thread suspension when dumping core

2016-06-07 Thread Jilles Tjoelker
On Tue, Jun 07, 2016 at 07:01:55PM +0300, Konstantin Belousov wrote:
> On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote:
> > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote:
> > > This looks as if we should not ignore suspension requests in
> > > thread_suspend_check() completely in TDF_SBDRY case, but return either
> > > EINTR or ERESTART (most likely ERESTART). Note that the goal of
> > > TDF_SBDRY is to avoid suspending in the protected region, not to make an
> > > impression that the suspension does not occur at all.

> > This looks like it would revert r246417 and re-introduce the bug fixed
> > by it (unexpected [EINTR] and short reads/writes after stop signals).

> Well, the patch returns ERESTART and not EINTR, so the syscall should
> be retried after all the unwinding.

That fixes the [EINTR] part of the problem but not short reads and
writes.

> > After r246417, TDF_SBDRY is intended for sleeps that occur while holding
> > resources such as vnode locks and are normally short but should be
> > interruptible by fatal signals because they may occasionally be
> > indefinitely long (such as a non-responsive NFS server).

> > It looks like yet another kind of sleep may be required, since advisory
> > locks still hold some filesystem resources across the sleep (though not
> > vnode locks).

> I do not think that adv locks enter sleep with any resource held which
> would block other threads.  But I agree with the statement because the
> lock might be granted and then the stopped thread would appear to own
> the blocking resource.

It does not hold any resource used by normal operations, but it blocks a
forced unmount (umount -f hangs in [purgelocks] with tmpfs in a recent
stable/10).

If queuing is supposed to be fair, then granting the lock to the stopped
thread is correct and aborting the sleep with [ERESTART] would break it.
The kern_lockf.c code seems to go to some lengths to make queuing fair.
This does not seem very important, though.

Also, restarting a locking call violates some text in POSIX XSH's fcntl
page that the range of bytes to be locked shall be determined before the
thread blocks (this may be affected by the current seek offset and the
file size). I don't know whether violating this will break any
applications. The text has the problem that there is no way to
distinguish between a thread that is in fcntl() and has not yet blocked
and a thread that has blocked, even though it seems intuitively clear.

> > We then have four kinds:

> > * uninterruptible by signals, ignores stops (default)
> > * interruptible by signals, ignores stops (current TDF_SBDRY with
> >   PCATCH)
> > * interruptible by signals, freezes in place on stops (avoids
> >   unexpected short I/O) (current PCATCH, otherwise)
> > * interruptible by signals, fails with [ERESTART] on stops (avoids
> >   holding resources across a stop) (new)

> > The new kind of sleep would fail with [ERESTART] only for stops, since
> > [EINTR] should only be returned if a signal handler was called. There
> > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a
> > handler does not stop the process.

> And where would this new kind of sleep used ?  The advlock sleep is the one
> place.  Does fifo sleep for reader or writer on open require this kind
> of handling (IMO no) ?

> I think this can be relatively easily implemented with either a flag
> for XXXsleep(9) (my older style of PBDRY) or using only the thread flag
> (jhb' newer TDF_SBDRY approach). Probably the later should be used, for
> consistency and easier marking of larger blocks of code.

In this case it is clear which sleep(9) calls should be affected so it
may be better to avoid more hidden state.

I also wonder whether we may be overengineering things here. Perhaps
the advlock sleep can simply turn off TDF_SBDRY.

-- 
Jilles Tjoelker
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-07 Thread Konstantin Belousov
On Tue, Jun 07, 2016 at 07:01:55PM +0300, Konstantin Belousov wrote:
> On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote:
> > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote:
> > > This looks as if we should not ignore suspension requests in
> > > thread_suspend_check() completely in TDF_SBDRY case, but return either
> > > EINTR or ERESTART (most likely ERESTART). Note that the goal of
> > > TDF_SBDRY is to avoid suspending in the protected region, not to make an
> > > impression that the suspension does not occur at all.
> > 
> > This looks like it would revert r246417 and re-introduce the bug fixed
> > by it (unexpected [EINTR] and short reads/writes after stop signals).
> Well, the patch returns ERESTART and not EINTR, so the syscall should
> be retried after all the unwinding.
> 
> > 
> > After r246417, TDF_SBDRY is intended for sleeps that occur while holding
> > resources such as vnode locks and are normally short but should be
> > interruptible by fatal signals because they may occasionally be
> > indefinitely long (such as a non-responsive NFS server).
> > 
> > It looks like yet another kind of sleep may be required, since advisory
> > locks still hold some filesystem resources across the sleep (though not
> > vnode locks).
> I do not think that adv locks enter sleep with any resource held which
> would block other threads.  But I agree with the statement because the
> lock might be granted and then the stopped thread would appear to own
> the blocking resource.
> 
> > 
> > We then have four kinds:
> > 
> > * uninterruptible by signals, ignores stops (default)
> > * interruptible by signals, ignores stops (current TDF_SBDRY with
> >   PCATCH)
> > * interruptible by signals, freezes in place on stops (avoids
> >   unexpected short I/O) (current PCATCH, otherwise)
> > * interruptible by signals, fails with [ERESTART] on stops (avoids
> >   holding resources across a stop) (new)
> > 
> > The new kind of sleep would fail with [ERESTART] only for stops, since
> > [EINTR] should only be returned if a signal handler was called. There
> > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a
> > handler does not stop the process.
> > 
> And where would this new kind of sleep used ?  The advlock sleep is the one
> place.  Does fifo sleep for reader or writer on open require this kind
> of handling (IMO no) ?
> 
> I think this can be relatively easily implemented with either a flag
> for XXXsleep(9) (my older style of PBDRY) or using only the thread flag
> (jhb' newer TDF_SBDRY approach). Probably the later should be used, for
> consistency and easier marking of larger blocks of code.

Like this.

diff --git a/sys/kern/kern_lockf.c b/sys/kern/kern_lockf.c
index a0a3789..ee26596 100644
--- a/sys/kern/kern_lockf.c
+++ b/sys/kern/kern_lockf.c
@@ -1378,7 +1378,7 @@ lf_setlock(struct lockf *state, struct lockf_entry *lock, 
struct vnode *vp,
 void **cookiep)
 {
static char lockstr[] = "lockf";
-   int priority, error;
+   int error, priority, stoprestart;
 
 #ifdef LOCKF_DEBUG
if (lockf_debug & 1)
@@ -1466,7 +1466,10 @@ lf_setlock(struct lockf *state, struct lockf_entry 
*lock, struct vnode *vp,
}
 
lock->lf_refs++;
+   stoprestart = sigstoprestart();
error = sx_sleep(lock, &state->ls_lock, priority, lockstr, 0);
+   if (stoprestart)
+   sigstopnormal();
if (lf_free_lock(lock)) {
error = EDOOFUS;
goto out;
diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
index 75a1259..1d7036d 100644
--- a/sys/kern/kern_sig.c
+++ b/sys/kern/kern_sig.c
@@ -2633,6 +2633,35 @@ sigallowstop(void)
return (prev);
 }
 
+int
+sigstoprestart(void)
+{
+   struct thread *td;
+
+   td = curthread;
+   if ((td->td_flags & TDF_SBDRY) == 0 ||
+   (td->td_flags & TDF_SRESTART) != 0)
+   return (0);
+   thread_lock(td);
+   td->td_flags |= TDF_SRESTART;
+   thread_unlock(td);
+   return (1);
+}
+
+int
+sigstopnormal(void)
+{
+   struct thread *td;
+   int prev;
+
+   td = curthread;
+   thread_lock(td);
+   prev = (td->td_flags & TDF_SRESTART) != 0;
+   td->td_flags &= ~TDF_SRESTART;
+   thread_unlock(td);
+   return (prev);
+}
+
 /*
  * If the current process has received a signal (should be caught or cause
  * termination, should interrupt current syscall), return the signal number.
diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c
index 9af377e..6460ae9 100644
--- a/sys/kern/kern_thread.c
+++ b/sys/kern/kern_thread.c
@@ -932,7 +932,8 @@ thread_suspend_check(int return_instead)
if ((td->td_flags & TDF_SBDRY) != 0) {
KASSERT(return_instead,
("TDF_SBDRY set for unsafe thread_suspend_check"));
-   return (0);
+   

Re: thread suspension when dumping core

2016-06-07 Thread Konstantin Belousov
On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote:
> On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote:
> > This looks as if we should not ignore suspension requests in
> > thread_suspend_check() completely in TDF_SBDRY case, but return either
> > EINTR or ERESTART (most likely ERESTART). Note that the goal of
> > TDF_SBDRY is to avoid suspending in the protected region, not to make an
> > impression that the suspension does not occur at all.
> 
> This looks like it would revert r246417 and re-introduce the bug fixed
> by it (unexpected [EINTR] and short reads/writes after stop signals).
Well, the patch returns ERESTART and not EINTR, so the syscall should
be retried after all the unwinding.

> 
> After r246417, TDF_SBDRY is intended for sleeps that occur while holding
> resources such as vnode locks and are normally short but should be
> interruptible by fatal signals because they may occasionally be
> indefinitely long (such as a non-responsive NFS server).
> 
> It looks like yet another kind of sleep may be required, since advisory
> locks still hold some filesystem resources across the sleep (though not
> vnode locks).
I do not think that adv locks enter sleep with any resource held which
would block other threads.  But I agree with the statement because the
lock might be granted and then the stopped thread would appear to own
the blocking resource.

> 
> We then have four kinds:
> 
> * uninterruptible by signals, ignores stops (default)
> * interruptible by signals, ignores stops (current TDF_SBDRY with
>   PCATCH)
> * interruptible by signals, freezes in place on stops (avoids
>   unexpected short I/O) (current PCATCH, otherwise)
> * interruptible by signals, fails with [ERESTART] on stops (avoids
>   holding resources across a stop) (new)
> 
> The new kind of sleep would fail with [ERESTART] only for stops, since
> [EINTR] should only be returned if a signal handler was called. There
> cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a
> handler does not stop the process.
> 
And where would this new kind of sleep used ?  The advlock sleep is the one
place.  Does fifo sleep for reader or writer on open require this kind
of handling (IMO no) ?

I think this can be relatively easily implemented with either a flag
for XXXsleep(9) (my older style of PBDRY) or using only the thread flag
(jhb' newer TDF_SBDRY approach). Probably the later should be used, for
consistency and easier marking of larger blocks of code.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-07 Thread Jilles Tjoelker
On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote:
> On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote:
> > Sure, see below. For reference:

> > td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | 
> > NEEDSUSPCHK
> > td_pflags = 0
> > td_inhibitors = 0x2 = SLEEPING
> > td_locks = 0

> > stack:
> > mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d 
> > ...

> > p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
> > p_flag2 = 0

> > The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
> > SLEEPABORT flag is not, so the thread can not have been sleeping when
> > thread_single() was called - else sleepq_abort() would have been
> > invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
> > sleepq_catch_signals(), and no signal was pending, so we called
> > thread_suspend_check(), which returned 0 because of SBDRY. So we went to
> > sleep.

> > I note that this couldn't have happened prior to r283320. That change
> > was apparently motivated by a similar hang, but in that case the thread
> > was suspended (with a vnode lock held) rather than asleep. It looks like
> > our internal fix also added a change to set TDF_SBDRY around
> > filesystem-specific syscalls, which often sleep interruptibly while
> > holding vnode locks. But I don't think that's the problem here, as you
> > noted with lf_advlock().

> > With r283320 reverted, P_STOPPED_SIG would not have been set, so
> > thread_suspend_check() would have suspended us, allowing the core dump
> > to proceed. I had thought that using SINGLE_BOUNDRY beforing coredumping
> > would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so
> > this is probably the wrong place to be solving the problem.

> This looks as if we should not ignore suspension requests in
> thread_suspend_check() completely in TDF_SBDRY case, but return either
> EINTR or ERESTART (most likely ERESTART). Note that the goal of
> TDF_SBDRY is to avoid suspending in the protected region, not to make an
> impression that the suspension does not occur at all.

This looks like it would revert r246417 and re-introduce the bug fixed
by it (unexpected [EINTR] and short reads/writes after stop signals).

After r246417, TDF_SBDRY is intended for sleeps that occur while holding
resources such as vnode locks and are normally short but should be
interruptible by fatal signals because they may occasionally be
indefinitely long (such as a non-responsive NFS server).

It looks like yet another kind of sleep may be required, since advisory
locks still hold some filesystem resources across the sleep (though not
vnode locks).

We then have four kinds:

* uninterruptible by signals, ignores stops (default)
* interruptible by signals, ignores stops (current TDF_SBDRY with
  PCATCH)
* interruptible by signals, freezes in place on stops (avoids
  unexpected short I/O) (current PCATCH, otherwise)
* interruptible by signals, fails with [ERESTART] on stops (avoids
  holding resources across a stop) (new)

The new kind of sleep would fail with [ERESTART] only for stops, since
[EINTR] should only be returned if a signal handler was called. There
cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a
handler does not stop the process.

-- 
Jilles Tjoelker
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-06 Thread Mark Johnston
On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote:
> On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote:
> > Sure, see below. For reference:
> > 
> > td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | 
> > NEEDSUSPCHK
> > td_pflags = 0
> > td_inhibitors = 0x2 = SLEEPING
> > td_locks = 0
> > 
> > stack:
> > mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d 
> > ...
> > 
> > p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
> > p_flag2 = 0
> > 
> > The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
> > SLEEPABORT flag is not, so the thread can not have been sleeping when
> > thread_single() was called - else sleepq_abort() would have been
> > invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
> > sleepq_catch_signals(), and no signal was pending, so we called
> > thread_suspend_check(), which returned 0 because of SBDRY. So we went to
> > sleep.
> 
> This looks as if we should not ignore suspension requests in
> thread_suspend_check() completely in TDF_SBDRY case, but return either
> EINTR or ERESTART (most likely ERESTART). Note that the goal of
> TDF_SBDRY is to avoid suspending in the protected region, not to make an
> impression that the suspension does not occur at all.

Thanks, that sounds right to me. It results in unified behaviour for
TDF_SBDRY regardless of whether the suspension attempt took place before
or after the thread went to sleep, and seems like it does the right
thing in the single-threaded case as well.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-06 Thread Konstantin Belousov
On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote:
> Sure, see below. For reference:
> 
> td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK
> td_pflags = 0
> td_inhibitors = 0x2 = SLEEPING
> td_locks = 0
> 
> stack:
> mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d 
> ...
> 
> p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
> p_flag2 = 0
> 
> The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
> SLEEPABORT flag is not, so the thread can not have been sleeping when
> thread_single() was called - else sleepq_abort() would have been
> invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
> sleepq_catch_signals(), and no signal was pending, so we called
> thread_suspend_check(), which returned 0 because of SBDRY. So we went to
> sleep.
> 
> I note that this couldn't have happened prior to r283320. That change
> was apparently motivated by a similar hang, but in that case the thread
> was suspended (with a vnode lock held) rather than asleep. It looks like
> our internal fix also added a change to set TDF_SBDRY around
> filesystem-specific syscalls, which often sleep interruptibly while
> holding vnode locks. But I don't think that's the problem here, as you
> noted with lf_advlock().
> 
> With r283320 reverted, P_STOPPED_SIG would not have been set, so
> thread_suspend_check() would have suspended us, allowing the core dump
> to proceed. I had thought that using SINGLE_BOUNDRY beforing coredumping
> would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so
> this is probably the wrong place to be solving the problem.

This looks as if we should not ignore suspension requests in
thread_suspend_check() completely in TDF_SBDRY case, but return either
EINTR or ERESTART (most likely ERESTART). Note that the goal of
TDF_SBDRY is to avoid suspending in the protected region, not to make an
impression that the suspension does not occur at all.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-06 Thread Mark Johnston
On Tue, Jun 07, 2016 at 05:46:10AM +0300, Konstantin Belousov wrote:
> On Mon, Jun 06, 2016 at 10:13:11AM -0700, Mark Johnston wrote:
> > On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
> What statement was not true: that your code sets TDF_SBDRY, or that
> the lf_advlock() function was called ?

The latter. It turns out that we've modified some of our
filesystem-specific syscalls to set TDF_SBDRY as well; see below.

> > I'm a bit confused by this. How does TDF_SBDRY prevent thread_single()
> > from waking up the thread? The sleepq_abort() call is only elided in the
> > SINGLE_ALLPROC case, so in other cases, I think we will still interrupt
> > the sleep. Thus, since thread_suspend_check() is only invoked prior to
> > going to sleep, the application I referred to must have attempted to
> > single-thread the process before the thread in question went to sleep.
> It does not prevent the wakeup, sorry.
> 
> What I should have said, more precisely, is that thread_suspend_check()
> call before the thread is goes to sleep, is nop in case of TDF_SBDRY
> flag was set.

Ok, that's my understanding too.

> > > From what I see in the code, our NFS client has similar issue of calling
> > > lf_advlock() with TDF_SBDRY set.  Below is the patch to fix that.
> > > Similar bug existed in our fifofs, see r277321.
> > 
> > Thanks. It may be that a similar fix is appropriate in our locking code,
> > but I'll have to spend more time reading it.
> 
> Still, I am confused now as well.  If you can catch the process in that
> state, where a thread is sleeping while single-threading request cannot
> make the progress, I would like to see the struct thread and struct proc
> printouts.  Esp. the thread flags are interesting.

Sure, see below. For reference:

td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK
td_pflags = 0
td_inhibitors = 0x2 = SLEEPING
td_locks = 0

stack:
mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d ...

p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
p_flag2 = 0

The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
SLEEPABORT flag is not, so the thread can not have been sleeping when
thread_single() was called - else sleepq_abort() would have been
invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
sleepq_catch_signals(), and no signal was pending, so we called
thread_suspend_check(), which returned 0 because of SBDRY. So we went to
sleep.

I note that this couldn't have happened prior to r283320. That change
was apparently motivated by a similar hang, but in that case the thread
was suspended (with a vnode lock held) rather than asleep. It looks like
our internal fix also added a change to set TDF_SBDRY around
filesystem-specific syscalls, which often sleep interruptibly while
holding vnode locks. But I don't think that's the problem here, as you
noted with lf_advlock().

With r283320 reverted, P_STOPPED_SIG would not have been set, so
thread_suspend_check() would have suspended us, allowing the core dump
to proceed. I had thought that using SINGLE_BOUNDRY beforing coredumping
would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so
this is probably the wrong place to be solving the problem.

CPU IDFUNCTION:NAME
  1  1   :BEGIN struct thread {
struct mtx *volatile td_lock = 0x814a5148
struct proc *td_proc = 0xf80445c694d8
struct td_plist = {
struct thread *tqe_next = 0xf81a44186750
struct thread **tqe_prev = 0xf8047a027760
}
struct td_runq = {
struct thread *tqe_next = 0
struct thread **tqe_prev = 0x8147cd40
}
struct td_slpq = {
struct thread *tqe_next = 0
struct thread **tqe_prev = 0xf81c992b1d80
}
struct td_lockq = {
struct thread *tqe_next = 0
struct thread **tqe_prev = 0xfe2a1c727be8
}
struct td_hash = {
struct thread *le_next = 0
struct thread **le_prev = 0xfe0001076f98
}
struct cpuset *td_cpuset = 0xf80699b9b510
struct seltd *td_sel = 0xf80258307b80
struct sleepqueue *td_sleepqueue = 0
struct turnstile *td_turnstile = 0xf80d50a3fcc0
struct rl_q_entry *td_rlqe = 0xf8025dcf5988
struct umtx_q *td_umtxq = 0xf807c9adfd80
lwpid_t td_tid = 0x18df3
sigqueue_t td_sigqueue = {
sigset_t sq_signals = {
__uint32_t [4] __bits = [ 0, 0, 0, 0 ]
}
sigset_t sq_kill = {
__uint32_t [4] __bits = [ 0, 0, 0, 0 ]
}
struct sq_list = {
struct ksiginfo *tqh_first = 0
struct ksiginfo **tqh_last = 0xf814be2df808
}
struct proc *sq_proc = 0xf80445c694d8
int sq_flags = 0x1
}
u_char td_lend_user_pri = 0xff
int td_flags = 0xa84c
int td_inhibitors = 0x2
int td_pflags = 0
int td_dupfd = 0

Re: thread suspension when dumping core

2016-06-06 Thread Conrad Meyer
On Mon, Jun 6, 2016 at 7:46 PM, Konstantin Belousov  wrote:
> On Mon, Jun 06, 2016 at 10:13:11AM -0700, Mark Johnston wrote:
>> On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
>> > Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ?
>>
>> It doesn't. This code belongs to a general framework for distributed FS
>> locks; in this particular case, the application was using it to acquire
>> a custom advisory lock.
> What statement was not true: that your code sets TDF_SBDRY, or that
> the lf_advlock() function was called ?

We set TDF_SBDRY; we don't call lf_advlock().

I don't have the core yet, so I can't get the thread information, sorry.

Best,
Conrad
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-06 Thread Konstantin Belousov
On Mon, Jun 06, 2016 at 10:13:11AM -0700, Mark Johnston wrote:
> On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
> > Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ?
> 
> It doesn't. This code belongs to a general framework for distributed FS
> locks; in this particular case, the application was using it to acquire
> a custom advisory lock.
What statement was not true: that your code sets TDF_SBDRY, or that
the lf_advlock() function was called ?

> 
> > This cannot work, regardless of the mode of single-threading.  TDF_SBDRY
> > makes such sleep non-interruptible by any single-threading request, on
> > the promise that the thread owns some resources (typically vnode locks).
> > I.e. changing the mode would not help.
> 
> I'm a bit confused by this. How does TDF_SBDRY prevent thread_single()
> from waking up the thread? The sleepq_abort() call is only elided in the
> SINGLE_ALLPROC case, so in other cases, I think we will still interrupt
> the sleep. Thus, since thread_suspend_check() is only invoked prior to
> going to sleep, the application I referred to must have attempted to
> single-thread the process before the thread in question went to sleep.
It does not prevent the wakeup, sorry.

What I should have said, more precisely, is that thread_suspend_check()
call before the thread is goes to sleep, is nop in case of TDF_SBDRY
flag was set.

> 
> > 
> > I see two reasons to use SINGLE_NO_EXIT for coredumping.  It allows
> > coredump writer to record more exact state of the process into the notes.
> > 
> > Another one is that SINGLE_NO_EXIT is generally faster and more
> > reliable than SINGLE_BOUNDARY. Some states are already good enough for
> > SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In
> > other words, core dump write starts earlier.
> > 
> > It might be not very significant reasons. 
> > 
> > From what I see in the code, our NFS client has similar issue of calling
> > lf_advlock() with TDF_SBDRY set.  Below is the patch to fix that.
> > Similar bug existed in our fifofs, see r277321.
> 
> Thanks. It may be that a similar fix is appropriate in our locking code,
> but I'll have to spend more time reading it.

Still, I am confused now as well.  If you can catch the process in that
state, where a thread is sleeping while single-threading request cannot
make the progress, I would like to see the struct thread and struct proc
printouts.  Esp. the thread flags are interesting.

Thanks.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-06 Thread Mark Johnston
On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
> On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote:
> > Hi,
> > 
> > I've recently observed a hang in a multi-threaded process that had hit
> > an assertion failure and was attempting to dump core. One thread was
> > sleeping interruptibly on an advisory lock with TDF_SBDRY set (our
> > filesystem sets VFCF_SBDRY). SIGABRT caused the receipient thread to
> > suspend other threads with thread_single(SINGLE_NO_EXIT), which fails
> > to interrupt the sleeping thread, resulting in the hang.
> > 
> > My question is, why does the SA_CORE handler not force all threads to
> > the user boundary before attempting to dump core? It must do so later
> > anyway in order to exit. As I understand it, TDF_SBDRY is intended to
> > avoid deadlocks that can occur when stopping a process, but in this
> > case we don't stop the process with the intention of resuming it, so it
> > seems erroneous to apply this flag.
> 
> Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ?

It doesn't. This code belongs to a general framework for distributed FS
locks; in this particular case, the application was using it to acquire
a custom advisory lock.

> This cannot work, regardless of the mode of single-threading.  TDF_SBDRY
> makes such sleep non-interruptible by any single-threading request, on
> the promise that the thread owns some resources (typically vnode locks).
> I.e. changing the mode would not help.

I'm a bit confused by this. How does TDF_SBDRY prevent thread_single()
from waking up the thread? The sleepq_abort() call is only elided in the
SINGLE_ALLPROC case, so in other cases, I think we will still interrupt
the sleep. Thus, since thread_suspend_check() is only invoked prior to
going to sleep, the application I referred to must have attempted to
single-thread the process before the thread in question went to sleep.

> 
> I see two reasons to use SINGLE_NO_EXIT for coredumping.  It allows
> coredump writer to record more exact state of the process into the notes.
> 
> Another one is that SINGLE_NO_EXIT is generally faster and more
> reliable than SINGLE_BOUNDARY. Some states are already good enough for
> SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In
> other words, core dump write starts earlier.
> 
> It might be not very significant reasons. 
> 
> From what I see in the code, our NFS client has similar issue of calling
> lf_advlock() with TDF_SBDRY set.  Below is the patch to fix that.
> Similar bug existed in our fifofs, see r277321.

Thanks. It may be that a similar fix is appropriate in our locking code,
but I'll have to spend more time reading it.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: thread suspension when dumping core

2016-06-04 Thread Konstantin Belousov
On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote:
> Hi,
> 
> I've recently observed a hang in a multi-threaded process that had hit
> an assertion failure and was attempting to dump core. One thread was
> sleeping interruptibly on an advisory lock with TDF_SBDRY set (our
> filesystem sets VFCF_SBDRY). SIGABRT caused the receipient thread to
> suspend other threads with thread_single(SINGLE_NO_EXIT), which fails
> to interrupt the sleeping thread, resulting in the hang.
> 
> My question is, why does the SA_CORE handler not force all threads to
> the user boundary before attempting to dump core? It must do so later
> anyway in order to exit. As I understand it, TDF_SBDRY is intended to
> avoid deadlocks that can occur when stopping a process, but in this
> case we don't stop the process with the intention of resuming it, so it
> seems erroneous to apply this flag.

Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ?
This cannot work, regardless of the mode of single-threading.  TDF_SBDRY
makes such sleep non-interruptible by any single-threading request, on
the promise that the thread owns some resources (typically vnode locks).
I.e. changing the mode would not help.

I see two reasons to use SINGLE_NO_EXIT for coredumping.  It allows
coredump writer to record more exact state of the process into the notes.

Another one is that SINGLE_NO_EXIT is generally faster and more
reliable than SINGLE_BOUNDARY. Some states are already good enough for
SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In
other words, core dump write starts earlier.

It might be not very significant reasons. 

>From what I see in the code, our NFS client has similar issue of calling
lf_advlock() with TDF_SBDRY set.  Below is the patch to fix that.
Similar bug existed in our fifofs, see r277321.

diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c
index 2a8afa9..98625ee 100644
--- a/sys/fs/nfsclient/nfs_clvnops.c
+++ b/sys/fs/nfsclient/nfs_clvnops.c
@@ -2992,7 +2992,7 @@ nfs_advlock(struct vop_advlock_args *ap)
struct proc *p = (struct proc *)ap->a_id;
struct thread *td = curthread;  /* XXX */
struct vattr va;
-   int ret, error = EOPNOTSUPP;
+   int sbdry, ret, error = EOPNOTSUPP;
u_quad_t size;

if (NFS_ISV4(vp) && (ap->a_flags & (F_POSIX | F_FLOCK)) != 0) {
@@ -3087,7 +3087,10 @@ nfs_advlock(struct vop_advlock_args *ap)
if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) {
size = VTONFS(vp)->n_size;
NFSVOPUNLOCK(vp, 0);
+   sbdry = sigallowstop();
error = lf_advlock(ap, &(vp->v_lockf), size);
+   if (sbdry)
+   sigdeferstop();
} else {
if (nfs_advlock_p != NULL)
error = nfs_advlock_p(ap);
@@ -3114,7 +3117,7 @@ nfs_advlockasync(struct vop_advlockasync_args *ap)
 {
struct vnode *vp = ap->a_vp;
u_quad_t size;
-   int error;
+   int error, sbdry;

if (NFS_ISV4(vp))
return (EOPNOTSUPP);
@@ -3124,7 +3127,10 @@ nfs_advlockasync(struct vop_advlockasync_args *ap)
if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) {
size = VTONFS(vp)->n_size;
NFSVOPUNLOCK(vp, 0);
+   sbdry = sigallowstop();
error = lf_advlockasync(ap, &(vp->v_lockf), size);
+   if (sbdry)
+   sigdeferstop();
} else {
NFSVOPUNLOCK(vp, 0);
error = EOPNOTSUPP;
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"