Re: kevent has bug?

2014-04-04 Thread Konstantin Belousov
On Fri, Apr 04, 2014 at 07:55:05AM +0900, Kohji Okuno wrote:
 Hi,
 
 Thank you for your detailed commnet. And I uderstood about your opinion.
 By the way, do you commit your change to HEAD?

Sure I will, I am waiting for possible last comment from John-Mark.
Unless he provides more feedback, I plan to commit the patch tomorrow.


pgpT4NbzyF4Kn.pgp
Description: PGP signature


Re: kevent has bug?

2014-04-03 Thread Kohji Okuno
From: Konstantin Belousov kostik...@gmail.com
Date: Wed, 2 Apr 2014 20:44:00 +0300
 On Wed, Apr 02, 2014 at 09:45:43AM -0700, John-Mark Gurney wrote:
 Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300:
 Well, it's not that its preventing waking up the waiter, but failing to
 register the event on the knote because of the _INFLUX flag...
 Yes, I used the wrong terminology.
 
 
  Patch below fixed your test case for me, also tools/regression/kqueue did
  not noticed a breakage.  I tried to describe the situation in the
  comment in knote().  Also, I removed unlocked check for the KN_INFLUX
  in knote, since it seems to be an optimization for rare case, and is
  the race on its own.
 
 Comments below...
 
  diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
  index b3fb23d..380f1ff 100644
  --- a/sys/kern/kern_event.c
  +++ b/sys/kern/kern_event.c
 
 [...]
 
  @@ -1506,7 +1506,7 @@ retry:
 KQ_LOCK(kq);
 kn = NULL;
 } else {
  -  kn-kn_status |= KN_INFLUX;
  +  kn-kn_status |= KN_INFLUX | KN_SCAN;
 KQ_UNLOCK(kq);
 if ((kn-kn_status  KN_KQUEUE) == KN_KQUEUE)
 KQ_GLOBAL_LOCK(kq_global, haskqglobal);
 
 Is there a reason you don't add the KN_SCAN to the other cases in
 kqueue_scan that set the _INFLUX flag?
 Other cases in kqueue_scan() which set influx do the detach and drop,
 so I do not see a need to ensure that note is registered.  Except I missed
 one case, which you pointed out.
 
 
 [...]
 
  @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int 
  lockflags)
  */
 SLIST_FOREACH(kn, list-kl_list, kn_selnext) {
 kq = kn-kn_kq;
  -  if ((kn-kn_status  KN_INFLUX) != KN_INFLUX) {
  +  KQ_LOCK(kq);
  +  if ((kn-kn_status  (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
  +  /*
  +   * Do not process the influx notes, except for
  +   * the influx coming from the kq unlock in the
  +   * kqueue_scan().  In the later case, we do
  +   * not interfere with the scan, since the code
  +   * fragment in kqueue_scan() locks the knlist,
  +   * and cannot proceed until we finished.
  +   */
 
 We might want to add a marker node, and reprocess the list from the
 marker node, because this might introduce other races in the code too...
 but the problem with that is that knote is expected to keep the list
 locked throughout the call if called w/ it already locked, and so we
 can't do that, without major work... :(
 Why ? If the knlist lock is not dropped, I do not see a need for the
 marker.  The patch does not introduce the sleep point for the KN_SCAN
 knotes anyway.
 
 
 I added a similar comment in knote_fork:
  * XXX - Why do we skip the kn if it is _INFLUX?  Does this
  * mean we will not properly wake up some notes?
 
 and it looks like it was true...
 
 So, upon reading the other _INFLUX cases, it looks like we should change
 _SCAN to be, _CHANGING or something similar, and any place we don't end
 up dropping the knote, we set this flag also...  Once such case is at
 the end of kqueue_register, just before the label done_ev_add, where we
 update the knote w/ new udata and other fields..  Or change the logic
 of the flag, and set it for all the cases we are about to drop the
 knote..
 So do you prefer KN_CHANGING instead of KN_SCAN ?  I do not have any
 objections against renaming the flag, but _CHANGING seems to not say
 anything about the flag intent.  I would say that KN_STABLE is more
 useful, or KN_INFLUX_NODEL, or whatever.
 
 The done_ev_add case is indeed missed in my patch, thank you for noting.
 The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the
 race is possible just by the nature of adding the knote.
 
 
  +  KQ_UNLOCK(kq);
  +  } else if ((lockflags  KNF_NOKQLOCK) != 0) {
  +  kn-kn_status |= KN_INFLUX;
  +  KQ_UNLOCK(kq);
  +  error = kn-kn_fop-f_event(kn, hint);
 KQ_LOCK(kq);
 
 I believe we can drop this unlock/lock pair as it's safe to hold the
 KQ lock over f_event, we do that in knote_fork...
 The knote_fork() is for the special kinds of knote only, where we indeed know
 in advance that having the kqueue locked around f_event does not break things.
 
 Updated patch below.
 
 diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
 index b3fb23d..fadb8fd 100644
 --- a/sys/kern/kern_event.c
 +++ b/sys/kern/kern_event.c
 @@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid)
   continue;
   kq = kn-kn_kq;
   KQ_LOCK(kq);
 - if ((kn-kn_status  KN_INFLUX) == KN_INFLUX) {
 + if ((kn-kn_status  (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
 

Re: kevent has bug?

2014-04-03 Thread Konstantin Belousov
On Thu, Apr 03, 2014 at 06:26:56PM +0900, Kohji Okuno wrote:
  The done_ev_add case is indeed missed in my patch, thank you for noting.
  The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the
  race is possible just by the nature of adding the knote.

 I think, we should add KN_SCAN after knote_attach() in
 kqueue_register(), too. What do you think about this?
See above, I noted this case in the previous mail.  This may be elaborated.

First, I think it is technically incorrect to allow the event
notification before the f_attach() method is finished. So the KN_SCAN
flag could be set only after f_attach() call, but due to both kq and
knlist not locked there, we still have the same race. And this race is
in fact acceptable, since it is the race between application calling
EV_ADD, and external event occuring, which cannot be avoided. Until the
kevent(EV_ADD) syscall returned, we do not have an obligation to report
the event from the kqfd. Having the race somewhat bigger by not setting
KN_SCAN is fine in my opinion.


pgp_eM3PUg0OL.pgp
Description: PGP signature


Re: kevent has bug?

2014-04-03 Thread Kohji Okuno
From: Konstantin Belousov kostik...@gmail.com
Date: Thu, 3 Apr 2014 16:48:14 +0300
 On Thu, Apr 03, 2014 at 06:26:56PM +0900, Kohji Okuno wrote:
  The done_ev_add case is indeed missed in my patch, thank you for noting.
  The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the
  race is possible just by the nature of adding the knote.
 
 I think, we should add KN_SCAN after knote_attach() in
 kqueue_register(), too. What do you think about this?
 See above, I noted this case in the previous mail.  This may be elaborated.
 
 First, I think it is technically incorrect to allow the event
 notification before the f_attach() method is finished. So the KN_SCAN
 flag could be set only after f_attach() call, but due to both kq and
 knlist not locked there, we still have the same race. And this race is
 in fact acceptable, since it is the race between application calling
 EV_ADD, and external event occuring, which cannot be avoided. Until the
 kevent(EV_ADD) syscall returned, we do not have an obligation to report
 the event from the kqfd. Having the race somewhat bigger by not setting
 KN_SCAN is fine in my opinion.

Hi,

Thank you for your detailed commnet. And I uderstood about your opinion.
By the way, do you commit your change to HEAD?

Regards,
 Kohji Okuno
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: kevent has bug?

2014-04-02 Thread John-Mark Gurney
Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
 I think, kevent() has a bug.
 I tested sample programs by attached sources.
 This sample tests about EVFILT_SIGNAL.
 
 I build sample programs by the following commands.
 % gcc -O2 -o child child.c
 % gcc -O2 -o parent parent.c
 
 The expected result is the following.
 % ./parent
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
 OK
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
 OK
 
 But, sometimes the result was the following.
 % ./parent
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
 
 This result means the number of times the signal has occured was
 incorrect.

I was able to reproduce this...

 In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
 number of times the signal has occurred since the last call to
 kevent(). This `data' is recorded by filt_signal() (This is f_event in
 struct filterops).
 
 The system call kevent()'s events are processed by kqueue_scan() in
 kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways
 called after KN_INFLUX is set to kn-kn_status.
 
 On the other hand, kernel events are occured by knote() in
 kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
 in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when
 KN_INFLUX is not set in kn-kn_status.
 
 In race condition between kqueue_scan() and knote(),
 kn-kn_fop-f_event() from knote() may not be called, I think.

Considering that both are called w/ a lock, that cannot happen..
KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
by knote...

 In knote(), because the context holds knlist's lock, the context can
 not sleep. So, KN_INFLUX should not be set on calling
 kn-kn_fop-f_event() in kqueue_scan(), I think.

No, it needs to be set:
 * Setting the KN_INFLUX flag enables you to unlock the kq that this knote
 * is on, and modify kn_status as if you had the KQ lock.

As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
the knote will disappear out from under you causing you to dereference
possibly free'd memory..

If you just tried to lock the list lock w/o unlocking the KQ lock, you
could end up w/ a dead lock, as you aren't maintaining lock order
properly..  The correct lock order if knlist - kq...

 What do you think about this issue?

This is a real issue, but not due to the race you described
above...

I have verified on my machine that it isn't because there is a knote
waiting that isn't getting woken up, and the knote on my hung process
has data == 0, so it definately lost one of the signals:
(kgdb) print $14.kq_knhash[20].slh_first[0]
$20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
  kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500,
tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = {
ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0},
  kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = {
p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8,
p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8,
p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0,
  kn_hookid = 0}

If you want to find this yourself, you can run kgdb on a live system,
switch to the thread of the parent (info threads, thread XXX), and
do:
frame 7

(or a frame that has td, which is struct thread *), then:
print *(struct kqueue *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data

This will give you the struct kqueue * of the parent, and then:
print $XX.kq_knhash[0]@63

to figure out where the knote is in the hash, and then you can print
it out yourself...

I'm going to take a look at this a bit more later... I'm thinking of
using dtrace to collect the stacks where filt_signal is called, and
match them up...  dtrace might even be able to get us the note's data
upon return helping to make sure things got tracked properly...

Thanks for finding this bug!  Hopefully we can find a solution to it..

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: kevent has bug?

2014-04-02 Thread Kohji Okuno
From: John-Mark Gurney j...@funkthat.com
Date: Tue, 1 Apr 2014 23:15:51 -0700
 Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
 I think, kevent() has a bug.
 I tested sample programs by attached sources.
 This sample tests about EVFILT_SIGNAL.
 
 I build sample programs by the following commands.
 % gcc -O2 -o child child.c
 % gcc -O2 -o parent parent.c
 
 The expected result is the following.
 % ./parent
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
 OK
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
 OK
 
 But, sometimes the result was the following.
 % ./parent
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
 
 This result means the number of times the signal has occured was
 incorrect.
 
 I was able to reproduce this...
 
 In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
 number of times the signal has occurred since the last call to
 kevent(). This `data' is recorded by filt_signal() (This is f_event in
 struct filterops).
 
 The system call kevent()'s events are processed by kqueue_scan() in
 kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways
 called after KN_INFLUX is set to kn-kn_status.
 
 On the other hand, kernel events are occured by knote() in
 kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
 in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when
 KN_INFLUX is not set in kn-kn_status.
 
 In race condition between kqueue_scan() and knote(),
 kn-kn_fop-f_event() from knote() may not be called, I think.
 
 Considering that both are called w/ a lock, that cannot happen..
 KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
 by knote...
 
 In knote(), because the context holds knlist's lock, the context can
 not sleep. So, KN_INFLUX should not be set on calling
 kn-kn_fop-f_event() in kqueue_scan(), I think.
 
 No, it needs to be set:
  * Setting the KN_INFLUX flag enables you to unlock the kq that this knote
  * is on, and modify kn_status as if you had the KQ lock.
 
 As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
 the knote will disappear out from under you causing you to dereference
 possibly free'd memory..
 
 If you just tried to lock the list lock w/o unlocking the KQ lock, you
 could end up w/ a dead lock, as you aren't maintaining lock order
 properly..  The correct lock order if knlist - kq...
 
 What do you think about this issue?
 
 This is a real issue, but not due to the race you described
 above...

I beleave it's the result of the race.

Could you try to add printf() in knote()?
Please refer to attached patch.


 I have verified on my machine that it isn't because there is a knote
 waiting that isn't getting woken up, and the knote on my hung process
 has data == 0, so it definately lost one of the signals:
 (kgdb) print $14.kq_knhash[20].slh_first[0]
 $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
   kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500,
 tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = {
 ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0},
   kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = {
 p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8,
 p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8,
 p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0,
   kn_hookid = 0}
 
 If you want to find this yourself, you can run kgdb on a live system,
 switch to the thread of the parent (info threads, thread XXX), and
 do:
 frame 7
 
 (or a frame that has td, which is struct thread *), then:
 print *(struct kqueue *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data
 
 This will give you the struct kqueue * of the parent, and then:
 print $XX.kq_knhash[0]@63
 
 to figure out where the knote is in the hash, and then you can print
 it out yourself...
 
 I'm going to take a look at this a bit more later... I'm thinking of
 using dtrace to collect the stacks where filt_signal is called, and
 match them up...  dtrace might even be able to get us the note's data
 upon return helping to make sure things got tracked properly...
 
 Thanks for finding this bug!  Hopefully we can find a solution to it..
 
 -- 
   John-Mark GurneyVoice: +1 415 225 5579
 
  All that I will do, has been done, All that I have, has not.
diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index b3fb23d..7791447 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags)
 		if ((kn-kn_status  KN_INFLUX) != KN_INFLUX) {
 			KQ_LOCK(kq);
 			if ((kn-kn_status  KN_INFLUX) == KN_INFLUX) {
+if (hint  NOTE_SIGNAL)
+	printf(Aee2\n);
 KQ_UNLOCK(kq);
 			} else if ((lockflags  KNF_NOKQLOCK) != 0) {
 kn-kn_status |= KN_INFLUX;
@@ -1886,6 +1888,10 @@ knote(struct knlist *list, long hint, int lockflags)
 

Re: kevent has bug?

2014-04-02 Thread Konstantin Belousov
On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote:
 From: John-Mark Gurney j...@funkthat.com
 Date: Tue, 1 Apr 2014 23:15:51 -0700
  Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
  I think, kevent() has a bug.
  I tested sample programs by attached sources.
  This sample tests about EVFILT_SIGNAL.
  
  I build sample programs by the following commands.
  % gcc -O2 -o child child.c
  % gcc -O2 -o parent parent.c
  
  The expected result is the following.
  % ./parent
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
  OK
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
  OK
  
  But, sometimes the result was the following.
  % ./parent
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
  
  This result means the number of times the signal has occured was
  incorrect.
  
  I was able to reproduce this...
  
  In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
  number of times the signal has occurred since the last call to
  kevent(). This `data' is recorded by filt_signal() (This is f_event in
  struct filterops).
  
  The system call kevent()'s events are processed by kqueue_scan() in
  kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways
  called after KN_INFLUX is set to kn-kn_status.
  
  On the other hand, kernel events are occured by knote() in
  kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
  in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when
  KN_INFLUX is not set in kn-kn_status.
  
  In race condition between kqueue_scan() and knote(),
  kn-kn_fop-f_event() from knote() may not be called, I think.
  
  Considering that both are called w/ a lock, that cannot happen..
  KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
  by knote...
  
  In knote(), because the context holds knlist's lock, the context can
  not sleep. So, KN_INFLUX should not be set on calling
  kn-kn_fop-f_event() in kqueue_scan(), I think.
  
  No, it needs to be set:
   * Setting the KN_INFLUX flag enables you to unlock the kq that this knote
   * is on, and modify kn_status as if you had the KQ lock.
  
  As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
  the knote will disappear out from under you causing you to dereference
  possibly free'd memory..
  
  If you just tried to lock the list lock w/o unlocking the KQ lock, you
  could end up w/ a dead lock, as you aren't maintaining lock order
  properly..  The correct lock order if knlist - kq...
  
  What do you think about this issue?
  
  This is a real issue, but not due to the race you described
  above...
 
 I beleave it's the result of the race.
 
 Could you try to add printf() in knote()?
 Please refer to attached patch.
 
 
  I have verified on my machine that it isn't because there is a knote
  waiting that isn't getting woken up, and the knote on my hung process
  has data == 0, so it definately lost one of the signals:
  (kgdb) print $14.kq_knhash[20].slh_first[0]
  $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500,
  tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = 
  {
  ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0},
kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = {
  p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8,
  p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8,
  p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0,
kn_hookid = 0}
  
  If you want to find this yourself, you can run kgdb on a live system,
  switch to the thread of the parent (info threads, thread XXX), and
  do:
  frame 7
  
  (or a frame that has td, which is struct thread *), then:
  print *(struct kqueue 
  *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data
  
  This will give you the struct kqueue * of the parent, and then:
  print $XX.kq_knhash[0]@63
  
  to figure out where the knote is in the hash, and then you can print
  it out yourself...
  
  I'm going to take a look at this a bit more later... I'm thinking of
  using dtrace to collect the stacks where filt_signal is called, and
  match them up...  dtrace might even be able to get us the note's data
  upon return helping to make sure things got tracked properly...
  
  Thanks for finding this bug!  Hopefully we can find a solution to it..
  
  -- 
John-Mark Gurney  Voice: +1 415 225 5579
  
   All that I will do, has been done, All that I have, has not.

 diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
 index b3fb23d..7791447 100644
 --- a/sys/kern/kern_event.c
 +++ b/sys/kern/kern_event.c
 @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags)
   if ((kn-kn_status  KN_INFLUX) != KN_INFLUX) {
   KQ_LOCK(kq);
   if ((kn-kn_status  KN_INFLUX) == KN_INFLUX) {
 + 

Re: kevent has bug?

2014-04-02 Thread Kohji Okuno
From: Konstantin Belousov kostik...@gmail.com
Date: Wed, 2 Apr 2014 15:07:45 +0300
 On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote:
 From: John-Mark Gurney j...@funkthat.com
 Date: Tue, 1 Apr 2014 23:15:51 -0700
  Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
  I think, kevent() has a bug.
  I tested sample programs by attached sources.
  This sample tests about EVFILT_SIGNAL.
  
  I build sample programs by the following commands.
  % gcc -O2 -o child child.c
  % gcc -O2 -o parent parent.c
  
  The expected result is the following.
  % ./parent
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
  OK
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
  OK
  
  But, sometimes the result was the following.
  % ./parent
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
  
  This result means the number of times the signal has occured was
  incorrect.
  
  I was able to reproduce this...
  
  In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
  number of times the signal has occurred since the last call to
  kevent(). This `data' is recorded by filt_signal() (This is f_event in
  struct filterops).
  
  The system call kevent()'s events are processed by kqueue_scan() in
  kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways
  called after KN_INFLUX is set to kn-kn_status.
  
  On the other hand, kernel events are occured by knote() in
  kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
  in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when
  KN_INFLUX is not set in kn-kn_status.
  
  In race condition between kqueue_scan() and knote(),
  kn-kn_fop-f_event() from knote() may not be called, I think.
  
  Considering that both are called w/ a lock, that cannot happen..
  KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
  by knote...
  
  In knote(), because the context holds knlist's lock, the context can
  not sleep. So, KN_INFLUX should not be set on calling
  kn-kn_fop-f_event() in kqueue_scan(), I think.
  
  No, it needs to be set:
   * Setting the KN_INFLUX flag enables you to unlock the kq that this knote
   * is on, and modify kn_status as if you had the KQ lock.
  
  As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
  the knote will disappear out from under you causing you to dereference
  possibly free'd memory..
  
  If you just tried to lock the list lock w/o unlocking the KQ lock, you
  could end up w/ a dead lock, as you aren't maintaining lock order
  properly..  The correct lock order if knlist - kq...
  
  What do you think about this issue?
  
  This is a real issue, but not due to the race you described
  above...
 
 I beleave it's the result of the race.
 
 Could you try to add printf() in knote()?
 Please refer to attached patch.
 
 
  I have verified on my machine that it isn't because there is a knote
  waiting that isn't getting woken up, and the knote on my hung process
  has data == 0, so it definately lost one of the signals:
  (kgdb) print $14.kq_knhash[20].slh_first[0]
  $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500,
  tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent 
  = {
  ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 
  0x0},
kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = {
  p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8,
  p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8,
  p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0,
kn_hookid = 0}
  
  If you want to find this yourself, you can run kgdb on a live system,
  switch to the thread of the parent (info threads, thread XXX), and
  do:
  frame 7
  
  (or a frame that has td, which is struct thread *), then:
  print *(struct kqueue 
  *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data
  
  This will give you the struct kqueue * of the parent, and then:
  print $XX.kq_knhash[0]@63
  
  to figure out where the knote is in the hash, and then you can print
  it out yourself...
  
  I'm going to take a look at this a bit more later... I'm thinking of
  using dtrace to collect the stacks where filt_signal is called, and
  match them up...  dtrace might even be able to get us the note's data
  upon return helping to make sure things got tracked properly...
  
  Thanks for finding this bug!  Hopefully we can find a solution to it..
  
  -- 
John-Mark Gurney Voice: +1 415 225 5579
  
   All that I will do, has been done, All that I have, has not.
 
 diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
 index b3fb23d..7791447 100644
 --- a/sys/kern/kern_event.c
 +++ b/sys/kern/kern_event.c
 @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags)
  if ((kn-kn_status  KN_INFLUX) != KN_INFLUX) {
  

Re: kevent has bug?

2014-04-02 Thread John-Mark Gurney
Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300:
 On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote:
  From: John-Mark Gurney j...@funkthat.com
  Date: Tue, 1 Apr 2014 23:15:51 -0700
   Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
   I think, kevent() has a bug.
   I tested sample programs by attached sources.
   This sample tests about EVFILT_SIGNAL.
   
   I build sample programs by the following commands.
   % gcc -O2 -o child child.c
   % gcc -O2 -o parent parent.c
   
   The expected result is the following.
   % ./parent
   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
   OK
   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
   OK
   
   But, sometimes the result was the following.
   % ./parent
   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
   
   This result means the number of times the signal has occured was
   incorrect.
   
   I was able to reproduce this...
   
   In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
   number of times the signal has occurred since the last call to
   kevent(). This `data' is recorded by filt_signal() (This is f_event in
   struct filterops).
   
   The system call kevent()'s events are processed by kqueue_scan() in
   kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways
   called after KN_INFLUX is set to kn-kn_status.
   
   On the other hand, kernel events are occured by knote() in
   kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
   in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when
   KN_INFLUX is not set in kn-kn_status.
   
   In race condition between kqueue_scan() and knote(),
   kn-kn_fop-f_event() from knote() may not be called, I think.

Looks like I misunderstood your point, sorry...

   Considering that both are called w/ a lock, that cannot happen..
   KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
   by knote...
   
   In knote(), because the context holds knlist's lock, the context can
   not sleep. So, KN_INFLUX should not be set on calling
   kn-kn_fop-f_event() in kqueue_scan(), I think.
   
   No, it needs to be set:
* Setting the KN_INFLUX flag enables you to unlock the kq that this knote
* is on, and modify kn_status as if you had the KQ lock.
   
   As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
   the knote will disappear out from under you causing you to dereference
   possibly free'd memory..
   
   If you just tried to lock the list lock w/o unlocking the KQ lock, you
   could end up w/ a dead lock, as you aren't maintaining lock order
   properly..  The correct lock order if knlist - kq...
   
   What do you think about this issue?
   
   This is a real issue, but not due to the race you described
   above...
  
  I beleave it's the result of the race.
  
  Could you try to add printf() in knote()?
  Please refer to attached patch.
  
  
   I have verified on my machine that it isn't because there is a knote
   waiting that isn't getting woken up, and the knote on my hung process
   has data == 0, so it definately lost one of the signals:
   (kgdb) print $14.kq_knhash[20].slh_first[0]
   $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
 kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500,
   tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent 
   = {
   ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 
   0x0},
 kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = {
   p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8,
   p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8,
   p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0,
 kn_hookid = 0}
   
   If you want to find this yourself, you can run kgdb on a live system,
   switch to the thread of the parent (info threads, thread XXX), and
   do:
   frame 7
   
   (or a frame that has td, which is struct thread *), then:
   print *(struct kqueue 
   *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data
   
   This will give you the struct kqueue * of the parent, and then:
   print $XX.kq_knhash[0]@63
   
   to figure out where the knote is in the hash, and then you can print
   it out yourself...
   
   I'm going to take a look at this a bit more later... I'm thinking of
   using dtrace to collect the stacks where filt_signal is called, and
   match them up...  dtrace might even be able to get us the note's data
   upon return helping to make sure things got tracked properly...
   
   Thanks for finding this bug!  Hopefully we can find a solution to it..
 
  diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
  index b3fb23d..7791447 100644
  --- a/sys/kern/kern_event.c
  +++ b/sys/kern/kern_event.c
  @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags)
  if ((kn-kn_status  KN_INFLUX) != KN_INFLUX) {
  

Re: kevent has bug?

2014-04-02 Thread Konstantin Belousov
On Wed, Apr 02, 2014 at 09:45:43AM -0700, John-Mark Gurney wrote:
 Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300:
 Well, it's not that its preventing waking up the waiter, but failing to
 register the event on the knote because of the _INFLUX flag...
Yes, I used the wrong terminology.

 
  Patch below fixed your test case for me, also tools/regression/kqueue did
  not noticed a breakage.  I tried to describe the situation in the
  comment in knote().  Also, I removed unlocked check for the KN_INFLUX
  in knote, since it seems to be an optimization for rare case, and is
  the race on its own.
 
 Comments below...
 
  diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
  index b3fb23d..380f1ff 100644
  --- a/sys/kern/kern_event.c
  +++ b/sys/kern/kern_event.c
 
 [...]
 
  @@ -1506,7 +1506,7 @@ retry:
  KQ_LOCK(kq);
  kn = NULL;
  } else {
  -   kn-kn_status |= KN_INFLUX;
  +   kn-kn_status |= KN_INFLUX | KN_SCAN;
  KQ_UNLOCK(kq);
  if ((kn-kn_status  KN_KQUEUE) == KN_KQUEUE)
  KQ_GLOBAL_LOCK(kq_global, haskqglobal);
 
 Is there a reason you don't add the KN_SCAN to the other cases in
 kqueue_scan that set the _INFLUX flag?
Other cases in kqueue_scan() which set influx do the detach and drop,
so I do not see a need to ensure that note is registered.  Except I missed
one case, which you pointed out.

 
 [...]
 
  @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags)
   */
  SLIST_FOREACH(kn, list-kl_list, kn_selnext) {
  kq = kn-kn_kq;
  -   if ((kn-kn_status  KN_INFLUX) != KN_INFLUX) {
  +   KQ_LOCK(kq);
  +   if ((kn-kn_status  (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
  +   /*
  +* Do not process the influx notes, except for
  +* the influx coming from the kq unlock in the
  +* kqueue_scan().  In the later case, we do
  +* not interfere with the scan, since the code
  +* fragment in kqueue_scan() locks the knlist,
  +* and cannot proceed until we finished.
  +*/
 
 We might want to add a marker node, and reprocess the list from the
 marker node, because this might introduce other races in the code too...
 but the problem with that is that knote is expected to keep the list
 locked throughout the call if called w/ it already locked, and so we
 can't do that, without major work... :(
Why ? If the knlist lock is not dropped, I do not see a need for the
marker.  The patch does not introduce the sleep point for the KN_SCAN
knotes anyway.

 
 I added a similar comment in knote_fork:
  * XXX - Why do we skip the kn if it is _INFLUX?  Does this
  * mean we will not properly wake up some notes?
 
 and it looks like it was true...
 
 So, upon reading the other _INFLUX cases, it looks like we should change
 _SCAN to be, _CHANGING or something similar, and any place we don't end
 up dropping the knote, we set this flag also...  Once such case is at
 the end of kqueue_register, just before the label done_ev_add, where we
 update the knote w/ new udata and other fields..  Or change the logic
 of the flag, and set it for all the cases we are about to drop the
 knote..
So do you prefer KN_CHANGING instead of KN_SCAN ?  I do not have any
objections against renaming the flag, but _CHANGING seems to not say
anything about the flag intent.  I would say that KN_STABLE is more
useful, or KN_INFLUX_NODEL, or whatever.

The done_ev_add case is indeed missed in my patch, thank you for noting.
The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the
race is possible just by the nature of adding the knote.

 
  +   KQ_UNLOCK(kq);
  +   } else if ((lockflags  KNF_NOKQLOCK) != 0) {
  +   kn-kn_status |= KN_INFLUX;
  +   KQ_UNLOCK(kq);
  +   error = kn-kn_fop-f_event(kn, hint);
  KQ_LOCK(kq);
 
 I believe we can drop this unlock/lock pair as it's safe to hold the
 KQ lock over f_event, we do that in knote_fork...
The knote_fork() is for the special kinds of knote only, where we indeed know
in advance that having the kqueue locked around f_event does not break things.

Updated patch below.

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index b3fb23d..fadb8fd 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid)
continue;
kq = kn-kn_kq;
KQ_LOCK(kq);
-   if ((kn-kn_status  KN_INFLUX) == KN_INFLUX) {
+   if ((kn-kn_status  (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
KQ_UNLOCK(kq);
continue;
   

kevent has bug?

2014-04-01 Thread Kohji Okuno
Hi,

I think, kevent() has a bug.
I tested sample programs by attached sources.
This sample tests about EVFILT_SIGNAL.

I build sample programs by the following commands.
% gcc -O2 -o child child.c
% gcc -O2 -o parent parent.c

The expected result is the following.
% ./parent
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
OK
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
OK

But, sometimes the result was the following.
% ./parent
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

This result means the number of times the signal has occured was
incorrect.



In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
number of times the signal has occurred since the last call to
kevent(). This `data' is recorded by filt_signal() (This is f_event in
struct filterops).

The system call kevent()'s events are processed by kqueue_scan() in
kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways
called after KN_INFLUX is set to kn-kn_status.

On the other hand, kernel events are occured by knote() in
kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when
KN_INFLUX is not set in kn-kn_status.

In race condition between kqueue_scan() and knote(),
kn-kn_fop-f_event() from knote() may not be called, I think.


In knote(), because the context holds knlist's lock, the context can
not sleep. So, KN_INFLUX should not be set on calling
kn-kn_fop-f_event() in kqueue_scan(), I think.

What do you think about this issue?

Best regards,
 Kohji Okuno
#include sys/types.h
#include stdlib.h
#include unistd.h
#include stdio.h

int
main()
{
sleep(1);
exit(0);
}
#include sys/types.h
#include sys/wait.h
#include sys/event.h
#include sys/time.h
#include stdlib.h
#include stdio.h
#include unistd.h
#include signal.h

#define NUM_CHILDREN20

int
main()
{
int i;
pid_t pid;
char *argv[2] = {child, NULL};
struct kevent kev;
int kqfd = kqueue();
int count;
int err;
int status;

EV_SET(kev, SIGCHLD, EVFILT_SIGNAL, EV_ADD, 0, 0, 0);
kevent(kqfd, kev, 1, NULL, 0, NULL);

while (1) {
count = 0;
for (i = 0; i  NUM_CHILDREN; i++) {
pid = fork();
if (pid == 0) {
execve(./child, argv, NULL);
}
}

while (1) {
err = kevent(kqfd, NULL, 0, kev, 1, NULL);
if (err  0  kev.ident == SIGCHLD) {
for (i = 0; i  kev.data; i++) {
pid = waitpid(-1, status, WNOHANG);
if (pid  0) {
count++;
printf(%d , count);
fflush(stdout);
if (count == NUM_CHILDREN) {
printf(\nOK\n);
goto next;
}
}
}
}
}
 next:
 ;
}
exit(0);
}
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org