Re: kevent has bug?
On Fri, Apr 04, 2014 at 07:55:05AM +0900, Kohji Okuno wrote: Hi, Thank you for your detailed commnet. And I uderstood about your opinion. By the way, do you commit your change to HEAD? Sure I will, I am waiting for possible last comment from John-Mark. Unless he provides more feedback, I plan to commit the patch tomorrow. pgpT4NbzyF4Kn.pgp Description: PGP signature
Re: kevent has bug?
From: Konstantin Belousov kostik...@gmail.com Date: Wed, 2 Apr 2014 20:44:00 +0300 On Wed, Apr 02, 2014 at 09:45:43AM -0700, John-Mark Gurney wrote: Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300: Well, it's not that its preventing waking up the waiter, but failing to register the event on the knote because of the _INFLUX flag... Yes, I used the wrong terminology. Patch below fixed your test case for me, also tools/regression/kqueue did not noticed a breakage. I tried to describe the situation in the comment in knote(). Also, I removed unlocked check for the KN_INFLUX in knote, since it seems to be an optimization for rare case, and is the race on its own. Comments below... diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..380f1ff 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c [...] @@ -1506,7 +1506,7 @@ retry: KQ_LOCK(kq); kn = NULL; } else { - kn-kn_status |= KN_INFLUX; + kn-kn_status |= KN_INFLUX | KN_SCAN; KQ_UNLOCK(kq); if ((kn-kn_status KN_KQUEUE) == KN_KQUEUE) KQ_GLOBAL_LOCK(kq_global, haskqglobal); Is there a reason you don't add the KN_SCAN to the other cases in kqueue_scan that set the _INFLUX flag? Other cases in kqueue_scan() which set influx do the detach and drop, so I do not see a need to ensure that note is registered. Except I missed one case, which you pointed out. [...] @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags) */ SLIST_FOREACH(kn, list-kl_list, kn_selnext) { kq = kn-kn_kq; - if ((kn-kn_status KN_INFLUX) != KN_INFLUX) { + KQ_LOCK(kq); + if ((kn-kn_status (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { + /* + * Do not process the influx notes, except for + * the influx coming from the kq unlock in the + * kqueue_scan(). In the later case, we do + * not interfere with the scan, since the code + * fragment in kqueue_scan() locks the knlist, + * and cannot proceed until we finished. + */ We might want to add a marker node, and reprocess the list from the marker node, because this might introduce other races in the code too... but the problem with that is that knote is expected to keep the list locked throughout the call if called w/ it already locked, and so we can't do that, without major work... :( Why ? If the knlist lock is not dropped, I do not see a need for the marker. The patch does not introduce the sleep point for the KN_SCAN knotes anyway. I added a similar comment in knote_fork: * XXX - Why do we skip the kn if it is _INFLUX? Does this * mean we will not properly wake up some notes? and it looks like it was true... So, upon reading the other _INFLUX cases, it looks like we should change _SCAN to be, _CHANGING or something similar, and any place we don't end up dropping the knote, we set this flag also... Once such case is at the end of kqueue_register, just before the label done_ev_add, where we update the knote w/ new udata and other fields.. Or change the logic of the flag, and set it for all the cases we are about to drop the knote.. So do you prefer KN_CHANGING instead of KN_SCAN ? I do not have any objections against renaming the flag, but _CHANGING seems to not say anything about the flag intent. I would say that KN_STABLE is more useful, or KN_INFLUX_NODEL, or whatever. The done_ev_add case is indeed missed in my patch, thank you for noting. The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the race is possible just by the nature of adding the knote. + KQ_UNLOCK(kq); + } else if ((lockflags KNF_NOKQLOCK) != 0) { + kn-kn_status |= KN_INFLUX; + KQ_UNLOCK(kq); + error = kn-kn_fop-f_event(kn, hint); KQ_LOCK(kq); I believe we can drop this unlock/lock pair as it's safe to hold the KQ lock over f_event, we do that in knote_fork... The knote_fork() is for the special kinds of knote only, where we indeed know in advance that having the kqueue locked around f_event does not break things. Updated patch below. diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..fadb8fd 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid) continue; kq = kn-kn_kq; KQ_LOCK(kq); - if ((kn-kn_status KN_INFLUX) == KN_INFLUX) { + if ((kn-kn_status (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
Re: kevent has bug?
On Thu, Apr 03, 2014 at 06:26:56PM +0900, Kohji Okuno wrote: The done_ev_add case is indeed missed in my patch, thank you for noting. The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the race is possible just by the nature of adding the knote. I think, we should add KN_SCAN after knote_attach() in kqueue_register(), too. What do you think about this? See above, I noted this case in the previous mail. This may be elaborated. First, I think it is technically incorrect to allow the event notification before the f_attach() method is finished. So the KN_SCAN flag could be set only after f_attach() call, but due to both kq and knlist not locked there, we still have the same race. And this race is in fact acceptable, since it is the race between application calling EV_ADD, and external event occuring, which cannot be avoided. Until the kevent(EV_ADD) syscall returned, we do not have an obligation to report the event from the kqfd. Having the race somewhat bigger by not setting KN_SCAN is fine in my opinion. pgp_eM3PUg0OL.pgp Description: PGP signature
Re: kevent has bug?
From: Konstantin Belousov kostik...@gmail.com Date: Thu, 3 Apr 2014 16:48:14 +0300 On Thu, Apr 03, 2014 at 06:26:56PM +0900, Kohji Okuno wrote: The done_ev_add case is indeed missed in my patch, thank you for noting. The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the race is possible just by the nature of adding the knote. I think, we should add KN_SCAN after knote_attach() in kqueue_register(), too. What do you think about this? See above, I noted this case in the previous mail. This may be elaborated. First, I think it is technically incorrect to allow the event notification before the f_attach() method is finished. So the KN_SCAN flag could be set only after f_attach() call, but due to both kq and knlist not locked there, we still have the same race. And this race is in fact acceptable, since it is the race between application calling EV_ADD, and external event occuring, which cannot be avoided. Until the kevent(EV_ADD) syscall returned, we do not have an obligation to report the event from the kqfd. Having the race somewhat bigger by not setting KN_SCAN is fine in my opinion. Hi, Thank you for your detailed commnet. And I uderstood about your opinion. By the way, do you commit your change to HEAD? Regards, Kohji Okuno ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: kevent has bug?
Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900: I think, kevent() has a bug. I tested sample programs by attached sources. This sample tests about EVFILT_SIGNAL. I build sample programs by the following commands. % gcc -O2 -o child child.c % gcc -O2 -o parent parent.c The expected result is the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK But, sometimes the result was the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 This result means the number of times the signal has occured was incorrect. I was able to reproduce this... In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the number of times the signal has occurred since the last call to kevent(). This `data' is recorded by filt_signal() (This is f_event in struct filterops). The system call kevent()'s events are processed by kqueue_scan() in kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways called after KN_INFLUX is set to kn-kn_status. On the other hand, kernel events are occured by knote() in kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when KN_INFLUX is not set in kn-kn_status. In race condition between kqueue_scan() and knote(), kn-kn_fop-f_event() from knote() may not be called, I think. Considering that both are called w/ a lock, that cannot happen.. KN_LIST_LOCK(kn) locks the same lock that is asserted that is held by knote... In knote(), because the context holds knlist's lock, the context can not sleep. So, KN_INFLUX should not be set on calling kn-kn_fop-f_event() in kqueue_scan(), I think. No, it needs to be set: * Setting the KN_INFLUX flag enables you to unlock the kq that this knote * is on, and modify kn_status as if you had the KQ lock. As this comment says, _INFLUX allows you to unlock the KQ w/o fear that the knote will disappear out from under you causing you to dereference possibly free'd memory.. If you just tried to lock the list lock w/o unlocking the KQ lock, you could end up w/ a dead lock, as you aren't maintaining lock order properly.. The correct lock order if knlist - kq... What do you think about this issue? This is a real issue, but not due to the race you described above... I have verified on my machine that it isn't because there is a knote waiting that isn't getting woken up, and the knote on my hung process has data == 0, so it definately lost one of the signals: (kgdb) print $14.kq_knhash[20].slh_first[0] $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500, tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = { ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0}, kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = { p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8, p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8, p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0, kn_hookid = 0} If you want to find this yourself, you can run kgdb on a live system, switch to the thread of the parent (info threads, thread XXX), and do: frame 7 (or a frame that has td, which is struct thread *), then: print *(struct kqueue *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data This will give you the struct kqueue * of the parent, and then: print $XX.kq_knhash[0]@63 to figure out where the knote is in the hash, and then you can print it out yourself... I'm going to take a look at this a bit more later... I'm thinking of using dtrace to collect the stacks where filt_signal is called, and match them up... dtrace might even be able to get us the note's data upon return helping to make sure things got tracked properly... Thanks for finding this bug! Hopefully we can find a solution to it.. -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: kevent has bug?
From: John-Mark Gurney j...@funkthat.com Date: Tue, 1 Apr 2014 23:15:51 -0700 Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900: I think, kevent() has a bug. I tested sample programs by attached sources. This sample tests about EVFILT_SIGNAL. I build sample programs by the following commands. % gcc -O2 -o child child.c % gcc -O2 -o parent parent.c The expected result is the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK But, sometimes the result was the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 This result means the number of times the signal has occured was incorrect. I was able to reproduce this... In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the number of times the signal has occurred since the last call to kevent(). This `data' is recorded by filt_signal() (This is f_event in struct filterops). The system call kevent()'s events are processed by kqueue_scan() in kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways called after KN_INFLUX is set to kn-kn_status. On the other hand, kernel events are occured by knote() in kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when KN_INFLUX is not set in kn-kn_status. In race condition between kqueue_scan() and knote(), kn-kn_fop-f_event() from knote() may not be called, I think. Considering that both are called w/ a lock, that cannot happen.. KN_LIST_LOCK(kn) locks the same lock that is asserted that is held by knote... In knote(), because the context holds knlist's lock, the context can not sleep. So, KN_INFLUX should not be set on calling kn-kn_fop-f_event() in kqueue_scan(), I think. No, it needs to be set: * Setting the KN_INFLUX flag enables you to unlock the kq that this knote * is on, and modify kn_status as if you had the KQ lock. As this comment says, _INFLUX allows you to unlock the KQ w/o fear that the knote will disappear out from under you causing you to dereference possibly free'd memory.. If you just tried to lock the list lock w/o unlocking the KQ lock, you could end up w/ a dead lock, as you aren't maintaining lock order properly.. The correct lock order if knlist - kq... What do you think about this issue? This is a real issue, but not due to the race you described above... I beleave it's the result of the race. Could you try to add printf() in knote()? Please refer to attached patch. I have verified on my machine that it isn't because there is a knote waiting that isn't getting woken up, and the knote on my hung process has data == 0, so it definately lost one of the signals: (kgdb) print $14.kq_knhash[20].slh_first[0] $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500, tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = { ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0}, kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = { p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8, p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8, p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0, kn_hookid = 0} If you want to find this yourself, you can run kgdb on a live system, switch to the thread of the parent (info threads, thread XXX), and do: frame 7 (or a frame that has td, which is struct thread *), then: print *(struct kqueue *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data This will give you the struct kqueue * of the parent, and then: print $XX.kq_knhash[0]@63 to figure out where the knote is in the hash, and then you can print it out yourself... I'm going to take a look at this a bit more later... I'm thinking of using dtrace to collect the stacks where filt_signal is called, and match them up... dtrace might even be able to get us the note's data upon return helping to make sure things got tracked properly... Thanks for finding this bug! Hopefully we can find a solution to it.. -- John-Mark GurneyVoice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..7791447 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags) if ((kn-kn_status KN_INFLUX) != KN_INFLUX) { KQ_LOCK(kq); if ((kn-kn_status KN_INFLUX) == KN_INFLUX) { +if (hint NOTE_SIGNAL) + printf(Aee2\n); KQ_UNLOCK(kq); } else if ((lockflags KNF_NOKQLOCK) != 0) { kn-kn_status |= KN_INFLUX; @@ -1886,6 +1888,10 @@ knote(struct knlist *list, long hint, int lockflags)
Re: kevent has bug?
On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote: From: John-Mark Gurney j...@funkthat.com Date: Tue, 1 Apr 2014 23:15:51 -0700 Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900: I think, kevent() has a bug. I tested sample programs by attached sources. This sample tests about EVFILT_SIGNAL. I build sample programs by the following commands. % gcc -O2 -o child child.c % gcc -O2 -o parent parent.c The expected result is the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK But, sometimes the result was the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 This result means the number of times the signal has occured was incorrect. I was able to reproduce this... In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the number of times the signal has occurred since the last call to kevent(). This `data' is recorded by filt_signal() (This is f_event in struct filterops). The system call kevent()'s events are processed by kqueue_scan() in kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways called after KN_INFLUX is set to kn-kn_status. On the other hand, kernel events are occured by knote() in kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when KN_INFLUX is not set in kn-kn_status. In race condition between kqueue_scan() and knote(), kn-kn_fop-f_event() from knote() may not be called, I think. Considering that both are called w/ a lock, that cannot happen.. KN_LIST_LOCK(kn) locks the same lock that is asserted that is held by knote... In knote(), because the context holds knlist's lock, the context can not sleep. So, KN_INFLUX should not be set on calling kn-kn_fop-f_event() in kqueue_scan(), I think. No, it needs to be set: * Setting the KN_INFLUX flag enables you to unlock the kq that this knote * is on, and modify kn_status as if you had the KQ lock. As this comment says, _INFLUX allows you to unlock the KQ w/o fear that the knote will disappear out from under you causing you to dereference possibly free'd memory.. If you just tried to lock the list lock w/o unlocking the KQ lock, you could end up w/ a dead lock, as you aren't maintaining lock order properly.. The correct lock order if knlist - kq... What do you think about this issue? This is a real issue, but not due to the race you described above... I beleave it's the result of the race. Could you try to add printf() in knote()? Please refer to attached patch. I have verified on my machine that it isn't because there is a knote waiting that isn't getting woken up, and the knote on my hung process has data == 0, so it definately lost one of the signals: (kgdb) print $14.kq_knhash[20].slh_first[0] $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500, tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = { ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0}, kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = { p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8, p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8, p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0, kn_hookid = 0} If you want to find this yourself, you can run kgdb on a live system, switch to the thread of the parent (info threads, thread XXX), and do: frame 7 (or a frame that has td, which is struct thread *), then: print *(struct kqueue *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data This will give you the struct kqueue * of the parent, and then: print $XX.kq_knhash[0]@63 to figure out where the knote is in the hash, and then you can print it out yourself... I'm going to take a look at this a bit more later... I'm thinking of using dtrace to collect the stacks where filt_signal is called, and match them up... dtrace might even be able to get us the note's data upon return helping to make sure things got tracked properly... Thanks for finding this bug! Hopefully we can find a solution to it.. -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..7791447 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags) if ((kn-kn_status KN_INFLUX) != KN_INFLUX) { KQ_LOCK(kq); if ((kn-kn_status KN_INFLUX) == KN_INFLUX) { +
Re: kevent has bug?
From: Konstantin Belousov kostik...@gmail.com Date: Wed, 2 Apr 2014 15:07:45 +0300 On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote: From: John-Mark Gurney j...@funkthat.com Date: Tue, 1 Apr 2014 23:15:51 -0700 Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900: I think, kevent() has a bug. I tested sample programs by attached sources. This sample tests about EVFILT_SIGNAL. I build sample programs by the following commands. % gcc -O2 -o child child.c % gcc -O2 -o parent parent.c The expected result is the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK But, sometimes the result was the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 This result means the number of times the signal has occured was incorrect. I was able to reproduce this... In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the number of times the signal has occurred since the last call to kevent(). This `data' is recorded by filt_signal() (This is f_event in struct filterops). The system call kevent()'s events are processed by kqueue_scan() in kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways called after KN_INFLUX is set to kn-kn_status. On the other hand, kernel events are occured by knote() in kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when KN_INFLUX is not set in kn-kn_status. In race condition between kqueue_scan() and knote(), kn-kn_fop-f_event() from knote() may not be called, I think. Considering that both are called w/ a lock, that cannot happen.. KN_LIST_LOCK(kn) locks the same lock that is asserted that is held by knote... In knote(), because the context holds knlist's lock, the context can not sleep. So, KN_INFLUX should not be set on calling kn-kn_fop-f_event() in kqueue_scan(), I think. No, it needs to be set: * Setting the KN_INFLUX flag enables you to unlock the kq that this knote * is on, and modify kn_status as if you had the KQ lock. As this comment says, _INFLUX allows you to unlock the KQ w/o fear that the knote will disappear out from under you causing you to dereference possibly free'd memory.. If you just tried to lock the list lock w/o unlocking the KQ lock, you could end up w/ a dead lock, as you aren't maintaining lock order properly.. The correct lock order if knlist - kq... What do you think about this issue? This is a real issue, but not due to the race you described above... I beleave it's the result of the race. Could you try to add printf() in knote()? Please refer to attached patch. I have verified on my machine that it isn't because there is a knote waiting that isn't getting woken up, and the knote on my hung process has data == 0, so it definately lost one of the signals: (kgdb) print $14.kq_knhash[20].slh_first[0] $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500, tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = { ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0}, kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = { p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8, p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8, p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0, kn_hookid = 0} If you want to find this yourself, you can run kgdb on a live system, switch to the thread of the parent (info threads, thread XXX), and do: frame 7 (or a frame that has td, which is struct thread *), then: print *(struct kqueue *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data This will give you the struct kqueue * of the parent, and then: print $XX.kq_knhash[0]@63 to figure out where the knote is in the hash, and then you can print it out yourself... I'm going to take a look at this a bit more later... I'm thinking of using dtrace to collect the stacks where filt_signal is called, and match them up... dtrace might even be able to get us the note's data upon return helping to make sure things got tracked properly... Thanks for finding this bug! Hopefully we can find a solution to it.. -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..7791447 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags) if ((kn-kn_status KN_INFLUX) != KN_INFLUX) {
Re: kevent has bug?
Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300: On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote: From: John-Mark Gurney j...@funkthat.com Date: Tue, 1 Apr 2014 23:15:51 -0700 Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900: I think, kevent() has a bug. I tested sample programs by attached sources. This sample tests about EVFILT_SIGNAL. I build sample programs by the following commands. % gcc -O2 -o child child.c % gcc -O2 -o parent parent.c The expected result is the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK But, sometimes the result was the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 This result means the number of times the signal has occured was incorrect. I was able to reproduce this... In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the number of times the signal has occurred since the last call to kevent(). This `data' is recorded by filt_signal() (This is f_event in struct filterops). The system call kevent()'s events are processed by kqueue_scan() in kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways called after KN_INFLUX is set to kn-kn_status. On the other hand, kernel events are occured by knote() in kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when KN_INFLUX is not set in kn-kn_status. In race condition between kqueue_scan() and knote(), kn-kn_fop-f_event() from knote() may not be called, I think. Looks like I misunderstood your point, sorry... Considering that both are called w/ a lock, that cannot happen.. KN_LIST_LOCK(kn) locks the same lock that is asserted that is held by knote... In knote(), because the context holds knlist's lock, the context can not sleep. So, KN_INFLUX should not be set on calling kn-kn_fop-f_event() in kqueue_scan(), I think. No, it needs to be set: * Setting the KN_INFLUX flag enables you to unlock the kq that this knote * is on, and modify kn_status as if you had the KQ lock. As this comment says, _INFLUX allows you to unlock the KQ w/o fear that the knote will disappear out from under you causing you to dereference possibly free'd memory.. If you just tried to lock the list lock w/o unlocking the KQ lock, you could end up w/ a dead lock, as you aren't maintaining lock order properly.. The correct lock order if knlist - kq... What do you think about this issue? This is a real issue, but not due to the race you described above... I beleave it's the result of the race. Could you try to add printf() in knote()? Please refer to attached patch. I have verified on my machine that it isn't because there is a knote waiting that isn't getting woken up, and the knote on my hung process has data == 0, so it definately lost one of the signals: (kgdb) print $14.kq_knhash[20].slh_first[0] $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, kn_knlist = 0xf8005a9c5840, kn_tqe = {tqe_next = 0xf801fdab4500, tqe_prev = 0xf8004bb10038}, kn_kq = 0xf8004bb1, kn_kevent = { ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0}, kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = { p_fp = 0xf8005a9c54b8, p_proc = 0xf8005a9c54b8, p_aio = 0xf8005a9c54b8, p_lio = 0xf8005a9c54b8, p_v = 0xf8005a9c54b8}, kn_fop = 0x81405ef0, kn_hook = 0x0, kn_hookid = 0} If you want to find this yourself, you can run kgdb on a live system, switch to the thread of the parent (info threads, thread XXX), and do: frame 7 (or a frame that has td, which is struct thread *), then: print *(struct kqueue *)td-td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data This will give you the struct kqueue * of the parent, and then: print $XX.kq_knhash[0]@63 to figure out where the knote is in the hash, and then you can print it out yourself... I'm going to take a look at this a bit more later... I'm thinking of using dtrace to collect the stacks where filt_signal is called, and match them up... dtrace might even be able to get us the note's data upon return helping to make sure things got tracked properly... Thanks for finding this bug! Hopefully we can find a solution to it.. diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..7791447 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags) if ((kn-kn_status KN_INFLUX) != KN_INFLUX) {
Re: kevent has bug?
On Wed, Apr 02, 2014 at 09:45:43AM -0700, John-Mark Gurney wrote: Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300: Well, it's not that its preventing waking up the waiter, but failing to register the event on the knote because of the _INFLUX flag... Yes, I used the wrong terminology. Patch below fixed your test case for me, also tools/regression/kqueue did not noticed a breakage. I tried to describe the situation in the comment in knote(). Also, I removed unlocked check for the KN_INFLUX in knote, since it seems to be an optimization for rare case, and is the race on its own. Comments below... diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..380f1ff 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c [...] @@ -1506,7 +1506,7 @@ retry: KQ_LOCK(kq); kn = NULL; } else { - kn-kn_status |= KN_INFLUX; + kn-kn_status |= KN_INFLUX | KN_SCAN; KQ_UNLOCK(kq); if ((kn-kn_status KN_KQUEUE) == KN_KQUEUE) KQ_GLOBAL_LOCK(kq_global, haskqglobal); Is there a reason you don't add the KN_SCAN to the other cases in kqueue_scan that set the _INFLUX flag? Other cases in kqueue_scan() which set influx do the detach and drop, so I do not see a need to ensure that note is registered. Except I missed one case, which you pointed out. [...] @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags) */ SLIST_FOREACH(kn, list-kl_list, kn_selnext) { kq = kn-kn_kq; - if ((kn-kn_status KN_INFLUX) != KN_INFLUX) { + KQ_LOCK(kq); + if ((kn-kn_status (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { + /* +* Do not process the influx notes, except for +* the influx coming from the kq unlock in the +* kqueue_scan(). In the later case, we do +* not interfere with the scan, since the code +* fragment in kqueue_scan() locks the knlist, +* and cannot proceed until we finished. +*/ We might want to add a marker node, and reprocess the list from the marker node, because this might introduce other races in the code too... but the problem with that is that knote is expected to keep the list locked throughout the call if called w/ it already locked, and so we can't do that, without major work... :( Why ? If the knlist lock is not dropped, I do not see a need for the marker. The patch does not introduce the sleep point for the KN_SCAN knotes anyway. I added a similar comment in knote_fork: * XXX - Why do we skip the kn if it is _INFLUX? Does this * mean we will not properly wake up some notes? and it looks like it was true... So, upon reading the other _INFLUX cases, it looks like we should change _SCAN to be, _CHANGING or something similar, and any place we don't end up dropping the knote, we set this flag also... Once such case is at the end of kqueue_register, just before the label done_ev_add, where we update the knote w/ new udata and other fields.. Or change the logic of the flag, and set it for all the cases we are about to drop the knote.. So do you prefer KN_CHANGING instead of KN_SCAN ? I do not have any objections against renaming the flag, but _CHANGING seems to not say anything about the flag intent. I would say that KN_STABLE is more useful, or KN_INFLUX_NODEL, or whatever. The done_ev_add case is indeed missed in my patch, thank you for noting. The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the race is possible just by the nature of adding the knote. + KQ_UNLOCK(kq); + } else if ((lockflags KNF_NOKQLOCK) != 0) { + kn-kn_status |= KN_INFLUX; + KQ_UNLOCK(kq); + error = kn-kn_fop-f_event(kn, hint); KQ_LOCK(kq); I believe we can drop this unlock/lock pair as it's safe to hold the KQ lock over f_event, we do that in knote_fork... The knote_fork() is for the special kinds of knote only, where we indeed know in advance that having the kqueue locked around f_event does not break things. Updated patch below. diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..fadb8fd 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid) continue; kq = kn-kn_kq; KQ_LOCK(kq); - if ((kn-kn_status KN_INFLUX) == KN_INFLUX) { + if ((kn-kn_status (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { KQ_UNLOCK(kq); continue;
kevent has bug?
Hi, I think, kevent() has a bug. I tested sample programs by attached sources. This sample tests about EVFILT_SIGNAL. I build sample programs by the following commands. % gcc -O2 -o child child.c % gcc -O2 -o parent parent.c The expected result is the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 OK But, sometimes the result was the following. % ./parent 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 This result means the number of times the signal has occured was incorrect. In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the number of times the signal has occurred since the last call to kevent(). This `data' is recorded by filt_signal() (This is f_event in struct filterops). The system call kevent()'s events are processed by kqueue_scan() in kern_event.c. In kqueue_scan(), kn-kn_fop-f_event() is allways called after KN_INFLUX is set to kn-kn_status. On the other hand, kernel events are occured by knote() in kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() in kern_sig.c.) In knote(), kn-kn_fop-f_event() is called only when KN_INFLUX is not set in kn-kn_status. In race condition between kqueue_scan() and knote(), kn-kn_fop-f_event() from knote() may not be called, I think. In knote(), because the context holds knlist's lock, the context can not sleep. So, KN_INFLUX should not be set on calling kn-kn_fop-f_event() in kqueue_scan(), I think. What do you think about this issue? Best regards, Kohji Okuno #include sys/types.h #include stdlib.h #include unistd.h #include stdio.h int main() { sleep(1); exit(0); } #include sys/types.h #include sys/wait.h #include sys/event.h #include sys/time.h #include stdlib.h #include stdio.h #include unistd.h #include signal.h #define NUM_CHILDREN20 int main() { int i; pid_t pid; char *argv[2] = {child, NULL}; struct kevent kev; int kqfd = kqueue(); int count; int err; int status; EV_SET(kev, SIGCHLD, EVFILT_SIGNAL, EV_ADD, 0, 0, 0); kevent(kqfd, kev, 1, NULL, 0, NULL); while (1) { count = 0; for (i = 0; i NUM_CHILDREN; i++) { pid = fork(); if (pid == 0) { execve(./child, argv, NULL); } } while (1) { err = kevent(kqfd, NULL, 0, kev, 1, NULL); if (err 0 kev.ident == SIGCHLD) { for (i = 0; i kev.data; i++) { pid = waitpid(-1, status, WNOHANG); if (pid 0) { count++; printf(%d , count); fflush(stdout); if (count == NUM_CHILDREN) { printf(\nOK\n); goto next; } } } } } next: ; } exit(0); } ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org