Re: [PATCH] Re: Junior Kernel Hacker page updated...
Stefan Farfeleder wrote: > Imagine this scenario where CPU 0 inserts a knote kn1 (the marker) in > knote_scan and CPU 1 kn2 in kqueue_enqueue: > > CPU 0 | CPU 1 > +--- > kn1->kn_tqe.tqe_next = NULL;| > | > +--- > kn1->kn_tqe.tqe_prev = | kn2->kn_tqe.tqe_next = NULL; > kq_head.tqh_last; | > +--- > *kq_head.tqh_last = kn1;| kn2->kn_tqe.tqe_prev = > | kq_head.tqh_last; > +--- > kq_head.tqh_last = | *kq_head.tqh_last = kn2; > &kn1->kn_tqe.tqe_next; | > +--- > | kq_head.tqh_last = > | &kn2->kn_tqe.tqe_next; > > The marker would never appear on the queue. The macro is broken. Here is a fix. #define TAILQ_INSERT_TAIL(head, elm, field) do {\ void **savepp; \ (elm)->field.tqe_next = NULL; \ (elm)->field.tqe_prev = (head)->tqh_last; \ savepp = (head)->tqh_last; \ (head)->tqh_last = &(elm)->field.tqe_next; \ (elm)->field.tqe_prev = savepp; \ *savepp = (elm);\ } while (0) I'm not sure that it's possible to totally close the race; I think the way you would have to do it is to traverse the tqh_last pointer until the tqh_last->file.tqe_next == NULL, before reference it (ugly). The failure mode with the race is an inverted insertion order, which I think, in this case, is not a problem. The double setting of the prev guarantees it has a "harmless" value for a traversal during insertion; it acts as if the insertion had not occurred, if there are two that occurred simultaneously, until both sets of data are valid. This is doable in Intel assembly, I think (CMPXCHGL). Ugh. On other architectures, there's no way to avoid a mutex (e.g. MIPS). I hate tailq's. 8-(. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: [PATCH] Re: Junior Kernel Hacker page updated...
On Wed, Oct 09, 2002 at 04:07:45PM -0700, Terry Lambert wrote: > Stefan Farfeleder wrote: > > Is it just a warning or does it pose a real problem? > > > > I think the problem with the current code is that knote_{en,de}queue can > > be executed in parallel (on another CPU, spl*() can't prevent that, can > > it?) with kqueue_scan and that kq->kq_head thus can be corrupted. > > Or am I totally wrong? > > My patch would have worked, in that case, since it would ensure > one marker entry with a unique stack address per simultaneous > scanner. > > It has to be that the queue itself is being deleted out from > under it: the problem is not the scan, nor the insert or the > delete. > > Most likely, this is for an object whose queue is not tracked > by process, or for a process queue that's being examined by > another process (e.g. kevent's on fork/exit/etc.). > > You can verify this for your own satisfaction by looking at the > pointer manipulation order for the insertion and deletion; the > insertion sets the next pointer before setting the pointer to > the inserted object, and the deletion sets the pointer that > used to point to the deleted object to the delete object's next, > before deleting the object. Thus, traversals in progress should > not result in an error. Imagine this scenario where CPU 0 inserts a knote kn1 (the marker) in knote_scan and CPU 1 kn2 in kqueue_enqueue: CPU 0 | CPU 1 +--- kn1->kn_tqe.tqe_next = NULL;| | +--- kn1->kn_tqe.tqe_prev = | kn2->kn_tqe.tqe_next = NULL; kq_head.tqh_last; | +--- *kq_head.tqh_last = kn1;| kn2->kn_tqe.tqe_prev = | kq_head.tqh_last; +--- kq_head.tqh_last = | *kq_head.tqh_last = kn2; &kn1->kn_tqe.tqe_next; | +--- | kq_head.tqh_last = | &kn2->kn_tqe.tqe_next; The marker would never appear on the queue. Regards, Stefan Farfeleder To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: [PATCH] Re: Junior Kernel Hacker page updated...
On 10 Oct, Stefan Farfeleder wrote: > On Tue, Oct 08, 2002 at 09:26:29PM -0700, Don Lewis wrote: >> On 8 Oct, Stefan Farfeleder wrote: >> > However, WITNESS complains (only once) about this: >> > lock order reversal >> > 1st 0xc662140c kqueue mutex (kqueue mutex) @ >/freebsd/current/src/sys/kern/kern_event.c:714 >> > 2nd 0xc6727d00 pipe mutex (pipe mutex) @ >/freebsd/current/src/sys/kern/sys_pipe.c:1478 >> >> That's pretty similar to the lock order reversal I've seen in the pipe >> code and it's interaction with sigio, which is not suprising since >> pipeselwakeup() calls both pgsigio() and KNOTE(), often while the pipe >> lock is held. Correctly fixing this doesn't look easy ... > > Is it just a warning or does it pose a real problem? It's a warning of a potential problem. If someone else was trying to grab the locks in the opposite order at the same time, you'd get a deadlock. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: [PATCH] Re: Junior Kernel Hacker page updated...
Stefan Farfeleder wrote: > Is it just a warning or does it pose a real problem? > > I think the problem with the current code is that knote_{en,de}queue can > be executed in parallel (on another CPU, spl*() can't prevent that, can > it?) with kqueue_scan and that kq->kq_head thus can be corrupted. > Or am I totally wrong? My patch would have worked, in that case, since it would ensure one marker entry with a unique stack address per simultaneous scanner. It has to be that the queue itself is being deleted out from under it: the problem is not the scan, nor the insert or the delete. Most likely, this is for an object whose queue is not tracked by process, or for a process queue that's being examined by another process (e.g. kevent's on fork/exit/etc.). You can verify this for your own satisfaction by looking at the pointer manipulation order for the insertion and deletion; the insertion sets the next pointer before setting the pointer to the inserted object, and the deletion sets the pointer that used to point to the deleted object to the delete object's next, before deleting the object. Thus, traversals in progress should not result in an error. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: [PATCH] Re: Junior Kernel Hacker page updated...
On Tue, Oct 08, 2002 at 09:26:29PM -0700, Don Lewis wrote: > On 8 Oct, Stefan Farfeleder wrote: > > On Mon, Oct 07, 2002 at 03:48:45AM -0700, Terry Lambert wrote: > > > Following the advice from the spl* man page I turned the spl* calls to a > > mutex and was surprised to see it working. My SMP -current survived a 'make > > -j16 buildworld' with make using kqueue() (which it did not a single > > time out of >30 times before). Further testings will follow tomorrow. Building 6 worlds in a row with -j ranging from 4 to 128 didn't crash it. > > However, WITNESS complains (only once) about this: > > lock order reversal > > 1st 0xc662140c kqueue mutex (kqueue mutex) @ >/freebsd/current/src/sys/kern/kern_event.c:714 > > 2nd 0xc6727d00 pipe mutex (pipe mutex) @ >/freebsd/current/src/sys/kern/sys_pipe.c:1478 > > That's pretty similar to the lock order reversal I've seen in the pipe > code and it's interaction with sigio, which is not suprising since > pipeselwakeup() calls both pgsigio() and KNOTE(), often while the pipe > lock is held. Correctly fixing this doesn't look easy ... Is it just a warning or does it pose a real problem? I think the problem with the current code is that knote_{en,de}queue can be executed in parallel (on another CPU, spl*() can't prevent that, can it?) with kqueue_scan and that kq->kq_head thus can be corrupted. Or am I totally wrong? @Poul: Since you are the only person who reported a kernel crash too, does the version with the mutex work for you? Regards, Stefan Farfelder To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: [PATCH] Re: Junior Kernel Hacker page updated...
On 8 Oct, Stefan Farfeleder wrote: > On Mon, Oct 07, 2002 at 03:48:45AM -0700, Terry Lambert wrote: > Following the advice from the spl* man page I turned the spl* calls to a > mutex and was surprised to see it working. My SMP -current survived a 'make > -j16 buildworld' with make using kqueue() (which it did not a single > time out of >30 times before). Further testings will follow tomorrow. > > However, WITNESS complains (only once) about this: > lock order reversal > 1st 0xc662140c kqueue mutex (kqueue mutex) @ >/freebsd/current/src/sys/kern/kern_event.c:714 > 2nd 0xc6727d00 pipe mutex (pipe mutex) @ >/freebsd/current/src/sys/kern/sys_pipe.c:1478 That's pretty similar to the lock order reversal I've seen in the pipe code and it's interaction with sigio, which is not suprising since pipeselwakeup() calls both pgsigio() and KNOTE(), often while the pipe lock is held. Correctly fixing this doesn't look easy ... To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: [PATCH] Re: Junior Kernel Hacker page updated...
On Mon, Oct 07, 2002 at 03:48:45AM -0700, Terry Lambert wrote: > > *** OK, it's very hard to believe you didn't break into the > *** debugger and manually call "pnaic" to get this to happen. You're right, this is exactly what I did. > I can't personally repeat the problem, so you're elected to do > the legwork on this one. 8-(. Following the advice from the spl* man page I turned the spl* calls to a mutex and was surprised to see it working. My SMP -current survived a 'make -j16 buildworld' with make using kqueue() (which it did not a single time out of >30 times before). Further testings will follow tomorrow. However, WITNESS complains (only once) about this: lock order reversal 1st 0xc662140c kqueue mutex (kqueue mutex) @ /freebsd/current/src/sys/kern/kern_event.c:714 2nd 0xc6727d00 pipe mutex (pipe mutex) @ /freebsd/current/src/sys/kern/sys_pipe.c:1478 Regards, Stefan Farfeleder Index: sys/eventvar.h === RCS file: /home/ncvs/src/sys/sys/eventvar.h,v retrieving revision 1.4 diff -u -r1.4 eventvar.h --- sys/eventvar.h 18 Jul 2000 19:31:48 - 1.4 +++ sys/eventvar.h 8 Oct 2002 16:06:46 - @@ -35,6 +35,7 @@ struct kqueue { TAILQ_HEAD(kqlist, knote) kq_head; /* list of pending event */ int kq_count; /* number of pending events */ + struct mtx kq_mtx; /* protect kq_head */ struct selinfo kq_sel; struct filedesc *kq_fdp; int kq_state; Index: kern/kern_event.c === RCS file: /home/ncvs/src/sys/kern/kern_event.c,v retrieving revision 1.46 diff -u -r1.46 kern_event.c --- kern/kern_event.c 3 Oct 2002 06:03:26 - 1.46 +++ kern/kern_event.c 8 Oct 2002 19:22:27 - @@ -375,7 +375,7 @@ if (error) goto done2; kq = malloc(sizeof(struct kqueue), M_KQUEUE, M_WAITOK | M_ZERO); - TAILQ_INIT(&kq->kq_head); + mtx_init(&kq->kq_mtx, "kqueue mutex", NULL, MTX_DEF); FILE_LOCK(fp); fp->f_flag = FREAD | FWRITE; fp->f_type = DTYPE_KQUEUE; @@ -709,13 +709,15 @@ error = 0; goto done; } + splx(s); + mtx_lock(&kq->kq_mtx); TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); while (count) { kn = TAILQ_FIRST(&kq->kq_head); TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); if (kn == &marker) { - splx(s); + mtx_unlock(&kq->kq_mtx); if (count == maxevents) goto retry; goto done; @@ -737,10 +739,10 @@ if (kn->kn_flags & EV_ONESHOT) { kn->kn_status &= ~KN_QUEUED; kq->kq_count--; - splx(s); + mtx_unlock(&kq->kq_mtx); kn->kn_fop->f_detach(kn); knote_drop(kn, td); - s = splhigh(); + mtx_lock(&kq->kq_mtx); } else if (kn->kn_flags & EV_CLEAR) { kn->kn_data = 0; kn->kn_fflags = 0; @@ -751,19 +753,19 @@ } count--; if (nkev == KQ_NEVENTS) { - splx(s); + mtx_unlock(&kq->kq_mtx); error = copyout(&kq->kq_kev, ulistp, sizeof(struct kevent) * nkev); ulistp += nkev; nkev = 0; kevp = kq->kq_kev; - s = splhigh(); + mtx_lock(&kq->kq_mtx); if (error) break; } } TAILQ_REMOVE(&kq->kq_head, &marker, kn_tqe); - splx(s); + mtx_unlock(&kq->kq_mtx); done: if (nkev != 0) error = copyout(&kq->kq_kev, ulistp, @@ -887,6 +889,7 @@ } } FILEDESC_UNLOCK(fdp); + mtx_destroy(&kq->kq_mtx); free(kq, M_KQUEUE); fp->f_data = NULL; @@ -1051,14 +1054,14 @@ knote_enqueue(struct knote *kn) { struct kqueue *kq = kn->kn_kq; - int s = splhigh(); KASSERT((kn->kn_status & KN_QUEUED) == 0, ("knote already queued")); + mtx_lock(&kq->kq_mtx); TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); kn->kn_status |= KN_QUEUED; kq->kq_count++; - splx(s); + mtx_unlock(&kq->kq_mtx); kqueue_wakeup(kq); } @@ -1066,14 +1069,14 @@ knote_dequeue(struct knote *kn) { struct kqueue *kq = kn->kn_kq; - int s = splhigh(); KASSERT(kn->kn_status & KN_QUEUED, ("knote not queued")); + mtx_lock(&kq->kq_mtx); T
Re: [PATCH] Re: Junior Kernel Hacker page updated...
On Mon, 7 Oct 2002, Terry Lambert wrote: > Stefan Farfeleder wrote: > > > > I'm confused why marker - if it was removed by TAILQ_REMOVE - hasn't > > kn_tqe.tqe_next and kn_tqe.tqe_prev set to (void *)-1. because that only happens if the debug code in queue.h is enabled, which it is not.. > > OK, what this means is that the marker queue entry was removed > by something else going in there. > > THis shouldn't happen. > > Try adding this before the initialization of the marker data: > > bzero(&marker, sizeof(marker)); > > That should keep it from matching any removal criteria. THe only > way this could keep crashing after this mod is if the queue is > being destroyed out from under you. > > The implication here is that the queue should be protected by the > object lock for the object for which the pointer to the queue > instance is an element. > > Fixing this would be very hard (IMO). > > The next step (assuming it still panics) is to add: > > #define KQ_FREE 0x80 > > ...and set it into kq_state on a kqueue that has been freed and/or > deallocated somewhere (then check to see if it's set, after the > panic). Ugly, but it will tell you whether or not that's what's > happening (scanning a dead queue). > > The worst case is scanning a dead queue quose memory has been > reused for some other purpose. 8-(. > > I can't personally repeat the problem, so you're elected to do > the legwork on this one. 8-(. > > -- Terry > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-current" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: [PATCH] Re: Junior Kernel Hacker page updated...
Stefan Farfeleder wrote: > On Sun, Oct 06, 2002 at 11:14:26PM -0700, Terry Lambert wrote: > > Stefan: Did the patch fix it, or not? > > Sorry for the long delay. No, it did not. But I now have a rather > interesting core dump. I inserted a KASSERT, so that the code looks like > this: > > TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); > while (count) { > kn = TAILQ_FIRST(&kq->kq_head); > KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL")); [ ... ] > panic: bremfree: bp 0xd2adf6f0 not locked Second panic, during debugger sync. > panic: TAILQ_FIRST returned NULL See below... > panic: from debugger You, manually calling "panic" inside the debugger... > syncing disks... panic: bremfree: bp 0xd2adf6f0 not locked The second panic (again). > #2 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508 2nd panic. > #10 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508 Manual panic (no arguments). > #18 0xc01babcf in panic (fmt=0x0) > > at /freebsd/current/src/sys/kern/kern_shutdown.c:494 > > #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90, > > tsp=0xc754f828, td=0xc6426b60) > > at /freebsd/current/src/sys/kern/kern_event.c:717 *** OK, it's very hard to believe you didn't break into the *** debugger and manually call "pnaic" to get this to happen. Why? Because the "fmt" string is 0x0, which indicates that you called the panic manually, instead of being the address of the string "TAILQ_FIRST returned NULL", like you'd expect. > #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90, > > tsp=0xc754f828, td=0xc6426b60) > > at /freebsd/current/src/sys/kern/kern_event.c:717 > > 717 KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL")); > > (kgdb) info locals > > kq = (struct kqueue *) 0xc754f800 > > kevp = (struct kevent *) 0xc754f828 > > atv = {tv_sec = 0, tv_usec = 0} > > rtv = {tv_sec = 434, tv_usec = -1070420864} > > ttv = {tv_sec = 1, tv_usec = -1070411616} > > kn = (struct knote *) 0x0 > > marker = {kn_link = {sle_next = 0xc01b0d37}, kn_selnext = { > > sle_next = 0xc0368a20}, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc6650ac8}, > > kn_kq = 0xc6426bcc, kn_kevent = {ident = 3344374324, filter = -30080, > > flags = 49206, fflags = 3224546432, data = 431, udata = 0xe2c9dca0}, > > kn_status = 16, kn_sfflags = -1070167424, kn_sdata = 8, kn_ptr = { > > p_fp = 0xc032ac80, p_proc = 0xc032ac80}, kn_fop = 0x1af, kn_hook = 0x3} > > count = 4 > > timeout = 0 > > nkev = 0 > > error = 0 > > (kgdb) p *kq > > $2 = {kq_head = {tqh_first = 0x0, tqh_last = 0xc754f800}, kq_count = 1, > > kq_sel = {si_thrlist = {tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0, > > si_note = {slh_first = 0x0}, si_flags = 0}, kq_fdp = 0xc7571a00, > > kq_state = 0, kq_kev = {{ident = 23, filter = -1, flags = 1, fflags = 0, > > data = 69, udata = 0x80cd800}, {ident = 23, filter = -1, flags = 1, > > fflags = 0, data = 164, udata = 0x80cd800}, {ident = 27, filter = -1, > > flags = 1, fflags = 0, data = 218, udata = 0x80cf800}, {ident = 19, > > filter = -1, flags = 1, fflags = 0, data = 182, udata = 0x80cc800}, { > > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, { > > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, { > > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, { > > ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}}} > > (kgdb) q > > frog# ^Dexit > > Script done on Mon Oct 7 11:32:50 2002 > > I'm confused why marker - if it was removed by TAILQ_REMOVE - hasn't > kn_tqe.tqe_next and kn_tqe.tqe_prev set to (void *)-1. OK, what this means is that the marker queue entry was removed by something else going in there. THis shouldn't happen. Try adding this before the initialization of the marker data: bzero(&marker, sizeof(marker)); That should keep it from matching any removal criteria. THe only way this could keep crashing after this mod is if the queue is being destroyed out from under you. The implication here is that the queue should be protected by the object lock for the object for which the pointer to the queue instance is an element. Fixing this would be very hard (IMO). The next step (assuming it still panics) is to add: #define KQ_FREE 0x80 ...and set it into kq_state on a kqueue that has been freed and/or deallocated somewhere (then check to see if it's set, after the panic). Ugly, but it will tell you whether or not that's what's happening (scanning a dead queue). The worst case is scanning a dead queue quose memory has been reused for some other purpose. 8-(. I can't personally repeat the problem, so you're elected to do the legwork on this one. 8-(. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body
Re: [PATCH] Re: Junior Kernel Hacker page updated...
On Sun, Oct 06, 2002 at 11:14:26PM -0700, Terry Lambert wrote: > > Stefan: Did the patch fix it, or not? Sorry for the long delay. No, it did not. But I now have a rather interesting core dump. I inserted a KASSERT, so that the code looks like this: TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); while (count) { kn = TAILQ_FIRST(&kq->kq_head); KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL")); /* * Skip over all markers which are not ours. This looks * unsafe, but we can't hit the end of the list without * hitting our own marker. */ while ((kn->kn_status & KN_MARKER) && (kn != &marker)) { kn = TAILQ_NEXT(kn, kn_tqe); } TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); if (kn == &marker) { [...] Script started on Mon Oct 7 11:26:10 2002 frog# ../bin/gdb -k crash/kernel.debug.3 crash/vmcore.3 GNU gdb 5.2.0 (FreeBSD) 20020627 Copyright 2002 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-undermydesk-freebsd"... panic: bremfree: bp 0xd2adf6f0 not locked panic messages: --- panic: TAILQ_FIRST returned NULL cpuid = 1; lapic.id = 0100 panic: from debugger cpuid = 1; lapic.id = 0100 boot() called on cpu#1 syncing disks... panic: bremfree: bp 0xd2adf6f0 not locked cpuid = 1; lapic.id = 0100 boot() called on cpu#1 Uptime: 13m27s pfs_vncache_unload(): 1 entries remaining Dumping 1023 MB ata0: resetting devices .. done ad0: timeout sending command=c5 s=d0 e=00 ad0: error executing commandata0: resetting devices .. done 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008 --- #0 doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223 223 dumping++; (kgdb) bt #0 doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223 #1 0xc01ba92a in boot (howto=260) at /freebsd/current/src/sys/kern/kern_shutdown.c:355 #2 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508 #3 0xc01fcc77 in bremfree (bp=0xd2adf6f0) at /freebsd/current/src/sys/kern/vfs_bio.c:632 #4 0xc01fe798 in vfs_bio_awrite (bp=0x3) at /freebsd/current/src/sys/kern/vfs_bio.c:1633 #5 0xc02a7afa in ffs_fsync (ap=0xe2c9d8fc) at /freebsd/current/src/sys/ufs/ffs/ffs_vnops.c:252 #6 0xc02a7829 in VOP_FSYNC (vp=0x0, cred=0x0, waitfor=0, td=0x0) at vnode_if.h:612 #7 0xc02a6d3b in ffs_sync (mp=0xc642ba00, waitfor=2, cred=0xc22b2e80, td=0xc03643a0) at /freebsd/current/src/sys/ufs/ffs/ffs_vfsops.c:1127 #8 0xc0210998 in sync (td=0xc03643a0, uap=0x0) at /freebsd/current/src/sys/kern/vfs_syscalls.c:130 #9 0xc01ba52b in boot (howto=256) at /freebsd/current/src/sys/kern/kern_shutdown.c:264 #10 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508 #11 0xc013b1d2 in db_panic () at /freebsd/current/src/sys/ddb/db_command.c:450 #12 0xc013b152 in db_command (last_cmdp=0xc035db40, cmd_table=0x0, aux_cmd_tablep=0xc03577fc, aux_cmd_tablep_end=0xc0357800) at /freebsd/current/src/sys/ddb/db_command.c:346 ---Type to continue, or q to quit--- #13 0xc013b266 in db_command_loop () at /freebsd/current/src/sys/ddb/db_command.c:472 #14 0xc013deca in db_trap (type=3, code=0) at /freebsd/current/src/sys/ddb/db_trap.c:72 #15 0xc02e9f60 in kdb_trap (type=3, code=0, regs=0xe2c9db94) at /freebsd/current/src/sys/i386/i386/db_interface.c:166 #16 0xc0302027 in trap (frame= {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -968725664, tf_esi = 256, tf_ebp = -490087456, tf_isp = -490087488, tf_ebx = 0, tf_edx = 0, tf_ecx = 32, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1070685611, tf_cs = 8, tf_eflags = 658, tf_esp = -1070272669, tf_ss = -1070406694}) at /freebsd/current/src/sys/i386/i386/trap.c:605 #17 0xc02eb768 in calltrap () at {standard input}:99 #18 0xc01babcf in panic (fmt=0x0) at /freebsd/current/src/sys/kern/kern_shutdown.c:494 #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90, tsp=0xc754f828, td=0xc6426b60) at /freebsd/current/src/sys/kern/kern_event.c:717 #20 0xc01a0ad1 in kevent (td=0xc6426b60, uap=0xe2c9dd10) at /freebsd/current/src/sys/kern/kern_event.c:470 #21 0xc030299e in syscall (frame= {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077937792, tf_esi = 4, tf_ebp = -1077941256, tf_isp = -490087052, tf_ebx = -1077937772, tf_edx = 2184, tf_---Type to continue, or q to quit--- ecx = 0, tf_eax = 363, tf_trapno = 0, tf_err =
Re: [PATCH] Re: Junior Kernel Hacker page updated...
Terry Lambert wrote: > Stefan Farfeleder wrote: > > (kgdb) l *kqueue_scan+0x242 > > 0xc01a1212 is in kqueue_scan > > (/freebsd/current/src/sys/kern/kern_event.c:716). > > 713 TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); > > 714 while (count) { > > 715 kn = TAILQ_FIRST(&kq->kq_head); > > translates to: mov(%edi),%ebx > > 716 TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); > > translates to: cmpl $0x0,0x8(%ebx) > > > > This line causes the page fault because %ebx is 0. [ ... ] > Please try the attached patch. > > -- Terry Stefan: Did the patch fix it, or not? -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
[PATCH] Re: Junior Kernel Hacker page updated...
Stefan Farfeleder wrote: > (kgdb) l *kqueue_scan+0x242 > 0xc01a1212 is in kqueue_scan > (/freebsd/current/src/sys/kern/kern_event.c:716). > 713 TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); > 714 while (count) { > 715 kn = TAILQ_FIRST(&kq->kq_head); > translates to: mov(%edi),%ebx > 716 TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); > translates to: cmpl $0x0,0x8(%ebx) > > This line causes the page fault because %ebx is 0. This can't happen, at least from an "empty list" perspective, even if kqueue_scan() is reentered, since the "marker" is an auto allocation on the stack, and a different stack means a different marker gets inserted (marker isn't static, so having more than one insert of the marker won't result in only a single insertion). I suspect that what is hapening is that the code is being reentered, and one marker is being treated as an event, because of whatever garbage happens to be on the stack in the allocated marker. The marker is removed, and then it is not found before you hit the end of the list. Please try the attached patch. -- Terry Index: sys/event.h === RCS file: /cvs/src/sys/sys/event.h,v retrieving revision 1.21 diff -c -r1.21 event.h *** sys/event.h 29 Jun 2002 19:14:52 - 1.21 --- sys/event.h 5 Oct 2002 15:12:24 - *** *** 160,165 --- 160,166 #define KN_QUEUED 0x02/* event is on queue */ #define KN_DISABLED 0x04/* event is disabled */ #define KN_DETACHED 0x08/* knote is detached */ + #define KN_MARKER 0x10/* knote is a scan marker */ #define kn_id kn_kevent.ident #define kn_filter kn_kevent.filter Index: kern/kern_event.c === RCS file: /cvs/src/sys/kern/kern_event.c,v retrieving revision 1.45 diff -c -r1.45 kern_event.c *** kern/kern_event.c 17 Aug 2002 02:36:16 - 1.45 --- kern/kern_event.c 5 Oct 2002 15:13:26 - *** *** 653,658 --- 653,659 FILE_LOCK_ASSERT(fp, MA_NOTOWNED); + marker.kn_status = KN_MARKER; kq = (struct kqueue *)fp->f_data; count = maxevents; if (count == 0) *** *** 713,718 --- 714,727 TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); while (count) { kn = TAILQ_FIRST(&kq->kq_head); + /* +* Skip over all markers which are not ours. This looks +* unsafe, but we can't hit the end of the list without +* hitting our own marker. +*/ + while ((kn->kn_status & KN_MARKER) && (kn != &marker)) { + kn = TAILQ_NEXT(kn, kn_tqe); + } TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); if (kn == &marker) { splx(s);
Re: Junior Kernel Hacker page updated...
On Fri, Oct 04, 2002 at 04:33:17PM -0400, John Baldwin wrote: I wrote: > >> Fatal trap 12: page fault while in kernel mode > >> cpuid = 0; lapic.id = > >> fault virtual address = 0x8 > >> fault code = supervisor read, page not present > >> instruction pointer = 0x8:0xc01a1212 > >> stack pointer = 0x10:0xe5226c14 > >> frame pointer = 0x10:0xe5226ca0 > >> code segment= base 0x0, limit 0xf, type 0x1b > >> = DPL 0, pres 1, def32 1, gran 1 > >> processor eflags= interrupt enabled, resume, IOPL = 0 > >> current process = 56525 (make) > >> > >> kernel: type 12 trap, code = 0 > >> > >> Stopped atkqueue_scan+0x242: cmpl $0,0x8(%ebx) > >> db> trace > >> kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242 > >> kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1 > >> syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be > >> %%% > Even better, pop up gdb on kernel.debug and do > 'l *kqueue_scan+0x242' to look at the offending line of code. > addr2line can also be useful here similarly. (kgdb) l *kqueue_scan+0x242 0xc01a1212 is in kqueue_scan (/freebsd/current/src/sys/kern/kern_event.c:716). 711 } 712 713 TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); 714 while (count) { 715 kn = TAILQ_FIRST(&kq->kq_head); translates to: mov(%edi),%ebx 716 TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); translates to: cmpl $0x0,0x8(%ebx) This line causes the page fault because %ebx is 0. je fe3 mov0x8(%ebx),%edx [...] 717 if (kn == &marker) { 718 splx(s); 719 if (count == maxevents) 720 goto retry; I've added this after line 715: 716 if (kn == NULL) { 717 Debugger("TAILQ_FIRST returns NULL"); 718 } and after 4 freezes, I really came into ddb in line 717. However, when trying to produce a dump, this occured: db> panic panic: from debugger cpuid = 1; lapic.id = 0100 boot() called on cpu#1 syncing disks... panic: bremfree: bp 0xd2a42990 not locked boot() called on cpu#1 Uptime: 10m13s pfs_vncache_unload(): 1 entries remaining Dumping 1023 MB ata0: resetting devices ata0: mask=03 ostat0=50 ostat2=00 ad0: ATAPI 00 00 ata0-slave: ATAPI 00 00 ata0: mask=03 stat0=50 stat1=00 ad0: ATA 01 a5 ata0: devices=01 and I had to reboot without a dump :-( Regards, Stefan Farfeleder To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Junior Kernel Hacker page updated...
* De: Poul-Henning Kamp <[EMAIL PROTECTED]> [ Data: 2002-10-04 ] [ Subjecte: Re: Junior Kernel Hacker page updated... ] > In message <[EMAIL PROTECTED]>, Stefan Farfeleder writes: > > > > I have committed your patch, but left it #undef'ed until we get > the kernel straightened out. > > On behalf of the FreeBSD project I have to warn you that if you > persist in doing work of this kind over and over again, you will > eventually be subjected to a commit bit. You have been warned :-) > > Thank you very much for you work! :-) > > Poul-Henning > > >The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH. > >Is anybody using them? Building with -DREMOTE doesn't compile and with > >-DRMT_WILL_WATCH the linker is complaining about the lack of the > >functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of > >those defines? I understand Juli Mallett wants to rewrite make, so maybe > >this effort would be wasted. > > I belive the RMT/REMOTE stuff is part of an earlier attempt at putting > in some kind of cluster functionality. > > If it is useful in any capacity, even as hint of what/how to do such > a thing, I think we should leave it. If it is just old junk we > should boot it. NetBSD has it working. I don't recall anymore if the diffs were enough to dissuade me from making ours work, or if I just had no interest. It's gone from my local tree simply because I got tired of grepping for things to debug, and running into ifdef's cruft. -- Juli Mallett <[EMAIL PROTECTED]> | FreeBSD: The Power To Serve Will break world for fulltime employment. | finger [EMAIL PROTECTED] http://people.FreeBSD.org/~jmallett/ | Support my FreeBSD hacking! To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Junior Kernel Hacker page updated...
In message <[EMAIL PROTECTED]>, Stefan Farfeleder writes: > I have committed your patch, but left it #undef'ed until we get the kernel straightened out. On behalf of the FreeBSD project I have to warn you that if you persist in doing work of this kind over and over again, you will eventually be subjected to a commit bit. You have been warned :-) Thank you very much for you work! :-) Poul-Henning >The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH. >Is anybody using them? Building with -DREMOTE doesn't compile and with >-DRMT_WILL_WATCH the linker is complaining about the lack of the >functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of >those defines? I understand Juli Mallett wants to rewrite make, so maybe >this effort would be wasted. I belive the RMT/REMOTE stuff is part of an earlier attempt at putting in some kind of cluster functionality. If it is useful in any capacity, even as hint of what/how to do such a thing, I think we should leave it. If it is just old junk we should boot it. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Junior Kernel Hacker page updated...
On 04-Oct-2002 Juli Mallett wrote: > * De: Stefan Farfeleder <[EMAIL PROTECTED]> [ Data: 2002-10-04 ] > [ Subjecte: Re: Junior Kernel Hacker page updated... ] >> On Thu, Oct 03, 2002 at 04:41:46PM +0200, Poul-Henning Kamp wrote: >> > >> > Hi Stefan, >> > >> > I tried this patch and it paniced my (almost-) current machine with >> > a pagefault in the kqueue code: Bravo! >> > >> > I can see that there is some amount of #ifdef stuff in your patch, >> >> The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH. >> Is anybody using them? Building with -DREMOTE doesn't compile and with >> -DRMT_WILL_WATCH the linker is complaining about the lack of the >> functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of >> those defines? I understand Juli Mallett wants to rewrite make, so maybe >> this effort would be wasted. >> >> > in light of that, would it be possible to make an #ifdef'ed version >> > of your patch which we could commit ? >> >> Ok, the new patch is attached. Compile with -DUSE_KQUEUE to use the new >> code. >> >> > That way we give the kqueue hackers a good testcase, and we can >> > easily enable when they have solved the problem. >> >> After Don Lewis fixed the 'could sleep with' problem (thanks!), I'm >> still encountering freezes and panics. Here's one I caught: >> >> [warning: parts are typed in] >> %%% >> Fatal trap 12: page fault while in kernel mode >> cpuid = 0; lapic.id = >> fault virtual address = 0x8 >> fault code = supervisor read, page not present >> instruction pointer = 0x8:0xc01a1212 >> stack pointer = 0x10:0xe5226c14 >> frame pointer = 0x10:0xe5226ca0 >> code segment= base 0x0, limit 0xf, type 0x1b >> = DPL 0, pres 1, def32 1, gran 1 >> processor eflags= interrupt enabled, resume, IOPL = 0 >> current process = 56525 (make) >> >> kernel: type 12 trap, code = 0 >> >> Stopped atkqueue_scan+0x242: cmpl $0,0x8(%ebx) >> db> trace >> kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242 >> kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1 >> syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be >> %%% > > Run the kqueue source file through gcc with -fverbose-asm -S and then > look at the resulting .s file, grep for cmpl.*0x8( and look for > what's being dereferenced without being checked for NULL. Even better, pop up gdb on kernel.debug and do 'l *kqueue_scan+0x242' to look at the offending line of code. addr2line can also be useful here similarly. -- John Baldwin <[EMAIL PROTECTED]> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Junior Kernel Hacker page updated...
* De: Stefan Farfeleder <[EMAIL PROTECTED]> [ Data: 2002-10-04 ] [ Subjecte: Re: Junior Kernel Hacker page updated... ] > On Thu, Oct 03, 2002 at 04:41:46PM +0200, Poul-Henning Kamp wrote: > > > > Hi Stefan, > > > > I tried this patch and it paniced my (almost-) current machine with > > a pagefault in the kqueue code: Bravo! > > > > I can see that there is some amount of #ifdef stuff in your patch, > > The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH. > Is anybody using them? Building with -DREMOTE doesn't compile and with > -DRMT_WILL_WATCH the linker is complaining about the lack of the > functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of > those defines? I understand Juli Mallett wants to rewrite make, so maybe > this effort would be wasted. > > > in light of that, would it be possible to make an #ifdef'ed version > > of your patch which we could commit ? > > Ok, the new patch is attached. Compile with -DUSE_KQUEUE to use the new > code. > > > That way we give the kqueue hackers a good testcase, and we can > > easily enable when they have solved the problem. > > After Don Lewis fixed the 'could sleep with' problem (thanks!), I'm > still encountering freezes and panics. Here's one I caught: > > [warning: parts are typed in] > %%% > Fatal trap 12: page fault while in kernel mode > cpuid = 0; lapic.id = > fault virtual address = 0x8 > fault code = supervisor read, page not present > instruction pointer = 0x8:0xc01a1212 > stack pointer = 0x10:0xe5226c14 > frame pointer = 0x10:0xe5226ca0 > code segment= base 0x0, limit 0xf, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags= interrupt enabled, resume, IOPL = 0 > current process = 56525 (make) > > kernel: type 12 trap, code = 0 > > Stopped atkqueue_scan+0x242: cmpl $0,0x8(%ebx) > db> trace > kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242 > kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1 > syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be > %%% Run the kqueue source file through gcc with -fverbose-asm -S and then look at the resulting .s file, grep for cmpl.*0x8( and look for what's being dereferenced without being checked for NULL. > The core file doesn't seem to be particularly helpful, as the > kqueue_scan frame seems to miss: > > %%% > (kgdb) bt > #0 doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223 > #1 0xc01ba92a in boot (howto=260) > at /freebsd/current/src/sys/kern/kern_shutdown.c:355 > #2 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508 > #3 0xc013b1d2 in db_panic () at /freebsd/current/src/sys/ddb/db_command.c:450 > #4 0xc013b152 in db_command (last_cmdp=0xc035db20, cmd_table=0x0, > aux_cmd_tablep=0xc03577dc, aux_cmd_tablep_end=0xc03577e0) > at /freebsd/current/src/sys/ddb/db_command.c:346 > #5 0xc013b266 in db_command_loop () > at /freebsd/current/src/sys/ddb/db_command.c:472 > #6 0xc013deca in db_trap (type=12, code=0) > at /freebsd/current/src/sys/ddb/db_trap.c:72 > #7 0xc02e9f60 in kdb_trap (type=12, code=0, regs=0xe5226bd4) > at /freebsd/current/src/sys/i386/i386/db_interface.c:166 > #8 0xc0302602 in trap_fatal (frame=0xe5226bd4, eva=0) > at /freebsd/current/src/sys/i386/i386/trap.c:841 > #9 0xc03022e2 in trap_pfault (frame=0xe5226bd4, usermode=0, eva=8) > at /freebsd/current/src/sys/i386/i386/trap.c:760 > #10 0xc0301e0d in trap (frame= > {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -943606528, tf_esi = -943606488, >tf_ebp = -450728800, tf_isp = -450728960, tf_ebx = 0, tf_edx = 4, tf_ecx = 1, tf_eax >= 0, tf_trapno = 12, tf_err = 0, tf_eip = -1072033262, tf_cs = 8, tf_eflags = 66050, >tf_esp = -966356416, tf_ss = 0}) > ---Type to continue, or q to quit--- > at /freebsd/current/src/sys/i386/i386/trap.c:446 > #11 0xc02eb768 in calltrap () at {standard input}:99 > #12 0xc01a0ad1 in kevent (td=0xc70ecea0, uap=0xe5226d10) > at /freebsd/current/src/sys/kern/kern_event.c:470 > #13 0xc030299e in syscall (frame= > {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135845760, tf_esi = 135846240, >tf_ebp = -1077941224, tf_isp = -450728588, tf_ebx = 134831872, tf_edx = 2184, tf_ecx >= 0, tf_eax = 363, tf_trapno = 12, tf_err = 2, tf_eip = 134626551, tf_cs = 31, >tf_eflags = 514, tf_esp = -1077941348, tf_ss = 47}) > at /freebsd/current/src/sys/i386/i386/trap.c:1050 > #14 0xc02eb7bd in Xint0x80_syscall () at {standard input}:141 > ---Can't read userspace from dump, or kernel p
Re: Junior Kernel Hacker page updated...
On Thu, Oct 03, 2002 at 04:41:46PM +0200, Poul-Henning Kamp wrote: > > Hi Stefan, > > I tried this patch and it paniced my (almost-) current machine with > a pagefault in the kqueue code: Bravo! > > I can see that there is some amount of #ifdef stuff in your patch, The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH. Is anybody using them? Building with -DREMOTE doesn't compile and with -DRMT_WILL_WATCH the linker is complaining about the lack of the functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of those defines? I understand Juli Mallett wants to rewrite make, so maybe this effort would be wasted. > in light of that, would it be possible to make an #ifdef'ed version > of your patch which we could commit ? Ok, the new patch is attached. Compile with -DUSE_KQUEUE to use the new code. > That way we give the kqueue hackers a good testcase, and we can > easily enable when they have solved the problem. After Don Lewis fixed the 'could sleep with' problem (thanks!), I'm still encountering freezes and panics. Here's one I caught: [warning: parts are typed in] %%% Fatal trap 12: page fault while in kernel mode cpuid = 0; lapic.id = fault virtual address = 0x8 fault code = supervisor read, page not present instruction pointer = 0x8:0xc01a1212 stack pointer = 0x10:0xe5226c14 frame pointer = 0x10:0xe5226ca0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 56525 (make) kernel: type 12 trap, code = 0 Stopped atkqueue_scan+0x242: cmpl $0,0x8(%ebx) db> trace kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242 kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1 syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be %%% The core file doesn't seem to be particularly helpful, as the kqueue_scan frame seems to miss: %%% (kgdb) bt #0 doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223 #1 0xc01ba92a in boot (howto=260) at /freebsd/current/src/sys/kern/kern_shutdown.c:355 #2 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508 #3 0xc013b1d2 in db_panic () at /freebsd/current/src/sys/ddb/db_command.c:450 #4 0xc013b152 in db_command (last_cmdp=0xc035db20, cmd_table=0x0, aux_cmd_tablep=0xc03577dc, aux_cmd_tablep_end=0xc03577e0) at /freebsd/current/src/sys/ddb/db_command.c:346 #5 0xc013b266 in db_command_loop () at /freebsd/current/src/sys/ddb/db_command.c:472 #6 0xc013deca in db_trap (type=12, code=0) at /freebsd/current/src/sys/ddb/db_trap.c:72 #7 0xc02e9f60 in kdb_trap (type=12, code=0, regs=0xe5226bd4) at /freebsd/current/src/sys/i386/i386/db_interface.c:166 #8 0xc0302602 in trap_fatal (frame=0xe5226bd4, eva=0) at /freebsd/current/src/sys/i386/i386/trap.c:841 #9 0xc03022e2 in trap_pfault (frame=0xe5226bd4, usermode=0, eva=8) at /freebsd/current/src/sys/i386/i386/trap.c:760 #10 0xc0301e0d in trap (frame= {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -943606528, tf_esi = -943606488, tf_ebp = -450728800, tf_isp = -450728960, tf_ebx = 0, tf_edx = 4, tf_ecx = 1, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1072033262, tf_cs = 8, tf_eflags = 66050, tf_esp = -966356416, tf_ss = 0}) ---Type to continue, or q to quit--- at /freebsd/current/src/sys/i386/i386/trap.c:446 #11 0xc02eb768 in calltrap () at {standard input}:99 #12 0xc01a0ad1 in kevent (td=0xc70ecea0, uap=0xe5226d10) at /freebsd/current/src/sys/kern/kern_event.c:470 #13 0xc030299e in syscall (frame= {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135845760, tf_esi = 135846240, tf_ebp = -1077941224, tf_isp = -450728588, tf_ebx = 134831872, tf_edx = 2184, tf_ecx = 0, tf_eax = 363, tf_trapno = 12, tf_err = 2, tf_eip = 134626551, tf_cs = 31, tf_eflags = 514, tf_esp = -1077941348, tf_ss = 47}) at /freebsd/current/src/sys/i386/i386/trap.c:1050 #14 0xc02eb7bd in Xint0x80_syscall () at {standard input}:141 ---Can't read userspace from dump, or kernel process--- %%% objdump -dS kern_event.o says the following about kqueue_scan+0x242 [==0xfd2]: %%% fce: 90 nop fcf: 90 nop kn = TAILQ_FIRST(&kq->kq_head); fd0: 8b 1f mov(%edi),%ebx TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); fd2: 83 7b 08 00 cmpl $0x0,0x8(%ebx) fd6: 74 0b je fe3 fd8: 8b 53 08mov0x8(%ebx),%edx fdb: 8b 43 0cmov0xc(%ebx),%eax fde: 89 42 0cmov%eax,0xc(%edx) fe1: eb 06 jmpfe9 fe3: 8b 43 0cmov0xc(%ebx),%eax fe6: 89 47 04mov%eax,0x4(%edi) fe9: 8b 43 0c
Re: Junior Kernel Hacker page updated...
Hi Stefan, I tried this patch and it paniced my (almost-) current machine with a pagefault in the kqueue code: Bravo! I can see that there is some amount of #ifdef stuff in your patch, in light of that, would it be possible to make an #ifdef'ed version of your patch which we could commit ? That way we give the kqueue hackers a good testcase, and we can easily enable when they have solved the problem. Poul-Henning In message <[EMAIL PROTECTED]>, Stefan Farfeleder writes: > >--VS++wcV0S1rZb1Fb >Content-Type: text/plain; charset=us-ascii >Content-Disposition: inline > >On Sat, Sep 14, 2002 at 12:17:53PM +0200, Poul-Henning Kamp wrote: >> >> This is just to note that I have updated the JKH page with a lot of new >> stuff, so if your coding-pencil itches: >> >> http://people.freebsd.org/~phk/TODO/ > >|Make -j improvement >| >|make(1) with -j option uses a select loop to wait for events, and every >|100msec it drops out to look for processes exited etc. A pure "make >|buildworld" on a single-CPU machine is up to 25% faster that the best >|"make -j N buildworld" time on the same hardware. Changing to timeout >|to be 10msec improves things about 10%. >|I think that make(1) should use kqueue(2) instead, since that would >|eliminate the need for timeouts. > >Ok, here's what I came up with. However, with the patch applied, each >'make buildworld' on a SMP machine throws tons of > >/freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" >locked from /freebsd/current/src/sys/kern/kern_event.c:959 > >at me and freezes badly at some point (no breaking into ddb possible). >This is totally repeatable. Is anybody able to reproduce (and maybe >fix) this? > >Regards, >Stefan Farfeleder > >--VS++wcV0S1rZb1Fb >Content-Type: text/plain; charset=us-ascii >Content-Disposition: attachment; filename="make.diff" > >Index: job.c >=== >RCS file: /home/ncvs/src/usr.bin/make/job.c,v >retrieving revision 1.43 >diff -u -r1.43 job.c >--- job.c 29 Sep 2002 00:20:28 - 1.43 >+++ job.c 2 Oct 2002 21:34:51 - >@@ -64,9 +64,8 @@ > *Job_CatchOutput Print any output our children have produced. > *Should also be called fairly frequently to > *keep the user informed of what's going on. >- *If no output is waiting, it will block for >- *a time given by the SEL_* constants, below, >- *or until output is ready. >+ *If no output is waiting, it will block until >+ *a child terminates or until output is ready. > * > *Job_InitCalled to intialize this module. in addition, > *any commands attached to the .BEGIN target >@@ -107,6 +106,8 @@ > #include > #include > #include >+#include >+#include > #include > #include > #include >@@ -237,8 +238,7 @@ >* (2) a job can only be run locally, but >* nLocal equals maxLocal */ > #ifndef RMT_WILL_WATCH >-static fd_set outputs;/* Set of descriptors of pipes connected to >- * the output channels of children */ >+static intkqfd; /* File descriptor obtained by kqueue() */ > #endif > > STATIC GNode *lastNode; /* The node for which output was most recently >@@ -692,8 +692,6 @@ > if (usePipes) { > #ifdef RMT_WILL_WATCH > Rmt_Ignore(job->inPipe); >-#else >- FD_CLR(job->inPipe, &outputs); > #endif > if (job->outPipe != job->inPipe) { > (void) close(job->outPipe); >@@ -1265,14 +1263,25 @@ > /* >* The first time a job is run for a node, we set the current >* position in the buffer to the beginning and mark another >- * stream to watch in the outputs mask >+ * stream to watch in the outputs mask. The termination of this >+ * job and the availability of new data in the pipe are >+ * registered in the kqueue. >*/ >+ struct kevent kev[2]; >+ > job->curPos = 0; > > #ifdef RMT_WILL_WATCH > Rmt_Watch(job->inPipe, JobLocalInput, job); > #else >- FD_SET(job->inPipe, &outputs); >+ EV_SET(&kev[0], job->inPipe, EVFILT_READ, EV_ADD, 0, 0, job); >+ EV_SET(&kev[1], job->pid, EVFILT_PROC, EV_ADD | EV_ONESHOT, >+ NOTE_EXIT, 0, NULL); >+ if (kevent(kqfd, kev, 2, NULL, 0, NULL) != 0) { >+ /* kevent() will fail if the job is already finished */ >+ if (errno != EBADF && errno != ESRCH) >+ Punt("kevent: %s", strerror(errno)); >+ } > #endif /* RMT_WILL_WATCH */ > } > >@@ -2229,12 +2238,12 @@ > Job_CatchOutput() > { > int nfds; >-struct timeva
Re: Junior Kernel Hacker page updated...
On Wed, 2002-10-02 at 14:40, Stefan Farfeleder wrote: > On Sat, Sep 14, 2002 at 12:17:53PM +0200, Poul-Henning Kamp wrote: > > > > This is just to note that I have updated the JKH page with a lot of new > > stuff, so if your coding-pencil itches: > > > > http://people.freebsd.org/~phk/TODO/ > > |Make -j improvement > | > |make(1) with -j option uses a select loop to wait for events, and every > |100msec it drops out to look for processes exited etc. A pure "make > |buildworld" on a single-CPU machine is up to 25% faster that the best > |"make -j N buildworld" time on the same hardware. Changing to timeout > |to be 10msec improves things about 10%. > |I think that make(1) should use kqueue(2) instead, since that would > |eliminate the need for timeouts. > > Ok, here's what I came up with. However, with the patch applied, each > 'make buildworld' on a SMP machine throws tons of > > /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" locked from /freebsd/current/src/sys/kern/kern_event.c:959 > > at me and freezes badly at some point (no breaking into ddb possible). > This is totally repeatable. Is anybody able to reproduce (and maybe > fix) this? This question is not relevant to the problem at hand, but... wouldn't it be more portable to catch SIGCHLD (and continue using select(2)) than to use kqueue(2)? -Peter- To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Junior Kernel Hacker page updated...
On 2 Oct, Don Lewis wrote: > On 2 Oct, Stefan Farfeleder wrote: > >> /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" >locked from /freebsd/current/src/sys/kern/kern_event.c:959 >> >> at me and freezes badly at some point (no breaking into ddb possible). >> This is totally repeatable. Is anybody able to reproduce (and maybe >> fix) this? > > It looks like the problem is that knote_attach() calls hashinit() while > holding the lock, and that hashinit() calls MALLOC(..., M_WAITOK). > > Try the following patch: > [snip] I went ahead and committed the patch. It survived a "make -j10 buildworld" test with your patched version of make. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Junior Kernel Hacker page updated...
On 2 Oct, Stefan Farfeleder wrote: > /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" >locked from /freebsd/current/src/sys/kern/kern_event.c:959 > > at me and freezes badly at some point (no breaking into ddb possible). > This is totally repeatable. Is anybody able to reproduce (and maybe > fix) this? It looks like the problem is that knote_attach() calls hashinit() while holding the lock, and that hashinit() calls MALLOC(..., M_WAITOK). Try the following patch: Index: sys/kern/kern_event.c === RCS file: /home/ncvs/src/sys/kern/kern_event.c,v retrieving revision 1.45 diff -u -r1.45 kern_event.c --- sys/kern/kern_event.c 17 Aug 2002 02:36:16 - 1.45 +++ sys/kern/kern_event.c 3 Oct 2002 03:10:09 - @@ -953,15 +953,27 @@ static void knote_attach(struct knote *kn, struct filedesc *fdp) { - struct klist *list, *oldlist; + struct klist *list, *oldlist, *tmp_knhash; + u_long tmp_knhashmask; int size, newsize; FILEDESC_LOCK(fdp); if (! kn->kn_fop->f_isfd) { - if (fdp->fd_knhashmask == 0) - fdp->fd_knhash = hashinit(KN_HASHSIZE, M_KQUEUE, - &fdp->fd_knhashmask); + if (fdp->fd_knhashmask == 0) { + FILEDESC_UNLOCK(fdp); + tmp_knhash = hashinit(KN_HASHSIZE, M_KQUEUE, + &tmp_knhashmask); + FILEDESC_LOCK(fdp); + if (fdp->fd_knhashmask == 0) { + fdp->fd_knhash = tmp_knhash; + fdp->fd_knhashmask = tmp_knhashmask; + } else { + FILEDESC_UNLOCK(fdp); + free(tmp_knhash, M_KQUEUE); + FILEDESC_LOCK(fdp); + } + } list = &fdp->fd_knhash[KN_HASH(kn->kn_id, fdp->fd_knhashmask)]; goto done; } To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Junior Kernel Hacker page updated...
On Sat, Sep 14, 2002 at 12:17:53PM +0200, Poul-Henning Kamp wrote: > > This is just to note that I have updated the JKH page with a lot of new > stuff, so if your coding-pencil itches: > > http://people.freebsd.org/~phk/TODO/ |Make -j improvement | |make(1) with -j option uses a select loop to wait for events, and every |100msec it drops out to look for processes exited etc. A pure "make |buildworld" on a single-CPU machine is up to 25% faster that the best |"make -j N buildworld" time on the same hardware. Changing to timeout |to be 10msec improves things about 10%. |I think that make(1) should use kqueue(2) instead, since that would |eliminate the need for timeouts. Ok, here's what I came up with. However, with the patch applied, each 'make buildworld' on a SMP machine throws tons of /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" locked from /freebsd/current/src/sys/kern/kern_event.c:959 at me and freezes badly at some point (no breaking into ddb possible). This is totally repeatable. Is anybody able to reproduce (and maybe fix) this? Regards, Stefan Farfeleder Index: job.c === RCS file: /home/ncvs/src/usr.bin/make/job.c,v retrieving revision 1.43 diff -u -r1.43 job.c --- job.c 29 Sep 2002 00:20:28 - 1.43 +++ job.c 2 Oct 2002 21:34:51 - @@ -64,9 +64,8 @@ * Job_CatchOutput Print any output our children have produced. * Should also be called fairly frequently to * keep the user informed of what's going on. - * If no output is waiting, it will block for - * a time given by the SEL_* constants, below, - * or until output is ready. + * If no output is waiting, it will block until + * a child terminates or until output is ready. * * Job_InitCalled to intialize this module. in addition, * any commands attached to the .BEGIN target @@ -107,6 +106,8 @@ #include #include #include +#include +#include #include #include #include @@ -237,8 +238,7 @@ * (2) a job can only be run locally, but * nLocal equals maxLocal */ #ifndef RMT_WILL_WATCH -static fd_set outputs;/* Set of descriptors of pipes connected to -* the output channels of children */ +static int kqfd; /* File descriptor obtained by kqueue() */ #endif STATIC GNode *lastNode; /* The node for which output was most recently @@ -692,8 +692,6 @@ if (usePipes) { #ifdef RMT_WILL_WATCH Rmt_Ignore(job->inPipe); -#else - FD_CLR(job->inPipe, &outputs); #endif if (job->outPipe != job->inPipe) { (void) close(job->outPipe); @@ -1265,14 +1263,25 @@ /* * The first time a job is run for a node, we set the current * position in the buffer to the beginning and mark another -* stream to watch in the outputs mask +* stream to watch in the outputs mask. The termination of this +* job and the availability of new data in the pipe are +* registered in the kqueue. */ + struct kevent kev[2]; + job->curPos = 0; #ifdef RMT_WILL_WATCH Rmt_Watch(job->inPipe, JobLocalInput, job); #else - FD_SET(job->inPipe, &outputs); + EV_SET(&kev[0], job->inPipe, EVFILT_READ, EV_ADD, 0, 0, job); + EV_SET(&kev[1], job->pid, EVFILT_PROC, EV_ADD | EV_ONESHOT, + NOTE_EXIT, 0, NULL); + if (kevent(kqfd, kev, 2, NULL, 0, NULL) != 0) { + /* kevent() will fail if the job is already finished */ + if (errno != EBADF && errno != ESRCH) + Punt("kevent: %s", strerror(errno)); + } #endif /* RMT_WILL_WATCH */ } @@ -2229,12 +2238,12 @@ Job_CatchOutput() { int nfds; -struct timeval timeout; -fd_set readfds; -LstNode ln; -Job *job; #ifdef RMT_WILL_WATCH int pnJobs; /* Previous nJobs */ +#else +struct keventkev[4]; /* 4 is somewhat arbitrary */ +size_t kevsize = sizeof(kev) / sizeof(kev[0]); +int i; #endif (void) fflush(stdout); @@ -2262,25 +2271,20 @@ } #else if (usePipes) { - readfds = outputs; - timeout.tv_sec = SEL_SEC; - timeout.tv_usec = SEL_USEC; - - if ((nfds = select(FD_SETSIZE, &readfds, (fd_set *) 0, - (fd_set *) 0, &timeout)) <= 0) - return; - else { - if (Lst_