Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-10 Thread Terry Lambert

Stefan Farfeleder wrote:
> Imagine this scenario where CPU 0 inserts a knote kn1 (the marker) in
> knote_scan and CPU 1 kn2 in kqueue_enqueue:
> 
> CPU 0   |   CPU 1
> +---
> kn1->kn_tqe.tqe_next = NULL;|
> |
> +---
> kn1->kn_tqe.tqe_prev =  |   kn2->kn_tqe.tqe_next = NULL;
> kq_head.tqh_last;   |
> +---
> *kq_head.tqh_last = kn1;|   kn2->kn_tqe.tqe_prev =
> |   kq_head.tqh_last;
> +---
> kq_head.tqh_last =  |   *kq_head.tqh_last = kn2;
> &kn1->kn_tqe.tqe_next;  |
> +---
> |   kq_head.tqh_last =
> |   &kn2->kn_tqe.tqe_next;
> 
> The marker would never appear on the queue.

The macro is broken.  Here is a fix.

#define TAILQ_INSERT_TAIL(head, elm, field) do {\
void **savepp;  \
(elm)->field.tqe_next = NULL;   \
(elm)->field.tqe_prev = (head)->tqh_last;   \
savepp = (head)->tqh_last;  \
(head)->tqh_last = &(elm)->field.tqe_next;  \
(elm)->field.tqe_prev = savepp; \
*savepp = (elm);\
} while (0)

I'm not sure that it's possible to totally close the race; I think the
way you would have to do it is to traverse the tqh_last pointer until
the tqh_last->file.tqe_next == NULL, before reference it (ugly).  The
failure mode with the race is an inverted insertion order, which I
think, in this case, is not a problem.  The double setting of the prev
guarantees it has a "harmless" value for a traversal during insertion;
it acts as if the insertion had not occurred, if there are two that
occurred simultaneously, until both sets of data are valid.

This is doable in Intel assembly, I think (CMPXCHGL).  Ugh.  On other
architectures, there's no way to avoid a mutex (e.g. MIPS).

I hate tailq's.  8-(.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-10 Thread Stefan Farfeleder

On Wed, Oct 09, 2002 at 04:07:45PM -0700, Terry Lambert wrote:
> Stefan Farfeleder wrote:
> > Is it just a warning or does it pose a real problem?
> > 
> > I think the problem with the current code is that knote_{en,de}queue can
> > be executed in parallel (on another CPU, spl*() can't prevent that, can
> > it?) with kqueue_scan and that kq->kq_head thus can be corrupted.
> > Or am I totally wrong?
> 
> My patch would have worked, in that case, since it would ensure
> one marker entry with a unique stack address per simultaneous
> scanner.
> 
> It has to be that the queue itself is being deleted out from
> under it: the problem is not the scan, nor the insert or the
> delete.
> 
> Most likely, this is for an object whose queue is not tracked
> by process, or for a process queue that's being examined by
> another process (e.g. kevent's on fork/exit/etc.).
> 
> You can verify this for your own satisfaction by looking at the
> pointer manipulation order for the insertion and deletion; the
> insertion sets the next pointer before setting the pointer to
> the inserted object, and the deletion sets the pointer that
> used to point to the deleted object to the delete object's next,
> before deleting the object.  Thus, traversals in progress should
> not result in an error.

Imagine this scenario where CPU 0 inserts a knote kn1 (the marker) in
knote_scan and CPU 1 kn2 in kqueue_enqueue:


CPU 0   |   CPU 1
+---
kn1->kn_tqe.tqe_next = NULL;|
|
+---
kn1->kn_tqe.tqe_prev =  |   kn2->kn_tqe.tqe_next = NULL;
kq_head.tqh_last;   |
+---
*kq_head.tqh_last = kn1;|   kn2->kn_tqe.tqe_prev =
|   kq_head.tqh_last;
+---
kq_head.tqh_last =  |   *kq_head.tqh_last = kn2;
&kn1->kn_tqe.tqe_next;  |
+---
|   kq_head.tqh_last =
|   &kn2->kn_tqe.tqe_next;

The marker would never appear on the queue.

Regards,
Stefan Farfeleder

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-09 Thread Don Lewis

On 10 Oct, Stefan Farfeleder wrote:
> On Tue, Oct 08, 2002 at 09:26:29PM -0700, Don Lewis wrote:
>> On  8 Oct, Stefan Farfeleder wrote:

>> > However, WITNESS complains (only once) about this:
>> > lock order reversal
>> >  1st 0xc662140c kqueue mutex (kqueue mutex) @ 
>/freebsd/current/src/sys/kern/kern_event.c:714
>> >  2nd 0xc6727d00 pipe mutex (pipe mutex) @ 
>/freebsd/current/src/sys/kern/sys_pipe.c:1478
>> 
>> That's pretty similar to the lock order reversal I've seen in the pipe
>> code and it's interaction with sigio, which is not suprising since
>> pipeselwakeup() calls both pgsigio() and KNOTE(), often while the pipe
>> lock is held.  Correctly fixing this doesn't look easy ...
> 
> Is it just a warning or does it pose a real problem?

It's a warning of a potential problem.  If someone else was trying to
grab the locks in the opposite order at the same time, you'd get a
deadlock.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-09 Thread Terry Lambert

Stefan Farfeleder wrote:
> Is it just a warning or does it pose a real problem?
> 
> I think the problem with the current code is that knote_{en,de}queue can
> be executed in parallel (on another CPU, spl*() can't prevent that, can
> it?) with kqueue_scan and that kq->kq_head thus can be corrupted.
> Or am I totally wrong?

My patch would have worked, in that case, since it would ensure
one marker entry with a unique stack address per simultaneous
scanner.

It has to be that the queue itself is being deleted out from
under it: the problem is not the scan, nor the insert or the
delete.

Most likely, this is for an object whose queue is not tracked
by process, or for a process queue that's being examined by
another process (e.g. kevent's on fork/exit/etc.).

You can verify this for your own satisfaction by looking at the
pointer manipulation order for the insertion and deletion; the
insertion sets the next pointer before setting the pointer to
the inserted object, and the deletion sets the pointer that
used to point to the deleted object to the delete object's next,
before deleting the object.  Thus, traversals in progress should
not result in an error.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-09 Thread Stefan Farfeleder

On Tue, Oct 08, 2002 at 09:26:29PM -0700, Don Lewis wrote:
> On  8 Oct, Stefan Farfeleder wrote:
> > On Mon, Oct 07, 2002 at 03:48:45AM -0700, Terry Lambert wrote:
> 
> > Following the advice from the spl* man page I turned the spl* calls to a
> > mutex and was surprised to see it working. My SMP -current survived a 'make
> > -j16 buildworld' with make using kqueue() (which it did not a single
> > time out of >30 times before). Further testings will follow tomorrow.

Building 6 worlds in a row with -j ranging from 4 to 128 didn't crash
it.

> > However, WITNESS complains (only once) about this:
> > lock order reversal
> >  1st 0xc662140c kqueue mutex (kqueue mutex) @ 
>/freebsd/current/src/sys/kern/kern_event.c:714
> >  2nd 0xc6727d00 pipe mutex (pipe mutex) @ 
>/freebsd/current/src/sys/kern/sys_pipe.c:1478
> 
> That's pretty similar to the lock order reversal I've seen in the pipe
> code and it's interaction with sigio, which is not suprising since
> pipeselwakeup() calls both pgsigio() and KNOTE(), often while the pipe
> lock is held.  Correctly fixing this doesn't look easy ...

Is it just a warning or does it pose a real problem?


I think the problem with the current code is that knote_{en,de}queue can
be executed in parallel (on another CPU, spl*() can't prevent that, can
it?) with kqueue_scan and that kq->kq_head thus can be corrupted.
Or am I totally wrong?

@Poul: Since you are the only person who reported a kernel crash too,
does the version with the mutex work for you?

Regards,
Stefan Farfelder

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-08 Thread Don Lewis

On  8 Oct, Stefan Farfeleder wrote:
> On Mon, Oct 07, 2002 at 03:48:45AM -0700, Terry Lambert wrote:

> Following the advice from the spl* man page I turned the spl* calls to a
> mutex and was surprised to see it working. My SMP -current survived a 'make
> -j16 buildworld' with make using kqueue() (which it did not a single
> time out of >30 times before). Further testings will follow tomorrow.
> 
> However, WITNESS complains (only once) about this:
> lock order reversal
>  1st 0xc662140c kqueue mutex (kqueue mutex) @ 
>/freebsd/current/src/sys/kern/kern_event.c:714
>  2nd 0xc6727d00 pipe mutex (pipe mutex) @ 
>/freebsd/current/src/sys/kern/sys_pipe.c:1478

That's pretty similar to the lock order reversal I've seen in the pipe
code and it's interaction with sigio, which is not suprising since
pipeselwakeup() calls both pgsigio() and KNOTE(), often while the pipe
lock is held.  Correctly fixing this doesn't look easy ...


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-08 Thread Stefan Farfeleder

On Mon, Oct 07, 2002 at 03:48:45AM -0700, Terry Lambert wrote:
> 
> *** OK, it's very hard to believe you didn't break into the
> *** debugger and manually call "pnaic" to get this to happen.

You're right, this is exactly what I did.

> I can't personally repeat the problem, so you're elected to do
> the legwork on this one.  8-(.

Following the advice from the spl* man page I turned the spl* calls to a
mutex and was surprised to see it working. My SMP -current survived a 'make
-j16 buildworld' with make using kqueue() (which it did not a single
time out of >30 times before). Further testings will follow tomorrow.

However, WITNESS complains (only once) about this:
lock order reversal
 1st 0xc662140c kqueue mutex (kqueue mutex) @ 
/freebsd/current/src/sys/kern/kern_event.c:714
 2nd 0xc6727d00 pipe mutex (pipe mutex) @ /freebsd/current/src/sys/kern/sys_pipe.c:1478

Regards,
Stefan Farfeleder


Index: sys/eventvar.h
===
RCS file: /home/ncvs/src/sys/sys/eventvar.h,v
retrieving revision 1.4
diff -u -r1.4 eventvar.h
--- sys/eventvar.h  18 Jul 2000 19:31:48 -  1.4
+++ sys/eventvar.h  8 Oct 2002 16:06:46 -
@@ -35,6 +35,7 @@
 struct kqueue {
TAILQ_HEAD(kqlist, knote) kq_head;  /* list of pending event */
int kq_count;   /* number of pending events */
+   struct  mtx kq_mtx; /* protect kq_head */
struct  selinfo kq_sel; 
struct  filedesc *kq_fdp;
int kq_state;
Index: kern/kern_event.c
===
RCS file: /home/ncvs/src/sys/kern/kern_event.c,v
retrieving revision 1.46
diff -u -r1.46 kern_event.c
--- kern/kern_event.c   3 Oct 2002 06:03:26 -   1.46
+++ kern/kern_event.c   8 Oct 2002 19:22:27 -
@@ -375,7 +375,7 @@
if (error)
goto done2;
kq = malloc(sizeof(struct kqueue), M_KQUEUE, M_WAITOK | M_ZERO);
-   TAILQ_INIT(&kq->kq_head);
+   mtx_init(&kq->kq_mtx, "kqueue mutex", NULL, MTX_DEF);
FILE_LOCK(fp);
fp->f_flag = FREAD | FWRITE;
fp->f_type = DTYPE_KQUEUE;
@@ -709,13 +709,15 @@
error = 0;
goto done;
}
+   splx(s);
 
+   mtx_lock(&kq->kq_mtx);
TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); 
while (count) {
kn = TAILQ_FIRST(&kq->kq_head);
TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); 
if (kn == &marker) {
-   splx(s);
+   mtx_unlock(&kq->kq_mtx);
if (count == maxevents)
goto retry;
goto done;
@@ -737,10 +739,10 @@
if (kn->kn_flags & EV_ONESHOT) {
kn->kn_status &= ~KN_QUEUED;
kq->kq_count--;
-   splx(s);
+   mtx_unlock(&kq->kq_mtx);
kn->kn_fop->f_detach(kn);
knote_drop(kn, td);
-   s = splhigh();
+   mtx_lock(&kq->kq_mtx);
} else if (kn->kn_flags & EV_CLEAR) {
kn->kn_data = 0;
kn->kn_fflags = 0;
@@ -751,19 +753,19 @@
}
count--;
if (nkev == KQ_NEVENTS) {
-   splx(s);
+   mtx_unlock(&kq->kq_mtx);
error = copyout(&kq->kq_kev, ulistp,
sizeof(struct kevent) * nkev);
ulistp += nkev;
nkev = 0;
kevp = kq->kq_kev;
-   s = splhigh();
+   mtx_lock(&kq->kq_mtx);
if (error)
break;
}
}
TAILQ_REMOVE(&kq->kq_head, &marker, kn_tqe); 
-   splx(s);
+   mtx_unlock(&kq->kq_mtx);
 done:
if (nkev != 0)
error = copyout(&kq->kq_kev, ulistp,
@@ -887,6 +889,7 @@
}
}
FILEDESC_UNLOCK(fdp);
+   mtx_destroy(&kq->kq_mtx);
free(kq, M_KQUEUE);
fp->f_data = NULL;
 
@@ -1051,14 +1054,14 @@
 knote_enqueue(struct knote *kn)
 {
struct kqueue *kq = kn->kn_kq;
-   int s = splhigh();
 
KASSERT((kn->kn_status & KN_QUEUED) == 0, ("knote already queued"));
 
+   mtx_lock(&kq->kq_mtx);
TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); 
kn->kn_status |= KN_QUEUED;
kq->kq_count++;
-   splx(s);
+   mtx_unlock(&kq->kq_mtx);
kqueue_wakeup(kq);
 }
 
@@ -1066,14 +1069,14 @@
 knote_dequeue(struct knote *kn)
 {
struct kqueue *kq = kn->kn_kq;
-   int s = splhigh();
 
KASSERT(kn->kn_status & KN_QUEUED, ("knote not queued"));
 
+   mtx_lock(&kq->kq_mtx);
T

Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-07 Thread Julian Elischer



On Mon, 7 Oct 2002, Terry Lambert wrote:

> Stefan Farfeleder wrote:
> > 
> > I'm confused why marker - if it was removed by TAILQ_REMOVE - hasn't
> > kn_tqe.tqe_next and kn_tqe.tqe_prev set to (void *)-1.

because that only happens if the debug code in queue.h is enabled, which
it is not..


> 
> OK, what this means is that the marker queue entry was removed
> by something else going in there.
> 
> THis shouldn't happen.
> 
> Try adding this before the initialization of the marker data:
> 
>   bzero(&marker, sizeof(marker));
> 
> That should keep it from matching any removal criteria.  THe only
> way this could keep crashing after this mod is if the queue is
> being destroyed out from under you.
> 
> The implication here is that the queue should be protected by the
> object lock for the object for which the pointer to the queue
> instance is an element.
> 
> Fixing this would be very hard (IMO).
> 
> The next step (assuming it still panics) is to add:
> 
>   #define KQ_FREE 0x80
> 
> ...and set it into kq_state on a kqueue that has been freed and/or
> deallocated somewhere (then check to see if it's set, after the
> panic).  Ugly, but it will tell you whether or not that's what's
> happening (scanning a dead queue).
> 
> The worst case is scanning a dead queue quose memory has been
> reused for some other purpose.  8-(.
> 
> I can't personally repeat the problem, so you're elected to do
> the legwork on this one.  8-(.
> 
> -- Terry
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-current" in the body of the message
> 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-07 Thread Terry Lambert

Stefan Farfeleder wrote:
> On Sun, Oct 06, 2002 at 11:14:26PM -0700, Terry Lambert wrote:
> > Stefan: Did the patch fix it, or not?
> 
> Sorry for the long delay. No, it did not. But I now have a rather
> interesting core dump. I inserted a KASSERT, so that the code looks like
> this:
> 
> TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe);
> while (count) {
> kn = TAILQ_FIRST(&kq->kq_head);
> KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));

[ ... ]

> panic: bremfree: bp 0xd2adf6f0 not locked

Second panic, during debugger sync.

> panic: TAILQ_FIRST returned NULL

See below...

> panic: from debugger

You, manually calling "panic" inside the debugger...

> syncing disks... panic: bremfree: bp 0xd2adf6f0 not locked

The second panic (again).

> #2  0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508

2nd panic.

> #10 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508

Manual panic (no arguments).

> #18 0xc01babcf in panic (fmt=0x0)
> 
> at /freebsd/current/src/sys/kern/kern_shutdown.c:494
> 
> #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90,
> 
> tsp=0xc754f828, td=0xc6426b60)
> 
> at /freebsd/current/src/sys/kern/kern_event.c:717

*** OK, it's very hard to believe you didn't break into the
*** debugger and manually call "pnaic" to get this to happen.

Why?  Because the "fmt" string is 0x0, which indicates that you
called the panic manually, instead of being the address of the
string "TAILQ_FIRST returned NULL", like you'd expect.


> #19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90,
> 
> tsp=0xc754f828, td=0xc6426b60)
> 
> at /freebsd/current/src/sys/kern/kern_event.c:717
> 
> 717 KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));
> 
> (kgdb) info locals
> 
> kq = (struct kqueue *) 0xc754f800
> 
> kevp = (struct kevent *) 0xc754f828
> 
> atv = {tv_sec = 0, tv_usec = 0}
> 
> rtv = {tv_sec = 434, tv_usec = -1070420864}
> 
> ttv = {tv_sec = 1, tv_usec = -1070411616}
> 
> kn = (struct knote *) 0x0
> 
> marker = {kn_link = {sle_next = 0xc01b0d37}, kn_selnext = {
> 
> sle_next = 0xc0368a20}, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc6650ac8},
> 
>   kn_kq = 0xc6426bcc, kn_kevent = {ident = 3344374324, filter = -30080,
> 
> flags = 49206, fflags = 3224546432, data = 431, udata = 0xe2c9dca0},
> 
>   kn_status = 16, kn_sfflags = -1070167424, kn_sdata = 8, kn_ptr = {
> 
> p_fp = 0xc032ac80, p_proc = 0xc032ac80}, kn_fop = 0x1af, kn_hook = 0x3}
> 
> count = 4
> 
> timeout = 0
> 
> nkev = 0
> 
> error = 0
> 
> (kgdb) p *kq
> 
> $2 = {kq_head = {tqh_first = 0x0, tqh_last = 0xc754f800}, kq_count = 1,
> 
>   kq_sel = {si_thrlist = {tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0,
> 
> si_note = {slh_first = 0x0}, si_flags = 0}, kq_fdp = 0xc7571a00,
> 
>   kq_state = 0, kq_kev = {{ident = 23, filter = -1, flags = 1, fflags = 0,
> 
>   data = 69, udata = 0x80cd800}, {ident = 23, filter = -1, flags = 1,
> 
>   fflags = 0, data = 164, udata = 0x80cd800}, {ident = 27, filter = -1,
> 
>   flags = 1, fflags = 0, data = 218, udata = 0x80cf800}, {ident = 19,
> 
>   filter = -1, flags = 1, fflags = 0, data = 182, udata = 0x80cc800}, {
> 
>   ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
> 
>   ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
> 
>   ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}, {
> 
>   ident = 0, filter = 0, flags = 0, fflags = 0, data = 0, udata = 0x0}}}
> 
> (kgdb) q
> 
> frog# ^Dexit
> 
> Script done on Mon Oct  7 11:32:50 2002
> 
> I'm confused why marker - if it was removed by TAILQ_REMOVE - hasn't
> kn_tqe.tqe_next and kn_tqe.tqe_prev set to (void *)-1.

OK, what this means is that the marker queue entry was removed
by something else going in there.

THis shouldn't happen.

Try adding this before the initialization of the marker data:

bzero(&marker, sizeof(marker));

That should keep it from matching any removal criteria.  THe only
way this could keep crashing after this mod is if the queue is
being destroyed out from under you.

The implication here is that the queue should be protected by the
object lock for the object for which the pointer to the queue
instance is an element.

Fixing this would be very hard (IMO).

The next step (assuming it still panics) is to add:

#define KQ_FREE 0x80

...and set it into kq_state on a kqueue that has been freed and/or
deallocated somewhere (then check to see if it's set, after the
panic).  Ugly, but it will tell you whether or not that's what's
happening (scanning a dead queue).

The worst case is scanning a dead queue quose memory has been
reused for some other purpose.  8-(.

I can't personally repeat the problem, so you're elected to do
the legwork on this one.  8-(.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body

Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-07 Thread Stefan Farfeleder

On Sun, Oct 06, 2002 at 11:14:26PM -0700, Terry Lambert wrote:
> 
> Stefan: Did the patch fix it, or not?

Sorry for the long delay. No, it did not. But I now have a rather
interesting core dump. I inserted a KASSERT, so that the code looks like
this:

TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe);
while (count) {
kn = TAILQ_FIRST(&kq->kq_head);
KASSERT(kn != NULL, ("TAILQ_FIRST returned NULL"));
/*
 * Skip over all markers which are not ours.  This looks
 * unsafe, but we can't hit the end of the list without
 * hitting our own marker.
 */
while ((kn->kn_status & KN_MARKER) && (kn != &marker)) {
kn = TAILQ_NEXT(kn, kn_tqe);
}
TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
if (kn == &marker) {
[...]

Script started on Mon Oct  7 11:26:10 2002
frog# ../bin/gdb -k crash/kernel.debug.3 crash/vmcore.3

GNU gdb 5.2.0 (FreeBSD) 20020627
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: bremfree: bp 0xd2adf6f0 not locked
panic messages:
---
panic: TAILQ_FIRST returned NULL
cpuid = 1; lapic.id = 0100
panic: from debugger
cpuid = 1; lapic.id = 0100
boot() called on cpu#1

syncing disks... panic: bremfree: bp 0xd2adf6f0 not locked
cpuid = 1; lapic.id = 0100
boot() called on cpu#1
Uptime: 13m27s
pfs_vncache_unload(): 1 entries remaining
Dumping 1023 MB
ata0: resetting devices ..
done
ad0: timeout sending command=c5 s=d0 e=00
ad0: error executing commandata0: resetting devices ..
done
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 
384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 
720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008
---
#0  doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223
223 dumping++;
(kgdb) bt
#0  doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223
#1  0xc01ba92a in boot (howto=260)
at /freebsd/current/src/sys/kern/kern_shutdown.c:355
#2  0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508
#3  0xc01fcc77 in bremfree (bp=0xd2adf6f0)
at /freebsd/current/src/sys/kern/vfs_bio.c:632
#4  0xc01fe798 in vfs_bio_awrite (bp=0x3)
at /freebsd/current/src/sys/kern/vfs_bio.c:1633
#5  0xc02a7afa in ffs_fsync (ap=0xe2c9d8fc)
at /freebsd/current/src/sys/ufs/ffs/ffs_vnops.c:252
#6  0xc02a7829 in VOP_FSYNC (vp=0x0, cred=0x0, waitfor=0, td=0x0)
at vnode_if.h:612
#7  0xc02a6d3b in ffs_sync (mp=0xc642ba00, waitfor=2, cred=0xc22b2e80, 
td=0xc03643a0) at /freebsd/current/src/sys/ufs/ffs/ffs_vfsops.c:1127
#8  0xc0210998 in sync (td=0xc03643a0, uap=0x0)
at /freebsd/current/src/sys/kern/vfs_syscalls.c:130
#9  0xc01ba52b in boot (howto=256)
at /freebsd/current/src/sys/kern/kern_shutdown.c:264
#10 0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508
#11 0xc013b1d2 in db_panic () at /freebsd/current/src/sys/ddb/db_command.c:450
#12 0xc013b152 in db_command (last_cmdp=0xc035db40, cmd_table=0x0, 
aux_cmd_tablep=0xc03577fc, aux_cmd_tablep_end=0xc0357800)
at /freebsd/current/src/sys/ddb/db_command.c:346
---Type  to continue, or q  to quit---
#13 0xc013b266 in db_command_loop ()
at /freebsd/current/src/sys/ddb/db_command.c:472
#14 0xc013deca in db_trap (type=3, code=0)
at /freebsd/current/src/sys/ddb/db_trap.c:72
#15 0xc02e9f60 in kdb_trap (type=3, code=0, regs=0xe2c9db94)
at /freebsd/current/src/sys/i386/i386/db_interface.c:166
#16 0xc0302027 in trap (frame=
  {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -968725664, tf_esi = 256, tf_ebp = 
-490087456, tf_isp = -490087488, tf_ebx = 0, tf_edx = 0, tf_ecx = 32, tf_eax = 18, 
tf_trapno = 3, tf_err = 0, tf_eip = -1070685611, tf_cs = 8, tf_eflags = 658, tf_esp = 
-1070272669, tf_ss = -1070406694})
at /freebsd/current/src/sys/i386/i386/trap.c:605
#17 0xc02eb768 in calltrap () at {standard input}:99
#18 0xc01babcf in panic (fmt=0x0)
at /freebsd/current/src/sys/kern/kern_shutdown.c:494
#19 0xc01a1212 in kqueue_scan (fp=0x0, maxevents=4, ulistp=0xbfbfeb90, 
tsp=0xc754f828, td=0xc6426b60)
at /freebsd/current/src/sys/kern/kern_event.c:717
#20 0xc01a0ad1 in kevent (td=0xc6426b60, uap=0xe2c9dd10)
at /freebsd/current/src/sys/kern/kern_event.c:470
#21 0xc030299e in syscall (frame=
  {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077937792, tf_esi = 4, tf_ebp = 
-1077941256, tf_isp = -490087052, tf_ebx = -1077937772, tf_edx = 2184, tf_---Type 
 to continue, or q  to quit---
ecx = 0, tf_eax = 363, tf_trapno = 0, tf_err =

Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-06 Thread Terry Lambert

Terry Lambert wrote:
> Stefan Farfeleder wrote:
> > (kgdb) l *kqueue_scan+0x242
> > 0xc01a1212 is in kqueue_scan
> > (/freebsd/current/src/sys/kern/kern_event.c:716).
> > 713 TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe);
> > 714 while (count) {
> > 715 kn = TAILQ_FIRST(&kq->kq_head);
> > translates to:  mov(%edi),%ebx
> > 716 TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
> > translates to:  cmpl   $0x0,0x8(%ebx)
> >
> > This line causes the page fault because %ebx is 0.
[ ... ]
> Please try the attached patch.
> 
> -- Terry


Stefan: Did the patch fix it, or not?

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



[PATCH] Re: Junior Kernel Hacker page updated...

2002-10-05 Thread Terry Lambert

Stefan Farfeleder wrote:
> (kgdb) l *kqueue_scan+0x242
> 0xc01a1212 is in kqueue_scan
> (/freebsd/current/src/sys/kern/kern_event.c:716).
> 713 TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe);
> 714 while (count) {
> 715 kn = TAILQ_FIRST(&kq->kq_head);
> translates to:  mov(%edi),%ebx
> 716 TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
> translates to:  cmpl   $0x0,0x8(%ebx)
> 
> This line causes the page fault because %ebx is 0.


This can't happen, at least from an "empty list" perspective,
even if kqueue_scan() is reentered, since the "marker" is an
auto allocation on the stack, and a different stack means a
different marker gets inserted (marker isn't static, so having
more than one insert of the marker won't result in only a
single insertion).

I suspect that what is hapening is that the code is being
reentered, and one marker is being treated as an event, because
of whatever garbage happens to be on the stack in the allocated
marker.

The marker is removed, and then it is not found before you hit
the end of the list.

Please try the attached patch.

-- Terry

Index: sys/event.h
===
RCS file: /cvs/src/sys/sys/event.h,v
retrieving revision 1.21
diff -c -r1.21 event.h
*** sys/event.h 29 Jun 2002 19:14:52 -  1.21
--- sys/event.h 5 Oct 2002 15:12:24 -
***
*** 160,165 
--- 160,166 
  #define KN_QUEUED 0x02/* event is on queue */
  #define KN_DISABLED   0x04/* event is disabled */
  #define KN_DETACHED   0x08/* knote is detached */
+ #define KN_MARKER 0x10/* knote is a scan marker */
  
  #define kn_id kn_kevent.ident
  #define kn_filter kn_kevent.filter
Index: kern/kern_event.c
===
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.45
diff -c -r1.45 kern_event.c
*** kern/kern_event.c   17 Aug 2002 02:36:16 -  1.45
--- kern/kern_event.c   5 Oct 2002 15:13:26 -
***
*** 653,658 
--- 653,659 
  
FILE_LOCK_ASSERT(fp, MA_NOTOWNED);
  
+   marker.kn_status = KN_MARKER;
kq = (struct kqueue *)fp->f_data;
count = maxevents;
if (count == 0)
***
*** 713,718 
--- 714,727 
TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); 
while (count) {
kn = TAILQ_FIRST(&kq->kq_head);
+   /*
+* Skip over all markers which are not ours.  This looks
+* unsafe, but we can't hit the end of the list without
+* hitting our own marker.
+*/
+   while ((kn->kn_status & KN_MARKER) && (kn != &marker)) {
+   kn = TAILQ_NEXT(kn, kn_tqe);
+   }
TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); 
if (kn == &marker) {
splx(s);



Re: Junior Kernel Hacker page updated...

2002-10-05 Thread Stefan Farfeleder

On Fri, Oct 04, 2002 at 04:33:17PM -0400, John Baldwin wrote:

I wrote:
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 0; lapic.id = 
> >> fault virtual address   = 0x8
> >> fault code  = supervisor read, page not present
> >> instruction pointer = 0x8:0xc01a1212
> >> stack pointer   = 0x10:0xe5226c14
> >> frame pointer   = 0x10:0xe5226ca0
> >> code segment= base 0x0, limit 0xf, type 0x1b
> >> = DPL 0, pres 1, def32 1, gran 1
> >> processor eflags= interrupt enabled, resume, IOPL = 0
> >> current process = 56525 (make)
> >> 
> >> kernel: type 12 trap, code = 0
> >> 
> >> Stopped atkqueue_scan+0x242:  cmpl $0,0x8(%ebx)
> >> db> trace
> >> kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242
> >> kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1
> >> syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be
> >> %%%

> Even better, pop up gdb on kernel.debug and do
> 'l *kqueue_scan+0x242' to look at the offending line of code.
> addr2line can also be useful here similarly.

(kgdb) l *kqueue_scan+0x242
0xc01a1212 is in kqueue_scan
(/freebsd/current/src/sys/kern/kern_event.c:716).
711 }
712
713 TAILQ_INSERT_TAIL(&kq->kq_head, &marker, kn_tqe); 
714 while (count) {
715 kn = TAILQ_FIRST(&kq->kq_head);

translates to:  mov(%edi),%ebx

716 TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); 

translates to:  cmpl   $0x0,0x8(%ebx)

This line causes the page fault because %ebx is 0.

je fe3 
mov0x8(%ebx),%edx
[...]

717 if (kn == &marker) {
718 splx(s);
719 if (count == maxevents)
720 goto retry;

I've added this after line 715:

716 if (kn == NULL) {
717 Debugger("TAILQ_FIRST returns NULL");
718 }

and after 4 freezes, I really came into ddb in line 717. However, when
trying to produce a dump, this occured:

db> panic
panic: from debugger
cpuid = 1; lapic.id = 0100
boot() called on cpu#1

syncing disks... panic: bremfree: bp 0xd2a42990 not locked
boot() called on cpu#1
Uptime: 10m13s
pfs_vncache_unload(): 1 entries remaining
Dumping 1023 MB
ata0: resetting devices
ata0: mask=03 ostat0=50 ostat2=00
ad0: ATAPI 00 00
ata0-slave: ATAPI 00 00
ata0: mask=03 stat0=50 stat1=00
ad0: ATA 01 a5
ata0: devices=01

and I had to reboot without a dump :-(

Regards,
Stefan Farfeleder

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Junior Kernel Hacker page updated...

2002-10-04 Thread Juli Mallett

* De: Poul-Henning Kamp <[EMAIL PROTECTED]> [ Data: 2002-10-04 ]
[ Subjecte: Re: Junior Kernel Hacker page updated... ]
> In message <[EMAIL PROTECTED]>, Stefan Farfeleder writes:
> >
> 
> I have committed your patch, but left it #undef'ed until we get
> the kernel straightened out.
> 
> On behalf of the FreeBSD project I have to warn you that if you
> persist in doing work of this kind over and over again, you will
> eventually be subjected to a commit bit.  You have been warned :-)
> 
> Thank you very much for you work! :-)
> 
> Poul-Henning
> 
> >The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH.
> >Is anybody using them? Building with -DREMOTE doesn't compile and with
> >-DRMT_WILL_WATCH the linker is complaining about the lack of the
> >functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of
> >those defines? I understand Juli Mallett wants to rewrite make, so maybe
> >this effort would be wasted.
> 
> I belive the RMT/REMOTE stuff is part of an earlier attempt at putting
> in some kind of cluster functionality.
> 
> If it is useful in any capacity, even as hint of what/how to do such
> a thing, I think we should leave it.  If it is just old junk we
> should boot it.

NetBSD has it working.  I don't recall anymore if the diffs were enough
to dissuade me from making ours work, or if I just had no interest.

It's gone from my local tree simply because I got tired of grepping for
things to debug, and running into ifdef's cruft.
-- 
Juli Mallett <[EMAIL PROTECTED]>   | FreeBSD: The Power To Serve
Will break world for fulltime employment. | finger [EMAIL PROTECTED]
http://people.FreeBSD.org/~jmallett/  | Support my FreeBSD hacking!

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Junior Kernel Hacker page updated...

2002-10-04 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Stefan Farfeleder writes:
>

I have committed your patch, but left it #undef'ed until we get
the kernel straightened out.

On behalf of the FreeBSD project I have to warn you that if you
persist in doing work of this kind over and over again, you will
eventually be subjected to a commit bit.  You have been warned :-)

Thank you very much for you work! :-)

Poul-Henning

>The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH.
>Is anybody using them? Building with -DREMOTE doesn't compile and with
>-DRMT_WILL_WATCH the linker is complaining about the lack of the
>functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of
>those defines? I understand Juli Mallett wants to rewrite make, so maybe
>this effort would be wasted.

I belive the RMT/REMOTE stuff is part of an earlier attempt at putting
in some kind of cluster functionality.

If it is useful in any capacity, even as hint of what/how to do such
a thing, I think we should leave it.  If it is just old junk we
should boot it.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Junior Kernel Hacker page updated...

2002-10-04 Thread John Baldwin


On 04-Oct-2002 Juli Mallett wrote:
> * De: Stefan Farfeleder <[EMAIL PROTECTED]> [ Data: 2002-10-04 ]
>   [ Subjecte: Re: Junior Kernel Hacker page updated... ]
>> On Thu, Oct 03, 2002 at 04:41:46PM +0200, Poul-Henning Kamp wrote:
>> > 
>> > Hi Stefan,
>> > 
>> > I tried this patch and it paniced my (almost-) current machine with
>> > a pagefault in the kqueue code:  Bravo!
>> > 
>> > I can see that there is some amount of #ifdef stuff in your patch,
>> 
>> The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH.
>> Is anybody using them? Building with -DREMOTE doesn't compile and with
>> -DRMT_WILL_WATCH the linker is complaining about the lack of the
>> functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of
>> those defines? I understand Juli Mallett wants to rewrite make, so maybe
>> this effort would be wasted.
>> 
>> > in light of that, would it be possible to make an #ifdef'ed version
>> > of your patch which we could commit ?
>> 
>> Ok, the new patch is attached. Compile with -DUSE_KQUEUE to use the new
>> code.
>> 
>> > That way we give the kqueue hackers a good testcase, and we can
>> > easily enable when they have solved the problem.
>> 
>> After Don Lewis fixed the 'could sleep with' problem (thanks!), I'm
>> still encountering freezes and panics. Here's one I caught:
>> 
>> [warning: parts are typed in]
>> %%%
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; lapic.id = 
>> fault virtual address   = 0x8
>> fault code  = supervisor read, page not present
>> instruction pointer = 0x8:0xc01a1212
>> stack pointer   = 0x10:0xe5226c14
>> frame pointer   = 0x10:0xe5226ca0
>> code segment= base 0x0, limit 0xf, type 0x1b
>> = DPL 0, pres 1, def32 1, gran 1
>> processor eflags= interrupt enabled, resume, IOPL = 0
>> current process = 56525 (make)
>> 
>> kernel: type 12 trap, code = 0
>> 
>> Stopped atkqueue_scan+0x242:  cmpl $0,0x8(%ebx)
>> db> trace
>> kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242
>> kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1
>> syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be
>> %%%
> 
> Run the kqueue source file through gcc with -fverbose-asm -S and then
> look at the resulting .s file, grep for cmpl.*0x8( and look for
> what's being dereferenced without being checked for NULL.

Even better, pop up gdb on kernel.debug and do
'l *kqueue_scan+0x242' to look at the offending line of code.
addr2line can also be useful here similarly.

-- 

John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Junior Kernel Hacker page updated...

2002-10-04 Thread Juli Mallett

* De: Stefan Farfeleder <[EMAIL PROTECTED]> [ Data: 2002-10-04 ]
[ Subjecte: Re: Junior Kernel Hacker page updated... ]
> On Thu, Oct 03, 2002 at 04:41:46PM +0200, Poul-Henning Kamp wrote:
> > 
> > Hi Stefan,
> > 
> > I tried this patch and it paniced my (almost-) current machine with
> > a pagefault in the kqueue code:  Bravo!
> > 
> > I can see that there is some amount of #ifdef stuff in your patch,
> 
> The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH.
> Is anybody using them? Building with -DREMOTE doesn't compile and with
> -DRMT_WILL_WATCH the linker is complaining about the lack of the
> functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of
> those defines? I understand Juli Mallett wants to rewrite make, so maybe
> this effort would be wasted.
> 
> > in light of that, would it be possible to make an #ifdef'ed version
> > of your patch which we could commit ?
> 
> Ok, the new patch is attached. Compile with -DUSE_KQUEUE to use the new
> code.
> 
> > That way we give the kqueue hackers a good testcase, and we can
> > easily enable when they have solved the problem.
> 
> After Don Lewis fixed the 'could sleep with' problem (thanks!), I'm
> still encountering freezes and panics. Here's one I caught:
> 
> [warning: parts are typed in]
> %%%
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; lapic.id = 
> fault virtual address   = 0x8
> fault code  = supervisor read, page not present
> instruction pointer = 0x8:0xc01a1212
> stack pointer   = 0x10:0xe5226c14
> frame pointer   = 0x10:0xe5226ca0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 56525 (make)
> 
> kernel: type 12 trap, code = 0
> 
> Stopped atkqueue_scan+0x242:  cmpl $0,0x8(%ebx)
> db> trace
> kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242
> kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1
> syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be
> %%%

Run the kqueue source file through gcc with -fverbose-asm -S and then
look at the resulting .s file, grep for cmpl.*0x8( and look for
what's being dereferenced without being checked for NULL.

> The core file doesn't seem to be particularly helpful, as the
> kqueue_scan frame seems to miss:
> 
> %%%
> (kgdb) bt
> #0  doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223
> #1  0xc01ba92a in boot (howto=260)
> at /freebsd/current/src/sys/kern/kern_shutdown.c:355
> #2  0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508
> #3  0xc013b1d2 in db_panic () at /freebsd/current/src/sys/ddb/db_command.c:450
> #4  0xc013b152 in db_command (last_cmdp=0xc035db20, cmd_table=0x0, 
> aux_cmd_tablep=0xc03577dc, aux_cmd_tablep_end=0xc03577e0)
> at /freebsd/current/src/sys/ddb/db_command.c:346
> #5  0xc013b266 in db_command_loop ()
> at /freebsd/current/src/sys/ddb/db_command.c:472
> #6  0xc013deca in db_trap (type=12, code=0)
> at /freebsd/current/src/sys/ddb/db_trap.c:72
> #7  0xc02e9f60 in kdb_trap (type=12, code=0, regs=0xe5226bd4)
> at /freebsd/current/src/sys/i386/i386/db_interface.c:166
> #8  0xc0302602 in trap_fatal (frame=0xe5226bd4, eva=0)
> at /freebsd/current/src/sys/i386/i386/trap.c:841
> #9  0xc03022e2 in trap_pfault (frame=0xe5226bd4, usermode=0, eva=8)
> at /freebsd/current/src/sys/i386/i386/trap.c:760
> #10 0xc0301e0d in trap (frame=
>   {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -943606528, tf_esi = -943606488, 
>tf_ebp = -450728800, tf_isp = -450728960, tf_ebx = 0, tf_edx = 4, tf_ecx = 1, tf_eax 
>= 0, tf_trapno = 12, tf_err = 0, tf_eip = -1072033262, tf_cs = 8, tf_eflags = 66050, 
>tf_esp = -966356416, tf_ss = 0})
> ---Type  to continue, or q  to quit---
> at /freebsd/current/src/sys/i386/i386/trap.c:446
> #11 0xc02eb768 in calltrap () at {standard input}:99
> #12 0xc01a0ad1 in kevent (td=0xc70ecea0, uap=0xe5226d10)
> at /freebsd/current/src/sys/kern/kern_event.c:470
> #13 0xc030299e in syscall (frame=
>   {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135845760, tf_esi = 135846240, 
>tf_ebp = -1077941224, tf_isp = -450728588, tf_ebx = 134831872, tf_edx = 2184, tf_ecx 
>= 0, tf_eax = 363, tf_trapno = 12, tf_err = 2, tf_eip = 134626551, tf_cs = 31, 
>tf_eflags = 514, tf_esp = -1077941348, tf_ss = 47})
> at /freebsd/current/src/sys/i386/i386/trap.c:1050
> #14 0xc02eb7bd in Xint0x80_syscall () at {standard input}:141
> ---Can't read userspace from dump, or kernel p

Re: Junior Kernel Hacker page updated...

2002-10-04 Thread Stefan Farfeleder

On Thu, Oct 03, 2002 at 04:41:46PM +0200, Poul-Henning Kamp wrote:
> 
> Hi Stefan,
> 
> I tried this patch and it paniced my (almost-) current machine with
> a pagefault in the kqueue code:  Bravo!
> 
> I can see that there is some amount of #ifdef stuff in your patch,

The #ifdefs are already in the code, namely REMOTE and RMT_WILL_WATCH.
Is anybody using them? Building with -DREMOTE doesn't compile and with
-DRMT_WILL_WATCH the linker is complaining about the lack of the
functions Rmt_Ignore(), Rmt_Watch() and Rmt_Wait(). Can't we get rid of
those defines? I understand Juli Mallett wants to rewrite make, so maybe
this effort would be wasted.

> in light of that, would it be possible to make an #ifdef'ed version
> of your patch which we could commit ?

Ok, the new patch is attached. Compile with -DUSE_KQUEUE to use the new
code.

> That way we give the kqueue hackers a good testcase, and we can
> easily enable when they have solved the problem.

After Don Lewis fixed the 'could sleep with' problem (thanks!), I'm
still encountering freezes and panics. Here's one I caught:

[warning: parts are typed in]
%%%
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic.id = 
fault virtual address   = 0x8
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc01a1212
stack pointer   = 0x10:0xe5226c14
frame pointer   = 0x10:0xe5226ca0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 56525 (make)

kernel: type 12 trap, code = 0

Stopped atkqueue_scan+0x242:  cmpl $0,0x8(%ebx)
db> trace
kqueue_scan(c6472bf4,4,bfbfebc0,0,c70ecea0) at kqueue_scan+0x242
kevent(c70ecea0,e5226d10,c0351d80,418,6) at kevent+0x1e1
syscall(2f,2f,2f,818d780,818d960) at syscall+0x2be
%%%

The core file doesn't seem to be particularly helpful, as the
kqueue_scan frame seems to miss:

%%%
(kgdb) bt
#0  doadump () at /freebsd/current/src/sys/kern/kern_shutdown.c:223
#1  0xc01ba92a in boot (howto=260)
at /freebsd/current/src/sys/kern/kern_shutdown.c:355
#2  0xc01babe7 in panic () at /freebsd/current/src/sys/kern/kern_shutdown.c:508
#3  0xc013b1d2 in db_panic () at /freebsd/current/src/sys/ddb/db_command.c:450
#4  0xc013b152 in db_command (last_cmdp=0xc035db20, cmd_table=0x0, 
aux_cmd_tablep=0xc03577dc, aux_cmd_tablep_end=0xc03577e0)
at /freebsd/current/src/sys/ddb/db_command.c:346
#5  0xc013b266 in db_command_loop ()
at /freebsd/current/src/sys/ddb/db_command.c:472
#6  0xc013deca in db_trap (type=12, code=0)
at /freebsd/current/src/sys/ddb/db_trap.c:72
#7  0xc02e9f60 in kdb_trap (type=12, code=0, regs=0xe5226bd4)
at /freebsd/current/src/sys/i386/i386/db_interface.c:166
#8  0xc0302602 in trap_fatal (frame=0xe5226bd4, eva=0)
at /freebsd/current/src/sys/i386/i386/trap.c:841
#9  0xc03022e2 in trap_pfault (frame=0xe5226bd4, usermode=0, eva=8)
at /freebsd/current/src/sys/i386/i386/trap.c:760
#10 0xc0301e0d in trap (frame=
  {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -943606528, tf_esi = -943606488, 
tf_ebp = -450728800, tf_isp = -450728960, tf_ebx = 0, tf_edx = 4, tf_ecx = 1, tf_eax = 
0, tf_trapno = 12, tf_err = 0, tf_eip = -1072033262, tf_cs = 8, tf_eflags = 66050, 
tf_esp = -966356416, tf_ss = 0})
---Type  to continue, or q  to quit---
at /freebsd/current/src/sys/i386/i386/trap.c:446
#11 0xc02eb768 in calltrap () at {standard input}:99
#12 0xc01a0ad1 in kevent (td=0xc70ecea0, uap=0xe5226d10)
at /freebsd/current/src/sys/kern/kern_event.c:470
#13 0xc030299e in syscall (frame=
  {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135845760, tf_esi = 135846240, 
tf_ebp = -1077941224, tf_isp = -450728588, tf_ebx = 134831872, tf_edx = 2184, tf_ecx = 
0, tf_eax = 363, tf_trapno = 12, tf_err = 2, tf_eip = 134626551, tf_cs = 31, tf_eflags 
= 514, tf_esp = -1077941348, tf_ss = 47})
at /freebsd/current/src/sys/i386/i386/trap.c:1050
#14 0xc02eb7bd in Xint0x80_syscall () at {standard input}:141
---Can't read userspace from dump, or kernel process---
%%%

objdump -dS kern_event.o says the following about kqueue_scan+0x242
[==0xfd2]:

%%%
 fce:   90  nop
 fcf:   90  nop
kn = TAILQ_FIRST(&kq->kq_head);
 fd0:   8b 1f   mov(%edi),%ebx
TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); 
 fd2:   83 7b 08 00 cmpl   $0x0,0x8(%ebx)
 fd6:   74 0b   je fe3 
 fd8:   8b 53 08mov0x8(%ebx),%edx
 fdb:   8b 43 0cmov0xc(%ebx),%eax
 fde:   89 42 0cmov%eax,0xc(%edx)
 fe1:   eb 06   jmpfe9 
 fe3:   8b 43 0cmov0xc(%ebx),%eax
 fe6:   89 47 04mov%eax,0x4(%edi)
 fe9:   8b 43 0c   

Re: Junior Kernel Hacker page updated...

2002-10-03 Thread Poul-Henning Kamp


Hi Stefan,

I tried this patch and it paniced my (almost-) current machine with
a pagefault in the kqueue code:  Bravo!

I can see that there is some amount of #ifdef stuff in your patch,
in light of that, would it be possible to make an #ifdef'ed version
of your patch which we could commit ?

That way we give the kqueue hackers a good testcase, and we can
easily enable when they have solved the problem.

Poul-Henning



In message <[EMAIL PROTECTED]>, Stefan Farfeleder writes:
>
>--VS++wcV0S1rZb1Fb
>Content-Type: text/plain; charset=us-ascii
>Content-Disposition: inline
>
>On Sat, Sep 14, 2002 at 12:17:53PM +0200, Poul-Henning Kamp wrote:
>> 
>> This is just to note that I have updated the JKH page with a lot of new
>> stuff, so if your coding-pencil itches:
>> 
>>  http://people.freebsd.org/~phk/TODO/
>
>|Make -j improvement
>|
>|make(1) with -j option uses a select loop to wait for events, and every
>|100msec it drops out to look for processes exited etc.  A pure "make
>|buildworld" on a single-CPU machine is up to 25% faster that the best
>|"make -j N buildworld" time on the same hardware.  Changing to timeout
>|to be 10msec improves things about 10%.
>|I think that make(1) should use kqueue(2) instead, since that would
>|eliminate the need for timeouts.
>
>Ok, here's what I came up with.  However, with the patch applied, each
>'make buildworld' on a SMP machine throws tons of
>
>/freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" 
>locked from /freebsd/current/src/sys/kern/kern_event.c:959
>
>at me and freezes badly at some point (no breaking into ddb possible).
>This is totally repeatable.  Is anybody able to reproduce (and maybe
>fix) this?
>
>Regards,
>Stefan Farfeleder
>
>--VS++wcV0S1rZb1Fb
>Content-Type: text/plain; charset=us-ascii
>Content-Disposition: attachment; filename="make.diff"
>
>Index: job.c
>===
>RCS file: /home/ncvs/src/usr.bin/make/job.c,v
>retrieving revision 1.43
>diff -u -r1.43 job.c
>--- job.c  29 Sep 2002 00:20:28 -  1.43
>+++ job.c  2 Oct 2002 21:34:51 -
>@@ -64,9 +64,8 @@
>  *Job_CatchOutput Print any output our children have produced.
>  *Should also be called fairly frequently to
>  *keep the user informed of what's going on.
>- *If no output is waiting, it will block for
>- *a time given by the SEL_* constants, below,
>- *or until output is ready.
>+ *If no output is waiting, it will block until
>+ *a child terminates or until output is ready.
>  *
>  *Job_InitCalled to intialize this module. in addition,
>  *any commands attached to the .BEGIN target
>@@ -107,6 +106,8 @@
> #include 
> #include 
> #include 
>+#include 
>+#include 
> #include 
> #include 
> #include 
>@@ -237,8 +238,7 @@
>* (2) a job can only be run locally, but
>* nLocal equals maxLocal */
> #ifndef RMT_WILL_WATCH
>-static fd_set outputs;/* Set of descriptors of pipes connected to
>-   * the output channels of children */
>+static intkqfd;   /* File descriptor obtained by kqueue() */
> #endif
> 
> STATIC GNode  *lastNode;  /* The node for which output was most recently
>@@ -692,8 +692,6 @@
> if (usePipes) {
> #ifdef RMT_WILL_WATCH
>   Rmt_Ignore(job->inPipe);
>-#else
>-  FD_CLR(job->inPipe, &outputs);
> #endif
>   if (job->outPipe != job->inPipe) {
>  (void) close(job->outPipe);
>@@ -1265,14 +1263,25 @@
>   /*
>* The first time a job is run for a node, we set the current
>* position in the buffer to the beginning and mark another
>-   * stream to watch in the outputs mask
>+   * stream to watch in the outputs mask.  The termination of this
>+   * job and the availability of new data in the pipe are
>+   * registered in the kqueue.
>*/
>+  struct kevent   kev[2];
>+
>   job->curPos = 0;
> 
> #ifdef RMT_WILL_WATCH
>   Rmt_Watch(job->inPipe, JobLocalInput, job);
> #else
>-  FD_SET(job->inPipe, &outputs);
>+  EV_SET(&kev[0], job->inPipe, EVFILT_READ, EV_ADD, 0, 0, job);
>+  EV_SET(&kev[1], job->pid, EVFILT_PROC, EV_ADD | EV_ONESHOT,
>+  NOTE_EXIT, 0, NULL);
>+  if (kevent(kqfd, kev, 2, NULL, 0, NULL) != 0) {
>+  /* kevent() will fail if the job is already finished */
>+  if (errno != EBADF && errno != ESRCH)
>+  Punt("kevent: %s", strerror(errno));
>+  }
> #endif /* RMT_WILL_WATCH */
>   }
> 
>@@ -2229,12 +2238,12 @@
> Job_CatchOutput()
> {
> int nfds;
>-struct timeva

Re: Junior Kernel Hacker page updated...

2002-10-03 Thread Peter S. Housel

On Wed, 2002-10-02 at 14:40, Stefan Farfeleder wrote:
> On Sat, Sep 14, 2002 at 12:17:53PM +0200, Poul-Henning Kamp wrote:
> >
> > This is just to note that I have updated the JKH page with a lot of
new
> > stuff, so if your coding-pencil itches:
> >
> > http://people.freebsd.org/~phk/TODO/
>
> |Make -j improvement
> |
> |make(1) with -j option uses a select loop to wait for events, and
every
> |100msec it drops out to look for processes exited etc.  A pure "make
> |buildworld" on a single-CPU machine is up to 25% faster that the best
> |"make -j N buildworld" time on the same hardware.  Changing to
timeout
> |to be 10msec improves things about 10%.
> |I think that make(1) should use kqueue(2) instead, since that would
> |eliminate the need for timeouts.
>
> Ok, here's what I came up with.  However, with the patch applied, each
> 'make buildworld' on a SMP machine throws tons of
>
> /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with
"filedesc
structure" locked from /freebsd/current/src/sys/kern/kern_event.c:959
>
> at me and freezes badly at some point (no breaking into ddb possible).
> This is totally repeatable.  Is anybody able to reproduce (and maybe
> fix) this?

This question is not relevant to the problem at hand, but... wouldn't it
be more portable to catch SIGCHLD (and continue using select(2)) than to
use kqueue(2)?

-Peter-

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Junior Kernel Hacker page updated...

2002-10-02 Thread Don Lewis

On  2 Oct, Don Lewis wrote:
> On  2 Oct, Stefan Farfeleder wrote:
> 
>> /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" 
>locked from /freebsd/current/src/sys/kern/kern_event.c:959
>> 
>> at me and freezes badly at some point (no breaking into ddb possible).
>> This is totally repeatable.  Is anybody able to reproduce (and maybe
>> fix) this?
> 
> It looks like the problem is that knote_attach() calls hashinit() while
> holding the lock, and that hashinit() calls MALLOC(..., M_WAITOK).
> 
> Try the following patch:
> 
[snip]

I went ahead and committed the patch.  It survived a
"make -j10 buildworld" test with your patched version of make.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Junior Kernel Hacker page updated...

2002-10-02 Thread Don Lewis

On  2 Oct, Stefan Farfeleder wrote:

> /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" 
>locked from /freebsd/current/src/sys/kern/kern_event.c:959
> 
> at me and freezes badly at some point (no breaking into ddb possible).
> This is totally repeatable.  Is anybody able to reproduce (and maybe
> fix) this?

It looks like the problem is that knote_attach() calls hashinit() while
holding the lock, and that hashinit() calls MALLOC(..., M_WAITOK).

Try the following patch:

Index: sys/kern/kern_event.c
===
RCS file: /home/ncvs/src/sys/kern/kern_event.c,v
retrieving revision 1.45
diff -u -r1.45 kern_event.c
--- sys/kern/kern_event.c   17 Aug 2002 02:36:16 -  1.45
+++ sys/kern/kern_event.c   3 Oct 2002 03:10:09 -
@@ -953,15 +953,27 @@
 static void
 knote_attach(struct knote *kn, struct filedesc *fdp)
 {
-   struct klist *list, *oldlist;
+   struct klist *list, *oldlist, *tmp_knhash;
+   u_long tmp_knhashmask;
int size, newsize;
 
FILEDESC_LOCK(fdp);
 
if (! kn->kn_fop->f_isfd) {
-   if (fdp->fd_knhashmask == 0)
-   fdp->fd_knhash = hashinit(KN_HASHSIZE, M_KQUEUE,
-   &fdp->fd_knhashmask);
+   if (fdp->fd_knhashmask == 0) {
+   FILEDESC_UNLOCK(fdp);
+   tmp_knhash = hashinit(KN_HASHSIZE, M_KQUEUE,
+   &tmp_knhashmask);
+   FILEDESC_LOCK(fdp);
+   if (fdp->fd_knhashmask == 0) {
+   fdp->fd_knhash = tmp_knhash;
+   fdp->fd_knhashmask = tmp_knhashmask;
+   } else {
+   FILEDESC_UNLOCK(fdp);
+   free(tmp_knhash, M_KQUEUE);
+   FILEDESC_LOCK(fdp);
+   }
+   }
list = &fdp->fd_knhash[KN_HASH(kn->kn_id, fdp->fd_knhashmask)];
goto done;
}



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Junior Kernel Hacker page updated...

2002-10-02 Thread Stefan Farfeleder

On Sat, Sep 14, 2002 at 12:17:53PM +0200, Poul-Henning Kamp wrote:
> 
> This is just to note that I have updated the JKH page with a lot of new
> stuff, so if your coding-pencil itches:
> 
>   http://people.freebsd.org/~phk/TODO/

|Make -j improvement
|
|make(1) with -j option uses a select loop to wait for events, and every
|100msec it drops out to look for processes exited etc.  A pure "make
|buildworld" on a single-CPU machine is up to 25% faster that the best
|"make -j N buildworld" time on the same hardware.  Changing to timeout
|to be 10msec improves things about 10%.
|I think that make(1) should use kqueue(2) instead, since that would
|eliminate the need for timeouts.

Ok, here's what I came up with.  However, with the patch applied, each
'make buildworld' on a SMP machine throws tons of

/freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" 
locked from /freebsd/current/src/sys/kern/kern_event.c:959

at me and freezes badly at some point (no breaking into ddb possible).
This is totally repeatable.  Is anybody able to reproduce (and maybe
fix) this?

Regards,
Stefan Farfeleder


Index: job.c
===
RCS file: /home/ncvs/src/usr.bin/make/job.c,v
retrieving revision 1.43
diff -u -r1.43 job.c
--- job.c   29 Sep 2002 00:20:28 -  1.43
+++ job.c   2 Oct 2002 21:34:51 -
@@ -64,9 +64,8 @@
  * Job_CatchOutput Print any output our children have produced.
  * Should also be called fairly frequently to
  * keep the user informed of what's going on.
- * If no output is waiting, it will block for
- * a time given by the SEL_* constants, below,
- * or until output is ready.
+ * If no output is waiting, it will block until
+ * a child terminates or until output is ready.
  *
  * Job_InitCalled to intialize this module. in addition,
  * any commands attached to the .BEGIN target
@@ -107,6 +106,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -237,8 +238,7 @@
 * (2) a job can only be run locally, but
 * nLocal equals maxLocal */
 #ifndef RMT_WILL_WATCH
-static fd_set  outputs;/* Set of descriptors of pipes connected to
-* the output channels of children */
+static int kqfd;   /* File descriptor obtained by kqueue() */
 #endif
 
 STATIC GNode   *lastNode;  /* The node for which output was most recently
@@ -692,8 +692,6 @@
 if (usePipes) {
 #ifdef RMT_WILL_WATCH
Rmt_Ignore(job->inPipe);
-#else
-   FD_CLR(job->inPipe, &outputs);
 #endif
if (job->outPipe != job->inPipe) {
   (void) close(job->outPipe);
@@ -1265,14 +1263,25 @@
/*
 * The first time a job is run for a node, we set the current
 * position in the buffer to the beginning and mark another
-* stream to watch in the outputs mask
+* stream to watch in the outputs mask.  The termination of this
+* job and the availability of new data in the pipe are
+* registered in the kqueue.
 */
+   struct kevent   kev[2];
+
job->curPos = 0;
 
 #ifdef RMT_WILL_WATCH
Rmt_Watch(job->inPipe, JobLocalInput, job);
 #else
-   FD_SET(job->inPipe, &outputs);
+   EV_SET(&kev[0], job->inPipe, EVFILT_READ, EV_ADD, 0, 0, job);
+   EV_SET(&kev[1], job->pid, EVFILT_PROC, EV_ADD | EV_ONESHOT,
+   NOTE_EXIT, 0, NULL);
+   if (kevent(kqfd, kev, 2, NULL, 0, NULL) != 0) {
+   /* kevent() will fail if the job is already finished */
+   if (errno != EBADF && errno != ESRCH)
+   Punt("kevent: %s", strerror(errno));
+   }
 #endif /* RMT_WILL_WATCH */
}
 
@@ -2229,12 +2238,12 @@
 Job_CatchOutput()
 {
 int  nfds;
-struct timeval   timeout;
-fd_set   readfds;
-LstNode  ln;
-Job  *job;
 #ifdef RMT_WILL_WATCH
 int  pnJobs;   /* Previous nJobs */
+#else
+struct keventkev[4];   /* 4 is somewhat arbitrary */
+size_t   kevsize = sizeof(kev) / sizeof(kev[0]);
+int  i;
 #endif
 
 (void) fflush(stdout);
@@ -2262,25 +2271,20 @@
 }
 #else
 if (usePipes) {
-   readfds = outputs;
-   timeout.tv_sec = SEL_SEC;
-   timeout.tv_usec = SEL_USEC;
-
-   if ((nfds = select(FD_SETSIZE, &readfds, (fd_set *) 0,
-  (fd_set *) 0, &timeout)) <= 0)
-   return;
-   else {
-   if (Lst_