Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

2000-12-30 Thread Graham Murray

Byron Stanoszek <[EMAIL PROTECTED]> writes:

> I narrowed the problem down to a subset of patches from the MM set in
> test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
> i386), but I'm not yet sure why. test13-pre2 and up work without any problems
> on an Intel cpu (Pentium 180 & P3 800 tested).
[Snip] 
> root       351  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
> root       361  0.0  0.0     0    0 ?        Z    21:42   0:00 [named <defunct>]
> root       363  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
> root       364  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
> root       365  0.0  0.7  2064  936 ?        S    21:42   0:00 /usr/sbin/sshd
> ..etc
> (Note PID 361)

I am seeing the same thing with the [named <defunct>] on a PIII 600,
so it is not Athlon-specific. I haven't yet tried test13-pre6, but it
happens with pre3, pre4 and pre5, so I am still running test12.

I will try pre6 next; if it still fails, I will try reversing your
context.patch and see if that fixes it.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

2000-12-29 Thread Andi Kleen

On Fri, Dec 29, 2000 at 07:36:21PM -0800, Linus Torvalds wrote:
> Maybe your libc is different on the different machines? Normal programs
> shouldn't use segments at all, so I really do not see how this patch could
> matter in the least, even if it was completely and utterly buggy (which is
> not obvious at first glance).
> 
> I wonder why you seem to have an LDT at all..

glibc 2.2 linuxthreads sets up an LDT to use %gs as thread local data base 
pointer.
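
A quick way to check whether a given process has an LDT at all is to read it back with modify_ldt(2). The following is only a minimal sketch, not code from this thread: the helper name process_has_ldt is hypothetical, the buffer size assumes the x86 maximum of 8192 eight-byte descriptors, and it reports -1 wherever the syscall is missing or refused.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel whether the calling process has an LDT.
 * Returns 1 if LDT data was read back, 0 if there is none, and -1 if the
 * question cannot be answered here (non-x86 build, syscall refused, ...).
 */
int process_has_ldt(void)
{
#ifdef SYS_modify_ldt
    /* Room for a full x86 LDT: 8192 descriptors of 8 bytes each. */
    static unsigned char buf[8192 * 8];

    /* func 0 reads the LDT into buf and returns the number of bytes read. */
    long n = syscall(SYS_modify_ldt, 0, buf, sizeof buf);
    if (n < 0)
        return -1;

    assert(n <= (long)sizeof buf);  /* the kernel never reads past the buffer */
    return n > 0 ? 1 : 0;
#else
    return -1;                      /* no modify_ldt on this architecture */
#endif
}
```

A plain single-threaded program should normally report 0. Under the setup described above, a program linked against linuxthreads would be expected to report 1.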


-Andi



Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

2000-12-29 Thread Linus Torvalds



Ok, I don't think this is an athlon bug, and I think I've figured out what
the problem is.  For now, your temporary fix is probably fine, I'll clean
stuff up a bit and make a nicer patch available tomorrow.

Linus




Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

2000-12-29 Thread Linus Torvalds



On Fri, 29 Dec 2000, Byron Stanoszek wrote:
> 
> I narrowed the problem down to a subset of patches from the MM set in
> test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
> i386), but I'm not yet sure why. test13-pre2 and up work without any problems
> on an Intel cpu (Pentium 180 & P3 800 tested).

Cool.

Maybe your libc is different on the different machines? Normal programs
shouldn't use segments at all, so I really do not see how this patch could
matter in the least, even if it was completely and utterly buggy (which is
not obvious at first glance).

I wonder why you seem to have an LDT at all..

> Anyways, I can't seem to find out what really changes with the patch except for
> the obvious 'void *segment' changing into a typedef-struct.

Would you mind trying to hunt this down a bit more? In particular, it
would be good to see if the behaviour is the same if you do the typedef
change but leave the other logic alone. That would also cut down on the
purely syntactic changes of the patch.

I'll take a look at the code here.

Thanks,

Linus




Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

2000-12-29 Thread Byron Stanoszek

On Fri, 29 Dec 2000, Linus Torvalds wrote:

> 
> Ok, there's a test13-pre6 out there now, which does a partial sync with
> Alan, in addition to hopefully fixing the innd shared mapping writeback
> problem for good.  Thanks to Marcelo Tosatti and others..

I've been noticing a problem with the memory context switching conflicting with
fork() on my Athlon. The problem began in the test13-pre2 patch, and because
nobody else has seen this problem (or otherwise reported it) since then, I
felt I should look into it a little further.

I narrowed the problem down to a subset of patches from the MM set in
test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
i386), but I'm not yet sure why. test13-pre2 and up work without any problems
on an Intel cpu (Pentium 180 & P3 800 tested).

Anyways, I can't seem to find out what really changes with the patch except for
the obvious 'void *segment' changing into a typedef-struct. The only thing I
can think of is that the compiler handles it differently, but I think I can
safely rule that out: I tried both 2.91.66 and 2.95.2, with different
parameters for P5 & K7 (-march=i586 & -march=i686 -malign-functions=4),
and the problem still occurs on the Athlon. Maybe there's something I've
overlooked in the attached patch. Request for an extra pair of eyes, please. :)


Here are the casual symptoms. The parent seems to die as soon as a forked child
exits, which suggests to me that a new LDT isn't being initialized correctly:

root:~> ps -aux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  1.1  0.4  1228  532 ?        S    21:42   0:05 init [3]
root         2  0.0  0.0     0    0 ?        SW   21:42   0:00 [keventd]
root         3  0.0  0.0     0    0 ?        SW   21:42   0:00 [kswapd]
root         4  0.0  0.0     0    0 ?        SW   21:42   0:00 [kreclaimd]
root         5  0.0  0.0     0    0 ?        SW   21:42   0:00 [bdflush]
root         6  0.0  0.0     0    0 ?        SW   21:42   0:00 [kupdate]
root       289  0.0  0.4  1284  604 ?        S    21:42   0:00 syslogd -m 0
root       299  0.0  0.8  1912 1104 ?        S    21:42   0:00 klogd
root       351  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
root       361  0.0  0.0     0    0 ?        Z    21:42   0:00 [named <defunct>]
root       363  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
root       364  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
root       365  0.0  0.7  2064  936 ?        S    21:42   0:00 /usr/sbin/sshd
..etc
(Note PID 361)

root:~> strace nslookup sunsite.unc.edu
 :
 :
rt_sigaction(SIGINT, {0x4003ce78, ~[], 0x400}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x4003ce78, ~[], 0x400}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
rt_sigaction(SIGHUP, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
getpid()                                = 2615
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM, 0)        = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0)        = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0)        = -1 EAFNOSUPPORT (Address family not supported by protocol)
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++


---Example parent/child process:

root:~> tar -xzvvf ../pkgs/zgv-5.2.tar.gz
 :
 :
-rw------- rus/users      1356 2000-06-01 11:46:57 zgv-5.2/INSTALL
-rw------- rus/users     17976 1994-08-23 16:09:05 zgv-5.2/COPYING
-rw------- rus/users      1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
-rw------- rus/users       120 2000-04-22 22:46:49 zgv-5.2/AUTHORS
-rw------- rus/users      3714 2000-01-23 16:29:40 zgv-5.2/SECURITY
Segmentation fault (core dumped)

root:~> strace tar -xzvvf ../pkgs/zgv-5.2.tar.gz
 :
 :
open("zgv-5.2/COPYING", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "\t\tGNU GENERAL PUBLIC LICENSE"..., 9728) = 9728
read(3, "ccept this License.  Therefore, "..., 10240) = 10240
write(4, "ccept this License.  Therefore, "..., 8248) = 8248
close(4)                                = 0
utime("zgv-5.2/COPYING", [2000/12/29-20:21:16, 1994/08/23-16:09:05]) = 0
chown32("zgv-5.2/COPYING", 500, 100)    = 0
write(1, "-rw------- rus/users      1077 1"..., 72-rw------- rus/users      1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
) = 72
open("zgv-5.2/README.fonts", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "The copyright for *.bdf (taken f"..., 1024) = 1024
read(3, "\"as\nis\" without express or impli"..., 10240) = 8192
--- SIGCHLD (Child exited) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
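
The parent/child pattern in the traces above (child exits, parent gets SIGCHLD, parent segfaults) can be probed with a small standalone program. This is only a sketch under assumptions, not the reproducer used in this report: the names fork_probe and parent_under_test are hypothetical, and it exercises only the plain fork/exit/wait sequence without the linuxthreads setup that named uses. A supervising process runs the "parent" in a sandbox so that a crash can be observed instead of killing the probe itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The "parent" under test: fork a child, reap it, then keep working. */
static void parent_under_test(void)
{
    pid_t child = fork();
    if (child < 0)
        _exit(2);
    if (child == 0)
        _exit(0);                       /* child exits right away */

    int status;
    if (waitpid(child, &status, 0) != child)
        _exit(3);

    /* Touch memory after the child is gone; a broken parent dies about here. */
    volatile unsigned char scratch[4096];
    for (size_t i = 0; i < sizeof scratch; i++)
        scratch[i] = (unsigned char)i;
    _exit(scratch[100] == 100 ? 0 : 4);
}

/*
 * Supervisor: run parent_under_test() in its own process and report how it
 * ended. Returns 0 if it survived its child's exit, the signal number if it
 * was killed (11 would be SIGSEGV on i386), or -1 on any other failure.
 */
int fork_probe(void)
{
    pid_t parent = fork();
    if (parent < 0)
        return -1;
    if (parent == 0)
        parent_under_test();            /* never returns */

    assert(parent > 0);
    int status;
    if (waitpid(parent, &status, 0) != parent)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

Calling fork_probe() from main() and printing the result should give 0 on a kernel that handles this sequence correctly. A non-zero signal number would match the symptom described here.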

Ideas, anyone?

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]


diff -u --recursive --new-file v2.4.0-test12/linux/arch/i386/kernel/ldt.c 
