Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)
Byron Stanoszek <[EMAIL PROTECTED]> writes:

> I narrowed the problem down to a subset of patches from the MM set in
> test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
> i386), but I'm not yet sure why. test13-pre2 and up work without any problems
> on an Intel cpu (Pentium 180 & P3 800 tested).

[Snip]

> root       351  0.0  1.2  9292 1576 ?   S    21:42 0:00 named
> root       361  0.0  0.0     0    0 ?   Z    21:42 0:00 [named <defunct>]
> root       363  0.0  1.2  9292 1576 ?   S    21:42 0:00 named
> root       364  0.0  1.2  9292 1576 ?   S    21:42 0:00 named
> root       365  0.0  0.7  2064  936 ?   S    21:42 0:00 /usr/sbin/sshd
> ..etc
> (Note PID 361)

I am seeing the same thing with the [named <defunct>] on a PIII 600, so it is
not Athlon specific. I haven't yet tried test13-pre6, but it happens with
pre3, 4 and 5, so I am still running on test12.

I will try running pre6 then; if it still fails, I will try with your
context.patch and see if that fixes it.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
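For anyone wanting to check their own machine for the same symptom, the defunct entry noted above (PID 361) can be picked out of `ps aux` output by its STAT column. The following is only a minimal sketch: the function name `find_zombies` is made up for illustration, and it assumes the 11-column `ps aux` layout shown in this thread.

```python
from typing import List, Tuple


def find_zombies(ps_output: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for every process whose STAT starts with 'Z'.

    Assumes the 11-column `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    zombies = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Split into at most 11 fields so COMMAND keeps its internal spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # not a well-formed process row
        pid, stat, command = fields[1], fields[7], fields[10]
        if stat.startswith("Z"):
            zombies.append((int(pid), command))
    return zombies


if __name__ == "__main__":
    sample = (
        "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
        "root 351 0.0 1.2 9292 1576 ? S 21:42 0:00 named\n"
        "root 361 0.0 0.0 0 0 ? Z 21:42 0:00 [named <defunct>]\n"
    )
    print(find_zombies(sample))
```

A zombie by itself only means the parent has not reaped the child yet; it is the combination with the segfaulting parent that matters in this report.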
Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)
On Fri, Dec 29, 2000 at 07:36:21PM -0800, Linus Torvalds wrote:

> Maybe your libc is different on the different machines? Normal programs
> shouldn't use segments at all, so I really do not see how this patch could
> matter in the least, even if it was completely and utterly buggy (which is
> not obvious at first glance).
>
> I wonder why you seem to have an LDT at all..

glibc 2.2 linuxthreads sets up an LDT to use %gs as thread local data base
pointer.

-Andi
Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)
Ok, I don't think this is an Athlon bug, and I think I've figured out what
the problem is.

For now, your temporary fix is probably fine. I'll clean stuff up a bit and
make a nicer patch available tomorrow.

		Linus
Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)
On Fri, 29 Dec 2000, Byron Stanoszek wrote:
>
> I narrowed the problem down to a subset of patches from the MM set in
> test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
> i386), but I'm not yet sure why. test13-pre2 and up work without any problems
> on an Intel cpu (Pentium 180 & P3 800 tested).

Cool.

Maybe your libc is different on the different machines? Normal programs
shouldn't use segments at all, so I really do not see how this patch could
matter in the least, even if it was completely and utterly buggy (which is
not obvious at first glance).

I wonder why you seem to have an LDT at all..

> Anyways, I can't seem to find out what really changes with the patch except for
> the obvious 'void *segment' changing into a typedef-struct.

Would you mind trying to hunt this down a bit more? In particular, it would
be good to see if the behaviour is the same if you do the typedef change but
leave the other logic alone. That would also cut down on the purely
syntactic changes of the patch.

I'll take a look at the code here.

Thanks,

		Linus
Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)
On Fri, 29 Dec 2000, Linus Torvalds wrote:
>
> Ok, there's a test13-pre6 out there now, which does a partial sync with
> Alan, in addition to hopefully fixing the innd shared mapping writeback
> problem for good. Thanks to Marcelo Tosatti and others..

I've been noticing a problem with the memory context switching conflicting
with fork() on my Athlon. The problem began in the test13-pre2 patch, and
because nobody else has seen this problem (or otherwise reported it) since
then, I felt I should look into it a little further.

I narrowed the problem down to a subset of patches from the MM set in
test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
i386), but I'm not yet sure why. test13-pre2 and up work without any problems
on an Intel cpu (Pentium 180 & P3 800 tested).

Anyways, I can't seem to find out what really changes with the patch except for
the obvious 'void *segment' changing into a typedef-struct. The only thing I
can think of is that the compiler decodes it differently, but I think I can
safely rule that out. I tried both 2.91.66 and 2.95.2, using both different
types of parameters for P5 & K7 (-march=i586 & -march=i686 -malign-functions=4)
and it still gives the problem on the Athlon.

Maybe there's something I've overlooked in that attached patch. Request for an
extra pair of eyes please. :)

Here are the casual symptoms. The parent seems to die as soon as a forked child
exits, which suggests to me that a new LDT isn't being initialized correctly:

root:~> ps -aux
USER       PID %CPU %MEM   VSZ  RSS TTY STAT START TIME COMMAND
root         1  1.1  0.4  1228  532 ?   S    21:42 0:05 init [3]
root         2  0.0  0.0     0    0 ?   SW   21:42 0:00 [keventd]
root         3  0.0  0.0     0    0 ?   SW   21:42 0:00 [kswapd]
root         4  0.0  0.0     0    0 ?   SW   21:42 0:00 [kreclaimd]
root         5  0.0  0.0     0    0 ?   SW   21:42 0:00 [bdflush]
root         6  0.0  0.0     0    0 ?   SW   21:42 0:00 [kupdate]
root       289  0.0  0.4  1284  604 ?   S    21:42 0:00 syslogd -m 0
root       299  0.0  0.8  1912 1104 ?   S    21:42 0:00 klogd
root       351  0.0  1.2  9292 1576 ?   S    21:42 0:00 named
root       361  0.0  0.0     0    0 ?   Z    21:42 0:00 [named <defunct>]
root       363  0.0  1.2  9292 1576 ?   S    21:42 0:00 named
root       364  0.0  1.2  9292 1576 ?   S    21:42 0:00 named
root       365  0.0  0.7  2064  936 ?   S    21:42 0:00 /usr/sbin/sshd
..etc

(Note PID 361)

root:~> strace nslookup sunsite.unc.edu
:
:
rt_sigaction(SIGINT, {0x4003ce78, ~[], 0x400}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x4003ce78, ~[], 0x400}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
rt_sigaction(SIGHUP, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
getpid() = 2615
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
close(3) = 0
socket(PF_INET6, SOCK_STREAM, 0) = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0) = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

---
Example parent/child process:

root:~> tar -xzvvf ../pkgs/zgv-5.2.tar.gz
:
:
-rw--- rus/users 1356 2000-06-01 11:46:57 zgv-5.2/INSTALL
-rw--- rus/users 17976 1994-08-23 16:09:05 zgv-5.2/COPYING
-rw--- rus/users 1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
-rw--- rus/users 120 2000-04-22 22:46:49 zgv-5.2/AUTHORS
-rw--- rus/users 3714 2000-01-23 16:29:40 zgv-5.2/SECURITY
Segmentation fault (core dumped)

root:~> strace tar -xzvvf ../pkgs/zgv-5.2.tar.gz
:
:
open("zgv-5.2/COPYING", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "\t\tGNU GENERAL PUBLIC LICENSE"..., 9728) = 9728
read(3, "ccept this License. Therefore, "..., 10240) = 10240
write(4, "ccept this License. Therefore, "..., 8248) = 8248
close(4) = 0
utime("zgv-5.2/COPYING", [2000/12/29-20:21:16, 1994/08/23-16:09:05]) = 0
chown32("zgv-5.2/COPYING", 500, 100) = 0
write(1, "-rw--- rus/users 1077 1"..., 72-rw--- rus/users 1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
) = 72
open("zgv-5.2/README.fonts", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "The copyright for *.bdf (taken f"..., 1024) = 1024
read(3, "\"as\nis\" without express or impli"..., 10240) = 8192
--- SIGCHLD (Child exited) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

Ideas, anyone?

-Byron

--
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: [EMAIL PROTECTED]

diff -u --recursive --new-file v2.4.0-test12/linux/arch/i386/kernel/ldt.c
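The symptom reported above (a parent dying right after a forked child exits, as in the tar trace where SIGCHLD is followed by SIGSEGV) can be exercised with a small fork-and-reap loop. This is a rough sketch, not a confirmed reproducer for this bug: the function name `parent_survives_child_exit` is invented here, and whether it would trip the fault depends on the process actually having an LDT set up, which a plain script may not.

```python
import os


def parent_survives_child_exit(children: int = 5) -> bool:
    """Fork `children` short-lived children and reap each one in turn.

    Returns True if every child exited with status 0 and the parent was
    still running to collect it. On a kernel with the reported problem the
    parent would be expected to receive SIGSEGV before returning.
    """
    for _ in range(children):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping the parent's cleanup handlers.
            os._exit(0)
        # Parent: block until this child has exited, then inspect its status.
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            return False
    return True


if __name__ == "__main__":
    print("parent survived:", parent_survives_child_exit())
```

On a healthy kernel the loop simply returns True; a crash of the parent partway through would match the behaviour described in the report.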