RE: obsolete code must die
I agree that removing support for any hardware is a bad idea but I question the idea of putting it all in one monolithic download (tar file). If we're considering the concern for less developed nations with older hardware, imagine how you would like to download the whole kernel with an old 2400 bps modem. Not a fun thought.

Would it make sense to create some sort of 'make config' script that determines what you want in your kernel and then downloads only those components? After all, with the constant release of new hardware, isn't a 50MB kernel release not too far away? 100MB?

--Rainer

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Colonel
> Sent: Thursday, June 14, 2001 10:32 AM
> To: [EMAIL PROTECTED]
> Subject: Re: obsolete code must die
>
> In list.kernel, you wrote:
>
> >i think we are all missing the ball here: i am happy when i see driver
> >support for a piece of hardware that i have _NEVER_ heard of and at most
> >_ONE_ person uses it. why? it means more stuff works in linux. we
> >dont need to defend how many people use hardware X. if you have X, good
> >for you. if not, you dont care, but at least good for linux as a whole.
>
> Good Point!
>
> >lets stop fanning the flames and let this (Microsoft-using, as Rik
> >pointed out) troll die off.
>
> Agreed, he made the filter already. But it was good for some laughs,
> checking a few cobwebs and I really didn't see flames. Plus I got to
> test my new mailing list archive.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Building autofs
Hi all,

I'm trying to use autofs for the first time and am running into some problems. First, the documentation seems quite weak; that is, I'm not sure whether what I have is what I should have. I managed to find an autofs version 4 pre 9 tarball on the kernel mirrors. This seems to be the latest, but it is still a bit old and the referenced home page doesn't seem any newer.

My real problem, however, is that when I try to build it I get this error:

lookup_program.c:147: `OPEN_MAX' undeclared (first use in this function)

My understanding is that OPEN_MAX is defined in linux/limits.h, but I hesitate to change the code since I would expect this to build out of the box.

Can someone who is using autofs give me some pointers? Am I on the right track?

Thanks,

--Rainer
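For readers hitting the same build error: POSIX allows <limits.h> to leave OPEN_MAX undefined when the limit is not a fixed compile-time value, and provides sysconf(_SC_OPEN_MAX) as the runtime query. The sketch below shows one common workaround under that assumption. It is not taken from the autofs source, and the names max_open_files and FALLBACK_OPEN_MAX are made up for illustration.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical last-resort value, used only if no limit can be determined. */
#define FALLBACK_OPEN_MAX 256

/* Return the maximum number of file descriptors a process may have open. */
long max_open_files(void)
{
#ifdef OPEN_MAX
    /* The headers still provide a compile-time constant. */
    return OPEN_MAX;
#else
    /* POSIX runtime query; returns -1 when the limit is indeterminate. */
    long n = sysconf(_SC_OPEN_MAX);
    return n > 0 ? n : FALLBACK_OPEN_MAX;
#endif
}
```

Code that previously looped up to OPEN_MAX could loop up to max_open_files() instead, which avoids depending on which header happens to define the macro.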
RE: /proc/stat missing disk_io info
Not to be pushy or anything but since I received zero responses to this I was wondering what else I can do. I'd be happy to patch the problem myself but I have no idea what the correct value for DK_MAX_MAJOR should be.

Anywho, if anyone has any thoughts I'd appreciate them.

--Rainer

> -----Original Message-----
> I was wondering why some of my disks don't show up in /proc/stat's disk_io
> line. Specifically, my line says:
>
> disk_io: (2,0):(144,144,288,0,0) (3,0):(35,35,140,0,0)
>
> This equates to my floppy and first cdrom. I also have a second cdrom (RW)
> and 2 hard disks. Looking at the code (kstat_read_proc in
> fs/proc/proc_misc.c) it is looping only up to DK_MAX_MAJOR which is defined
> as 16 in kernel_stat.h. The problem is that my 2 HDs have a major number of
> 22.
>
> I don't know enough to produce a patch, that is, what should DK_MAX_MAJOR be
> set to, but I believe the above is the problem.
/proc/stat missing disk_io info
Hi all,

I was wondering why some of my disks don't show up in /proc/stat's disk_io line. Specifically, my line says:

disk_io: (2,0):(144,144,288,0,0) (3,0):(35,35,140,0,0)

This equates to my floppy and first cdrom. I also have a second cdrom (RW) and 2 hard disks. Looking at the code (kstat_read_proc in fs/proc/proc_misc.c) it is looping only up to DK_MAX_MAJOR which is defined as 16 in kernel_stat.h. The problem is that my 2 HDs have a major number of 22.

I don't know enough to produce a patch, that is, what should DK_MAX_MAJOR be set to, but I believe the above is the problem.

Thanks,

--Rainer
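The effect described above can be illustrated with a small user-space sketch. This is not the kernel's kstat_read_proc; it only models a loop bounded by DK_MAX_MAJOR, assuming the bound is exclusive (major < DK_MAX_MAJOR). The helper names major_is_reported and show_majors are hypothetical; the value 16 and the major numbers 2, 3 and 22 come from the post.

```c
#include <stdio.h>

#define DK_MAX_MAJOR 16   /* value quoted from kernel_stat.h in the post */

/* Hypothetical helper: would a loop of the form
 * "for (major = 0; major < DK_MAX_MAJOR; major++)" ever visit this major? */
int major_is_reported(int major)
{
    return major >= 0 && major < DK_MAX_MAJOR;
}

/* Print, for each major number given, whether the bounded loop reaches it. */
void show_majors(const int *majors, int count)
{
    for (int i = 0; i < count; i++)
        printf("major %2d: %s\n", majors[i],
               major_is_reported(majors[i]) ? "reported" : "skipped");
}
```

Calling show_majors with the array {2, 3, 22} marks the floppy (2) and the first IDE channel (3) as reported and major 22 as skipped, which matches the disk_io line shown in the post.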
RE: [patch] smbfs cache rewrite - 2nd try
This is working great for me so far. I've now got my full 1G RAM and samba seems to be working fine. Woohoo! One more oops dead.

Put it in the official kernel. Put it in the official kernel. Put it in the err, excuse me. I was chanting again.

--Rainer

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Urban Widmark
> Sent: Monday, January 29, 2001 1:23 AM
> To: [EMAIL PROTECTED]
> Cc: Rainer Mager; Scott A. Sibert
> Subject: [patch] smbfs cache rewrite - 2nd try
>
> Smbfs testers wanted, with or without highmem boxes.
RE: Is this kernel related (signal 11)?
Hi all,

Well, I upgraded my system to glibc 2.2.1 with few problems. Unfortunately, there are no improvements in my stability problems. X still dies.

So, I ask again, how can I debug this? How can I determine if this is a kernel problem or not?

Thanks,

--Rainer
RE: Is this kernel related (signal 11)?
As per Russell King's suggestion, I ran memtest86 on my system for about 12 hours last night. I found no memory errors. Note that the tests did not complete because I had to stop them this morning. I'll continue them tonight. They got through test 9 of 11.

As per David Ford's suggestion, I am looking into upgrading to glibc 2.2.1. Can someone please give hints on doing this? I tried to upgrade to 2.2 a few weeks ago, and after the 'make install' and a reboot my system was very broken and I had to reinstall the RedHat glibc RPM from CD to recover. I found a howto but it seems pretty old. How do other people do this?

I've also done a strace on X. Now what do I do with this 4 MB log file?

Thanks,

--Rainer
RE: Is this kernel related (signal 11)?
Thanks for all the info, comments below:

First, I ran X in gdb and got the following via 'bt' after X died. This is my first experience with gdb so if I should do anything in particular, please tell me.

#0  0x401addeb in __sigsuspend (set=0xb930) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1  0x80495a4 in startServer ()
#2  0x804922c in main ()
#3  0x401a79cb in __libc_start_main (main=0x8048ee0 <main>, argc=5, argv=0xbacc, init=0x8048a64 <_init>, fini=0x8049a44 <_fini>, rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbac4) at ../sysdeps/generic/libc-start.c:92

> David Ford:
>
> Upgrade -past- 2.2, get 2.2.1. 2.2 causes numerous segfaults, notably sendmail
> and apache stop working.

I'm willing. Are there any good how-tos on doing this without killing your system? The last time I manually upgraded libc was about 5 years ago.

> Russell King:
>
> In answer to the original posters question, the first step would be
> to grab a copy of memtest86 (iirc its a program that is run from floppy
> disk) and run that on your system. That /should/ (and I stress should
> there) detect any RAM problems you have.

I'll try this.

> Barry K. Nathan:
>
> Does it always happen when you are moving the mouse over a button or
> windowbar or some other on-screen object like that?

Nope. If anything I'd say it happens during blitting (scrolling, screen refreshing, etc). Also, I'm not overclocking anything.
RE: Is this kernel related (signal 11)?
> Would this be an SMP IA32 box with glibc 2.2? I have two such boxen
> showing exactly the same behaviour, although I can't reproduce it at will.

Close, it is actually an SMP IA32 box with glibc 2.1.3. But you've now convinced me to not upgrade glibc yet ;-)

--Rainer
Is this kernel related (signal 11)?
Hi all,

I brought up this issue last month and had some response, but as of yet my particular problem still exists. In brief, X dies with signal 11. I have done quite a bit of testing and this does not seem to be a hardware issue. Also, I have never managed to get a signal 11 error when not running X. I posted on the XFree86 mailing lists and the consensus there seems to be that it is likely a hardware or kernel problem. So, my question is, how can I pinpoint the problem? Is this likely to be a kernel issue?

Recently I have been able to reproduce the problem reliably in a few ways. First, if I use an app that uses ncurses (like 'make menuconfig' on the Linux kernel) from within Gnome-terminal then X dies instantly. For now I have gone to using only xterm. I can also cause the error from xmms by scrolling the playlist repeatedly. This will happen within a few seconds but not instantly like above. I have also seen the error in other cases but none that I am yet able to reproduce on demand.

PLEASE, any suggestions?

--Rainer
RE: Oops with 4GB memory setting in 2.4.0 stable
> smb_rename suggests mv, but the process is ls ... er? What commands where
> you running on smbfs when it crashed?
>
> Could this be a symbol mismatch? Keith Owens suggested a less manual way
> to get module symbol output. Do you get the same results using that?

Here is a newly parsed oops, this time using the /var/log/ksymoops method mentioned by Keith Owens. Does this look better?

--Rainer

oops.parsed

Unable to handle kernel NULL pointer dereference at virtual address
printing eip:
c01239a4
*pde =
Oops:
CPU:0
EIP:0010:[c01239a4]
EFLAGS: 00010202
eax: 1001 ebx: ecx: c0256730 edx: 0003f435
esi: c20cde24 edi: ebp: 0001 esp: ee5e3e30
ds: 0018 es: 0018 ss: 0018
Process ls (pid: 449, stackpage=ee5e3000)
Stack: c20cde24 ee5e3e64 f7e4 0001 c01262f5 c20cde24 0001 f7e4 c110 fe2f0014 0018 fe2f c20cde24 f88982f8 0001 0070 ee5e3ee8 f889e180 ee61a000 ee6ede9c 0010 f8896e69
Call Trace: [c01262f5] [fe2f0014] [fe2f] [f88982f8] [f889e180] [f8896e69] [f8896eaa] [fe2f] [fe2f] [f889e048] [f889e03c] [f8896f40] [fe2f] [f88983b0] [fe2f] [fe2f] [f889798b] [fe2f] [c0140c10] [c0140e7c] [c0140f9e] [c0140e7c] [c0108f4b]
Code: 8b 07 ff 47 18 89 70 04 89 06 89 7e 04 89 37 89 7e 08 8b 44
Segmentation fault
RE: Oops with 4GB memory setting in 2.4.0 stable
Hi again,

It looks like some progress is being made, *wonderful*. As to some earlier questions...

> I'll have a look tonight or so. It works for you on non-bigmem?

Yes. Absolutely no problems on non-bigmem.

> smb_rename suggests mv, but the process is ls ... er? What commands where
> you running on smbfs when it crashed?

It seems that ANY access to the smbfs has this effect. Definitely confirmed are: ls, tab completion from bash, cat [some file], and usually df.

> Could this be a symbol mismatch? Keith Owens suggested a less manual way
> to get module symbol output. Do you get the same results using that?

I'll try to do this and report back.

--Rainer
Oops with 4GB memory setting in 2.4.0 stable
Hi all,

I have a 100% reproducible bug in all of the 2.4.0 kernels, including the latest stable one. The issue is that if I compile the kernel to support 4GB RAM (I have 1 GB) and then try to access a samba mount, I get an oops. This ALWAYS happens. Usually after this the system is frozen (although the magic SysRq key still works). If the system isn't frozen, then any commands that access the disk will freeze. Fortunately GPM worked and I was able to paste the oops to a file via telnet.

Attached is my oops.txt and the result sent through ksymoops. The results don't look particularly useful to me so perhaps I'm doing something wrong. PLEASE tell me if I should parse this differently. Likewise, if there is anything else I can do to help debug this, please tell me.

--Rainer
RE: Oops with 4GB memory setting in 2.4.0 stable
I knew that, I was just testing you all. ;-)

\e hides his head in shame

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Marcelo Tosatti
> Sent: Tuesday, January 16, 2001 6:47 AM
> To: Rainer Mager
> Cc: [EMAIL PROTECTED]
> Subject: Re: Oops with 4GB memory setting in 2.4.0 stable
>
> On Tue, 16 Jan 2001, Rainer Mager wrote:
>
> > Attached is my oops.txt and the result sent through ksymoops. The results
> > don't look particularly useful to me so perhaps I'm doing something wrong.
> > PLEASE tell me if I should parse this differently. Likewise, if there is
> > anything else I can do to help debug this, please tell me.
>
> It seems you forgot to attach oops.txt.

oops.parsed

Unable to handle kernel NULL pointer dereference at virtual address
printing eip:
f889e044
*pde =
Oops: 0002
CPU:1
EIP:0010:[f889e044]
EFLAGS: 00010246
eax: ebx: d5762800 ecx: 0400 edx: c19665fc
esi: d55be120 edi: ebp: d5764260 esp: d5505f1c
ds: 0018 es: 0018 ss: 0018
Process ls (pid: 865, stackpage=d5505000)
Stack: d5762800 d55be120 d5764260 d5764260 d55be120 f889d966 d55be120 d5762800 d5504000 d5764260 fffe fffb d5762800 d5764260 d55be120 d5764260 ba40 0006 c0140c10 d5764260 d5505fb0 c0140e7c
Call Trace: [f889d966] [c0140c10] [c0140e7c] [c0140f9e] [c0140e7c] [c0108f4b]
Code: f3 ab e9 8b 00 00 00 90 8d 74 26 00 8b 44 24 14 c7 00 00 00
Segmentation fault
RE: Oops with 4GB memory setting in 2.4.0 stable
Ok, now we're making progress. I did as you said and have attached (really!) the new parsed output. Now we have some useful information (I hope). I still got lots of warnings on symbols (which I have edited out of the parsed file for the sake of brevity).

What's the next step?

--Rainer

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Marcelo Tosatti
> Sent: Tuesday, January 16, 2001 7:09 AM
> To: Rainer Mager
> Cc: [EMAIL PROTECTED]
> Subject: RE: Oops with 4GB memory setting in 2.4.0 stable
>
> >>EIP; f889e044 <END_OF_CODE+385bfe34/>   <=
> Trace; f889d966 <END_OF_CODE+385bf756/>
> Trace; c0140c10 <vfs_readdir+90/ec>
> Trace; c0140e7c <filldir+0/d8>
> Trace; c0140f9e <sys_getdents+4a/98>
> Trace; c0140e7c <filldir+0/d8>
>
> It seems the oops is happening in a module's function.
>
> You have to make ksymoops parse the oops output against a System.map which
> has all modules symbols. Load each module by hand with the insmod -m
> option ("insmod -m module.o") and _append_ the outputs to System.map.
>
> After that you can run ksymoops against this new System.map.

oops.parsed.edit
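The trace quoted above shows why the module symbols have to be appended: an address is attributed to the nearest symbol that starts at or below it, so a module address with no matching entry lands on the last known symbol with a huge offset (END_OF_CODE+385bfe34). The following is a minimal sketch of that lookup, not ksymoops's actual implementation. The struct and the function resolve are hypothetical, and the start addresses are derived by subtracting the offsets in the trace from the reported addresses.

```c
#include <stddef.h>

/* One System.map-style entry: start address and symbol name. */
struct sym {
    unsigned long addr;
    const char *name;
};

/* Start addresses derived from the quoted trace, e.g. vfs_readdir+0x90 is
 * 0xc0140c10, so vfs_readdir starts at 0xc0140b80.
 * The table must be sorted by ascending address. */
static const struct sym symtab[] = {
    { 0xc0140b80UL, "vfs_readdir"  },
    { 0xc0140e7cUL, "filldir"      },
    { 0xc0140f54UL, "sys_getdents" },
    { 0xc02de210UL, "END_OF_CODE"  },
};

/* Hypothetical helper: find the last symbol whose start is <= addr.
 * Returns NULL if addr lies below the first symbol; on success the
 * distance from the symbol start is stored in *off (if off is non-NULL). */
const struct sym *resolve(unsigned long addr, unsigned long *off)
{
    const struct sym *best = NULL;

    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (symtab[i].addr > addr)
            break;
        best = &symtab[i];
    }
    if (best && off)
        *off = addr - best->addr;
    return best;
}
```

With this table, 0xc0140f9e resolves to sys_getdents plus 0x4a, while the faulting address 0xf889e044 resolves to END_OF_CODE plus 0x385bfe34. Adding the module's own symbols to the table is what turns the second result into a useful function name.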
Signal 11 - revisited
I was wondering if anyone had any new info/suggestions for the Signal 11 problem. I think I last reported that I had tried 2.4.0test12 with AGPGart and DRM turned off. This seemed a bit more stable, but I did have X crash with Signal 11 after about 1.5 days.

I'd really appreciate any advice on how to diagnose this.

Thanks,

--Rainer
RE: Signal 11 - the continuing saga
Err, for those of us who aren't up to our elbows in the kernel code, is there a patch for this? Presumably this will be rolled into 2.4.0test13 but I'd like to try it out. Also, can someone summarize the fix in English along with the expected, improved behavior (e.g. Linux will never have a signal 11 again and will never, ever crash ;-)? Finally, as soon as there is a patch, can other people who have seen this problem test it? My problem is so random that I'd need at least a few days to gain some confidence this is fixed. Thanks all. --Rainer > -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED]]On Behalf Of Linus Torvalds > Sent: Thursday, December 14, 2000 5:19 AM > To: Mike Galbraith > Cc: Kernel Mailing List > Subject: Re: Signal 11 - the continuing saga > > > On Wed, 13 Dec 2000, Linus Torvalds wrote: > > > > Hint: "ptep_mkdirty()". > > In case you wonder why the bug was so insidious, what this caused was two > separate problems, both of them able to cause SIGSEGVs. > > One: we didn't mark the page table entry dirty like we were supposed to. > > Two: by making it writable, we also made the page shared, even if it > wasn't supposed to be shared (so when the next process wrote to the page, > if the swap page was shared with somebody else, the changes would show up > even in the process that _didn't_ write to it). > > And "ptep_mkdirty()" is only used by swapoff, so nothing else would show > this. Which was why it hadn't been immediately obvious that anything was > broken. > > Linus
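As a rough summary "in English", the two problems Linus lists can be shown with a toy model of a page table entry's flag bits. The sketch below assumes the helper set the writable bit where it should have set the dirty bit. That is a reading of his two points, not a copy of the kernel source. The bit positions are the standard i386 ones, and the function names `mkdirty_buggy` and `mkdirty_fixed` are made up for this illustration.

```python
# i386 page table entry flag bits.
PAGE_PRESENT = 1 << 0
PAGE_RW = 1 << 1      # entry is writable
PAGE_DIRTY = 1 << 6   # page has been modified and must be written back


def mkdirty_buggy(pte):
    """Assumed buggy behaviour: marks the entry writable, not dirty."""
    return pte | PAGE_RW


def mkdirty_fixed(pte):
    """Intended behaviour: marks the entry dirty and leaves RW alone."""
    return pte | PAGE_DIRTY


def describe(pte):
    return {
        "dirty": bool(pte & PAGE_DIRTY),
        "writable": bool(pte & PAGE_RW),
    }


read_only_entry = PAGE_PRESENT
print("buggy:", describe(mkdirty_buggy(read_only_entry)))
print("fixed:", describe(mkdirty_fixed(read_only_entry)))
```

In the buggy case the entry is never marked dirty (problem one) and a read-only entry silently becomes writable (problem two), so a later write would go straight to a page that may still be shared.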
RE: Signal 11 - the continuing saga
Mike et al, I have no idea what IKD is and I don't know what to do with any results I might find BUT I'd be happy to do this if it will help. Please pass on the info with the instructions. Who should I report the results to? --Rainer > [mailto:[EMAIL PROTECTED]]On Behalf Of Mike Galbraith > If you want, I can extract IKD.. which happens to have a trap in place > for this (because I have a 100% reproducible swap related SIGSEGV that > I'm trying to figure out). > > If you're interested, let me know and I'll extract it (quite large) and > send it along with instructions on how to do the trap. > > -Mike
RE: Signal 11 - the continuing saga
Give that man a cigar: it was an env var (not LOCALE but LANG). I'd actually checked this but I didn't think that made a difference in my case. Thanks Linus, now can you fix the larger signal 11 problem? --Rainer > [mailto:[EMAIL PROTECTED]]On Behalf Of Linus Torvalds > I'd guess that the program has a bug, and depending on the arguments and > environment (especially the latter will be different), it shows up or > not. Things like not having a LOCALE set in either case or similar. > > Linus
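One way to track down this kind of difference is to dump the environment from both launch paths (for example by adding `env > /tmp/env.$$` to the launcher script) and compare the two dumps. The sketch below is one possible way to do that comparison. The function names and the sample values are hypothetical, and it assumes each variable fits on a single `NAME=value` line.

```python
def parse_env_dump(text):
    """Parse `env` output (one NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without "="
            env[name] = value
    return env


def env_diff(env_a, env_b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    diff = {}
    for name in sorted(set(env_a) | set(env_b)):
        a, b = env_a.get(name), env_b.get(name)
        if a != b:
            diff[name] = (a, b)
    return diff


# Hypothetical dumps: one taken from an xterm, one from the panel menu.
from_terminal = parse_env_dump("LANG=en_US\nPATH=/usr/bin:/bin\n")
from_menu = parse_env_dump("PATH=/usr/bin:/bin\n")

for name, (a, b) in env_diff(from_terminal, from_menu).items():
    print(f"{name}: terminal={a!r} menu={b!r}")
```

A value of `None` on one side means the variable is unset in that environment, which is the case described here for LANG.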
RE: Signal 11 - the continuing saga
Thanks for the info... > [mailto:[EMAIL PROTECTED]]On Behalf Of Jeff V. Merkey > > So, is this related to the larger signal 11 problems? > > There's a corruption bug in the page cache somewhere, and it's 100% > reproducible. Finding it will be tough Ok, granted this will be tough but is anyone even actively working on it? What can I do to help? > > Anyone know how to do [disable L1 and L2 caches]? > > Usually this is performed in the BIOS setup. You can also disable L1 > with a sequence of instructions that write to the CR0 register on intel > and flip a bit, but in doing this you have to execute a WBINVD (write > back and invalidate) instruction to flush out the cache. BIOS setup is > probably simpler. Disabling Level 1 will make the machine slower > than molasses, BTW, and if this bug is race related (they always > are) it won't help much in running it down. Aha, just as I suspected. My BIOS doesn't appear to support this. You seem to be saying that doing so won't really contribute anything anyway so I will hold off for now. --Rainer
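For reference, the "bit" in CR0 mentioned above is the cache-disable flag. The sketch below only models the register value. The actual write needs privileged (ring 0) code that loads the new value into CR0 and then executes WBINVD, which cannot be done from a user-space script. The bit positions (CD is bit 30, NW is bit 29) are the documented x86 ones. The starting value is just an example and the function name is hypothetical.

```python
CR0_NW = 1 << 29  # not write-through
CR0_CD = 1 << 30  # cache disable


def cr0_with_cache_disabled(cr0):
    """Return the CR0 value with caching turned off (CD set, NW clear)."""
    return (cr0 | CR0_CD) & ~CR0_NW


example_cr0 = 0x80050033  # example value only, not read from real hardware
print(hex(cr0_with_cache_disabled(example_cr0)))
```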
RE: Signal 11 - the continuing saga
Hi again, Ok, I just upgraded to 2.4.0test12 (although I don't think there was any work in 12 that directly addresses this signal 11 problem). When compiling the new kernel I chose to disable AGPGart and DRM as suggested by [EMAIL PROTECTED] I will report later if this makes any difference. On another, possibly related note, I'm getting some really weird behavior with a Java program. The only reason I mention it here is because it dies with our old friend Signal 11. Anyway, please bear with the description below. I have a tiny bash script that launches a Java swing app. If I run my script from an xterm (or gnome-terminal or whatever) then it starts up fine. If, however, I try to launch it from my gnome taskbar's menu then it dies with signal 11 (the Java log is available upon request). This has been 100% consistent since I noticed it yesterday, even across reboots. Interestingly, the same behavior occurs if I try to run the program from within JBuilder 4. So, is this related to the larger signal 11 problems? What else can I do regarding these issues to help fix it? Would a core dump help anyone? I'd really like to contribute somehow but I need some direction. --Rainer > From: CMA [mailto:[EMAIL PROTECTED]] > Did you already try to selectively disable L1 and L2 caches (if > your box has both) and see what happens? Anyone know how to do this?
RE: Signal 11
(This message contains a number of related replies.) > From: Mike Galbraith [mailto:[EMAIL PROTECTED]] > Is init permanently running after you see a couple of these? No, that is, after 23 hours up time it has used only 6 seconds CPU time (according to top). That reminds me that I should repeat that my signal 11 problem has (so far) only caused X to die. The OS remains up and stable. > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > My troublesome box finally seems to be stable.[...]I disabled DRM > & AGPGart. With them both disabled, I get no problems at all. > No Sig11's, No Sig4's, No lockups. > > This box has a Voodoo3 3000 AGP.. I suppose I can try this too. My box has a Matrox G400. BTW, what is DRM? Direct Rendering something? > From: CMA [mailto:[EMAIL PROTECTED]] > Did you already try to selectively disable L1 and L2 caches (if > your box has both) and see what happens? I'll look into this as well. Anyone have any pointers on how to do this? I have a Tyan Tiger 133 with Award BIOS if this helps/matters. Even if this setting does make a difference, what does this tell me/us? I don't consider running the box with disabled cache(s) a viable solution. Thanks all and keep those suggestions coming. --Rainer
RE: Signal 11
Well, I just had a Signal 11 even with the patch. What can I do to help figure this out? Thanks, --Rainer -----Original Message----- From: Alan Cox [mailto:[EMAIL PROTECTED]] Sent: Friday, December 08, 2000 11:07 PM To: David Woodhouse Cc: Andi Kleen; Rainer Mager; [EMAIL PROTECTED]; Mark Vojkovich Subject: Re: Signal 11 > > wrong with it. I've only seen this under 2.3.x/2.4 SMP kernels. I > > would say that this is definitely a kernel problem. > > XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34] > kernels - even on my BP6¹. The random crashes started to happen when I > upgraded my distribution² - and are only seen by people using 2.4. So I > suspect that it's the combination of glibc and kernel which is triggering > it. Have any of the folks seeing it checked if Ben LaHaise's fixes for the page table updating race help? Alan
OOPS when using 4GB memory setting
Hi all, About 1 month back I reported a problem with getting oopses when running with a kernel compiled with the 4GB memory setting. Since then I've finally managed to get the ksymoops results. Where should I post them? To review: My machine has 1GB RAM. If I build a 2.4.0test11 (or 8, 9, or 10; I haven't tried earlier) kernel and choose the 1GB memory setting then only 900504 K is detected (but everything runs stably). If I choose the 4GB memory setting then the full 1 GB is detected but I get an oops. I can reliably force an oops by mounting a samba drive and then accessing it (via ls for example). --Rainer
RE: Signal 11
I just applied the said patch and will report my results. Note that I have never been able to reproduce this reliably on demand, so give me a few days to see what happens. --Rainer -----Original Message----- From: Alan Cox [mailto:[EMAIL PROTECTED]] Sent: Friday, December 08, 2000 11:07 PM To: David Woodhouse Cc: Andi Kleen; Rainer Mager; [EMAIL PROTECTED]; Mark Vojkovich Subject: Re: Signal 11 > > wrong with it. I've only seen this under 2.3.x/2.4 SMP kernels. I > > would say that this is definitely a kernel problem. > > XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34] > kernels - even on my BP6¹. The random crashes started to happen when I > upgraded my distribution² - and are only seen by people using 2.4. So I > suspect that it's the combination of glibc and kernel which is triggering > it. Have any of the folks seeing it checked if Ben LaHaise's fixes for the page table updating race help? Alan
RE: Signal 11
Hi all, Thanks for all the input so far. Regarding this... > (I'm not sure exactly what cerberos does, do you have a link for it ?). The official name is "Cerberus Test Control System" aka CTCS. I don't know the official site but a search for this should reveal something. Anyway it is a pretty comprehensive test that includes multiple kernel compiles, memory tests, disk tests, etc. Like I said, I ran this for more than 15 hours with no problems. Well, actually, I did notice that if I run CTCS from within X then it freezes up after a few minutes. This appears to happen when/because of extreme swapping. Aside from the above I've also run repeated kernel compiles (more than 50 times) with 'make -j bzImage' and had no problems; all outputs were identical. So given these tests, I'm reasonably confident the core hardware is ok. I suppose it is possible there are some iffy bits in the G400's VRAM (but wouldn't that just result in screen artifacts?). I will admit that I haven't yet tried swapping RAM or any other system components. Any other ideas?
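The message does not say how the build outputs were compared. One simple way to check that repeated builds produced identical files is to hash each saved copy and see whether more than one distinct digest appears. The sketch below assumes each build's output was copied aside under its own name. The file names and the helper names are hypothetical.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths):
    """True if every file in paths has the same contents (by SHA-256)."""
    return len({sha256_of(p) for p in paths}) <= 1


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python compare.py bzImage.1 bzImage.2 ...
    files = sys.argv[1:]
    if files:
        print("identical" if all_identical(files) else "MISMATCH")
```

A mismatch between runs on unchanged sources would point towards flaky memory, CPU, or disk, which is the class of fault this kind of repeated-compile test is meant to catch.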
Signal 11
Hi all, I've searched around for an answer to this with no real luck yet. If anyone has some ideas I'd be very grateful. I recently upgraded to a new machine. It is running RedHat 6.2 Linux (with an SMP 2.4.0test[8-11] kernel) and has a Matrox G400 in it. X is 4.0.1. Anyway, about once every 2-3 days X will spontaneously die and the only info I get back is that it was because of signal 11. I've heard that signal 11 can be related to bad hardware, most often memory, but I've done a good bit of testing on this and the system seems ok. What I did was to run the VA Linux Cerberos(sp?) test for 15 hours+ with no errors. Actually this only worked when running from the console. When running from X the machine locked up (although no signal 11). The only info I've gotten back from the XFree86 mailing lists so far is that there are known and widespread problems of this type with SMP. Can anyone comment on this? Are there known SMP problems? What is the current resolution plan? Thanks, --Rainer
OOPS when using 4GB memory setting
Hi all, Please respond directly since I'm not on this mailing list. I have 2 intertwined problems for which my initial web research has failed to turn up any help. I recently upgraded machines and the new one has 1GB RAM. If I build a 2.4.0pre10 (or 8 or 9, I haven't tried earlier) kernel and choose the 1GB memory setting then only 900504 K is detected (but everything runs stably). If I choose the 4GB memory setting then the full 1 GB is detected but I get an oops. I can reliably force an oops by mounting a samba drive and then accessing it (via ls for example). So, is this a known issue? Should I do an oops analysis? What can I do to fix this? Also 2 items of note. First, the kernel that comes with RedHat 6.2 detects all of the RAM and is stable. Related to this, although not that important, I also noticed that with this RedHat kernel, hdparm shows memory access (not disk) of over 200 MB/s. On my 2.4 kernels this is about 120 MB/s. Any ideas why? Second, it is a dual PIII system so it runs an SMP kernel, if that makes a difference. Any help would be greatly appreciated. --Rainer