Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hi,

you guys were right, I was wrong. It is the hardware.

I increased ram voltage by 0.15V on the 22nd and haven't had any oopses since then. And I did torture the system.

I am deeply sorry that I wasted your time (but still puzzled that the oopses started after the kernel update - maybe I should buy a new psu...).

So it is neither reiser4 nor the kernel, just the ram needs a little more 'juice' than the board delivers on 'auto' settings.

Glück Auf
Volker
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Friday, 21 December 2007, Mike Galbraith wrote:
> On Thu, 2007-12-20 at 19:14 +0100, Hemmann, Volker Armin wrote:
> > It is just.. It could be the hardware - but I should have seen the
> > same 'problem' with earlier kernels - and the 'almost daily oops' only
> > started with 2.6.23.
>
> Nonetheless, the oopsen _suggest_ hardware. If it were my box, I'd move
> ram modules as a first step. It costs about two minutes to eliminate
> that possibility, but you seem reluctant to take that step. Heck, I'd
> _hope_ it's something as simple as bad ram, because otherwise, quest for
> stability could become a time consuming and/or expensive undertaking...

It costs a little bit more, but it will be part of the 'past holiday special'. As an intermediate step I increased the voltages of the ram - looks good so far.

> If that didn't change anything, I'd go back and stress test a previously
> stable configuration to gain confidence in my hardware.

you mean like playing ut2004 with reduced fans or several instances of cpuburn, or compiling something big like kdepim with kdeenablefinal? Done all that ...

> If 'uhoh, not
> as stable as I thought' happened, and nothing is getting obviously hot
> [1], I'd pray that it's an electrically noisy power supply, because
> that's also easy and cheap.

yeah, it would be the least annoying variant. After one PSU ate two computers, this one is just two and a half months old - I had my share of bad PSUs. That is why I increased the voltages. Maybe it helps.

> In any case, once I was very very confident
> that my hardware was indeed sound, I'd move on to an agonizingly tedious
> bisection, with no out of tree modules ever loaded, to narrow down when
> this memory corruption that nobody else appears to be hitting appeared.
>
> -Mike
>
> 1. Crappy heatsink compound can dry out and fracture, leaving hot chip
> under a relatively cool heatsink. This is exactly what I found when I
> disassembled my suddenly unstable under heavy load P4 box a while back.

still this would show up with the temps. And the temps are ok.

Glück Auf,
Volker
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
* Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:

> Ok, so after the holidays I will do the following:
>
> let memtest86+ run several hours. do a full backup to switch
> to r3 and build an unpatched kernel. see if I can reproduce the oops
> with .21 and .22 (because AFAIR no oops with 21.. but I might be
> wrong).
>
> Not exactly in that order.

yeah, that would help. But generally it is a big PITA to figure out such bugs where there's no specific 'smoking gun' in the oopses themselves. What usually happens is that people try to figure out a faster way of triggering the bug - and then bisection can be done. But the hardware must be eliminated first as the cause of the bug. Taking out half of the RAM (i know it's painful ...) can help too in isolating RAM problems.

	Ingo
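Editorial sketch, not part of the original mail: the bisection mentioned above is, in essence, a binary search over an ordered list of revisions for the first one that shows the bug. The C below is only an illustration under stated assumptions. The names `bisect_first_bad`, `is_bad_fn` and `bad_from_threshold` are hypothetical, revisions are reduced to plain indices, and the callback stands in for "build that kernel, boot it, and wait for an oops". It also assumes the bug, once introduced, shows up reliably in every later revision, which is exactly why a faster trigger is wanted before bisecting.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if revision 'index' shows the bug.
 * In practice this means building, booting and stress testing that revision. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/* Binary search for the first bad revision.
 * Precondition (assumed): revision 'good' is known good, revision 'bad' is
 * known bad, good < bad, and every revision after the first bad one is bad. */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;   /* bug already present: look earlier */
        else
            good = mid;  /* still fine: look later */
    }
    return bad;
}

/* Stand-in predicate for illustration only: every revision at or after the
 * index stored in *ctx counts as bad. */
int bad_from_threshold(size_t index, void *ctx)
{
    return index >= *(size_t *)ctx;
}
```

With N candidate revisions this needs about log2(N) test runs, so a bug that takes up to three days to show itself makes every step of the search expensive.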
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thu, 2007-12-20 at 19:14 +0100, Hemmann, Volker Armin wrote:

> It is just.. It could be the hardware - but I should have seen the
> same 'problem' with earlier kernels - and the 'almost daily oops' only
> started with 2.6.23.

Nonetheless, the oopsen _suggest_ hardware. If it were my box, I'd move ram modules as a first step. It costs about two minutes to eliminate that possibility, but you seem reluctant to take that step. Heck, I'd _hope_ it's something as simple as bad ram, because otherwise, quest for stability could become a time consuming and/or expensive undertaking...

If that didn't change anything, I'd go back and stress test a previously stable configuration to gain confidence in my hardware. If 'uhoh, not as stable as I thought' happened, and nothing is getting obviously hot [1], I'd pray that it's an electrically noisy power supply, because that's also easy and cheap. In any case, once I was very very confident that my hardware was indeed sound, I'd move on to an agonizingly tedious bisection, with no out of tree modules ever loaded, to narrow down when this memory corruption that nobody else appears to be hitting appeared.

	-Mike

1. Crappy heatsink compound can dry out and fracture, leaving hot chip under a relatively cool heatsink. This is exactly what I found when I disassembled my suddenly unstable under heavy load P4 box a while back.
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Ok, so after the holidays I will do the following:

let memtest86+ run several hours.
do a full backup to switch to r3 and build an unpatched kernel.
see if I can reproduce the oops with .21 and .22 (because AFAIR no oops with 21.. but I might be wrong).

Not exactly in that order.

Glück Auf
Volker

ps: please cc me. I am not subscribed to lkml.
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
* Pekka Enberg <[EMAIL PROTECTED]> wrote:

> Nah, it's just that vma->anon_vma is probably supposed to be NULL
> here. And if you look at all the oopses, they do suggest one
> particular byte lane is dodgy (the corruption is in bits 41-43 and
> 45).
>
> The whole thing reminds me of another bug where memtest86 didn't find
> anything because it's doing cached memory accesses:
> http://lkml.org/lkml/2007/10/3/259

memtest86+ has an uncached test:

  const struct tseq tseq[] = {
    {1, 5, 3, 0, 0, "[Address test, walking ones] "},
    {1, 6, 3, 2, 0, "[Address test, own address] "},
    {1, 0, 3, 14, 0, "[Moving inversions, ones & zeros] "},
    {1, 1, 2, 80, 0, "[Moving inversions, 8 bit pattern]"},
    {1, 10, 60, 300, 0, "[Moving inversions, random pattern] "},
    {1, 7, 64, 66, 0, "[Block move, 64 moves]"},
    {1, 2, 2, 320, 0, "[Moving inversions, 32 bit pattern] "},
    {1, 9, 40, 120, 0, "[Random number sequence] "},
    {1, 3, 4, 240, 0, "[Modulo 20, ones & zeros] "},
    {1, 8, 1, 2, 0, "[Bit fade test, 90 min, 2 patterns] "},
    {0, 4, 3, 2, 0, "[[Moving inversions, 0 & 1, uncached] "},
    {0, 0, 0, 0, 0, NULL}
  };

find that "Moving inversions, 0 & 1" test and run that one alone, overnight.

	Ingo
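Editorial sketch, not part of the original mail and not memtest86+ source: for readers unfamiliar with the "moving inversions" idea named in the table above, the C below shows one simplified pass. Memory is filled with a pattern, walked upwards while checking the pattern and writing its complement, then walked downwards while checking the complement and writing the pattern back. The function name `moving_inversions` is hypothetical. Note the important difference from the test recommended above: this runs on an ordinary buffer with CPU caches enabled, so it only illustrates the access pattern and would not by itself catch the kind of fault that hides behind cached accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* One simplified moving-inversions pass over 'words' 64-bit words.
 * Returns the number of words that did not read back as expected.
 * 'volatile' only stops the compiler from optimizing the reads away;
 * it does not bypass the CPU caches. */
size_t moving_inversions(volatile uint64_t *buf, size_t words, uint64_t pattern)
{
    size_t errors = 0;
    size_t i;

    /* Step 1: fill the whole region with the pattern. */
    for (i = 0; i < words; i++)
        buf[i] = pattern;

    /* Step 2: ascending walk, verify the pattern, write its complement. */
    for (i = 0; i < words; i++) {
        if (buf[i] != pattern)
            errors++;
        buf[i] = ~pattern;
    }

    /* Step 3: descending walk, verify the complement, restore the pattern. */
    for (i = words; i-- > 0; ) {
        if (buf[i] != ~pattern)
            errors++;
        buf[i] = pattern;
    }

    return errors;
}
```

Calling it with the patterns 0 and all ones corresponds loosely to the "0 & 1" variant; on healthy memory it should report zero errors.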
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hi,

On Dec 20, 2007 4:38 PM, David Newall <[EMAIL PROTECTED]> wrote:
> >>> and another one, this time tainted with the nvidia module:
> >>> 5194.130985] Unable to handle kernel paging request at 0300
> >>> RIP:
>
> Numbers like that don't suggest hardware faults. All those zeros: It's
> far too round. Sounds very like software. In fact, it sounds like the
> start of significant hardware region.

Nah, it's just that vma->anon_vma is probably supposed to be NULL here. And if you look at all the oopses, they do suggest one particular byte lane is dodgy (the corruption is in bits 41-43 and 45).

The whole thing reminds me of another bug where memtest86 didn't find anything because it's doing cached memory accesses:

http://lkml.org/lkml/2007/10/3/259

			Pekka
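Editorial sketch, not part of the original mail: the byte-lane reasoning above comes down to XORing the value a pointer should have held with the value actually seen and checking which bits, and therefore which bytes, differ. Bits 41-43 and 45 all fall in byte 5 (bits 40-47) of a 64-bit word. The C below is a minimal illustration; the function names are hypothetical, and the example value in the usage note is constructed purely from the bit numbers quoted above, not taken from any of the reported oopses.

```c
#include <stdint.h>
#include <stdio.h>

/* Return a bitmask of the byte lanes (lane 0 = least significant byte)
 * in which 'observed' differs from 'expected'. */
unsigned flipped_byte_lanes(uint64_t expected, uint64_t observed)
{
    uint64_t diff = expected ^ observed;
    unsigned lanes = 0;
    int lane;

    for (lane = 0; lane < 8; lane++) {
        if ((diff >> (lane * 8)) & 0xffu)
            lanes |= 1u << lane;
    }
    return lanes;
}

/* Print every differing bit together with the byte lane it belongs to. */
void print_flipped_bits(uint64_t expected, uint64_t observed)
{
    uint64_t diff = expected ^ observed;
    int bit;

    for (bit = 0; bit < 64; bit++) {
        if ((diff >> bit) & 1u)
            printf("bit %d differs (byte lane %d)\n", bit, bit / 8);
    }
}
```

For an expected NULL pointer and a hypothetical observed value of 0x00002E0000000000 (bits 41, 42, 43 and 45 set), `flipped_byte_lanes` reports only lane 5. If several independent oopses all point at the same lane, a marginal data line or module becomes a more plausible suspect than a software bug.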
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thursday, 20 December 2007, Ingo Molnar wrote:
> * Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:
> > On Thursday, 20 December 2007, you wrote:
> > > Hemmann, Volker Armin wrote:
> > > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P
> > > > 2.6.23.11reiser4 #4
> > >
> > > The subject line is wrong.
> > > You apparently run Linux, but not Linux 2.6.23.y.
> >
> > first of all, apart from this oops all other oopses I reported were
> > with a not-tainted kernel. You might want to read the other mails I
> > have sent.
> >
> > Also, besides of the reiser4 patch there is no other patch added to
> > the kernel. And since people have had successfully reported problems
> > with heavily distro-patched kernels in the past it looks a little bit
> > hypocritical to put my reports aside because of one single patch -
> > don't you think?
>
> reiser4 isnt just a single random patch, it's a huge patch with lots of
> interactions with file and memory management. Would it be hard for you
> to reproduce the crash without reiser4? (or is all your stuff on
> reiser4?)

/home (and /var, /tmp) is on reiser4 and my biggest partition. And since it needs up to 3 days to reproduce this - yes, hard to do without r4.

Glück Auf,
Volker
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hemmann, Volker Armin wrote:
> On Thursday, 20 December 2007, you wrote:
>> The subject line is wrong.
>> You apparently run Linux, but not Linux 2.6.23.y.
>
> first of all, apart from this oops all other oopses I reported were with a
> not-tainted kernel. You might want to read the other mails I have sent.
>
> Also, besides of the reiser4 patch there is no other patch added to the
> kernel. And since people have had successfully reported problems with
> heavily distro-patched kernels in the past it looks a little bit hypocritical
> to put my reports aside because of one single patch - don't you think?

I didn't say anything about putting your report aside.

For successful reports (as in 'leading to a fix'), it is necessary, among other things, that the issue can be narrowed down enough. Sometimes this is a quick process; e.g. user X finds a very specific driver bug while using a patched and old kernel, driver developer Y takes the time to confirm this bug in a recent mainline kernel because he already had a good idea where to look and how to recreate the respective conditions, and fixes the bug.

Sometimes it takes much much more work to identify the circumstances of the bug. It is then necessary that the reporter knows exactly what he is running, simplifies his system to eliminate as many potential causes for problems as possible, and always clearly states under what circumstances the bug happens.

If you already found the bug in an untainted (but patched?) kernel, then what information does another report against a tainted kernel add? The tainted kernel has more unknowns than the untainted one. Progress can only be made if the number of unknowns is successively reduced.

Regarding other people's reports and hypocrisy and whatnot: I myself am monitoring a few distro bug trackers more or less frequently for bug reports concerning the kernel subsystem I'm interested in. With varying success though. In order to make use of a report against a distro kernel, I need to have a good picture of what stuff is in that kernel.

Looking at distro bug trackers only works for me because my field of interest is a driver subsystem which is somewhat decoupled from other kernel parts; so if there is trouble concerning hardware covered by this subsystem, it is usually not too hard to figure out whether the problem is in this subsystem or somewhere else. If it weren't that easy most of the time, I might for example depend on the reporters to test specific mainline kernels or specific development kernels. (Though the latter becomes necessary after all in cases when more targeted debug output is needed from the reporter, or in order to test proposed fixes without having to wait for the distributor to build a test package for the reporter.)
--
Stefan Richter
-=-=-=== ==-- =-=--
http://arcgraph.de/sr/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
* Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:

> On Thursday, 20 December 2007, you wrote:
> > Hemmann, Volker Armin wrote:
> > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4
> > > #4
> >
> > The subject line is wrong.
> > You apparently run Linux, but not Linux 2.6.23.y.
>
> first of all, apart from this oops all other oopses I reported were
> with a not-tainted kernel. You might want to read the other mails I
> have sent.
>
> Also, besides of the reiser4 patch there is no other patch added to
> the kernel. And since people have had successfully reported problems
> with heavily distro-patched kernels in the past it looks a little bit
> hypocritical to put my reports aside because of one single patch -
> don't you think?

reiser4 isnt just a single random patch, it's a huge patch with lots of interactions with file and memory management. Would it be hard for you to reproduce the crash without reiser4? (or is all your stuff on reiser4?)

	Ingo
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thursday, 20 December 2007, David Newall wrote:
> >>> On Monday, 17 December 2007, you wrote:
> >>>
> >>> and another one, this time tainted with the nvidia module:
> >>> 5194.130985] Unable to handle kernel paging request at 0300
> >>> RIP:
>
> Numbers like that don't suggest hardware faults. All those zeros: It's
> far too round. Sounds very like software. In fact, it sounds like the
> start of significant hardware region. And lo! there's a closed-source,
> possibly buggy nvidia module. Try another; older or newer are equally
> good.

and this one was without the nvidia module:
http://marc.info/?l=linux-kernel&m=119790371708690&w=2

and the first one I reported, was without nvidia and not-tainted too:
http://marc.info/?l=linux-kernel&m=119776365425514&w=2

I am not a complete idiot. If I have a problem, I try to reproduce without nvidia first (after a clean shutdown and boot, with the module not even on harddisk). And I reproduced it without the module. The last oops with the module was just an example that it does not matter if the module is loaded or not and to (maybe) give some additional information.

Glück Auf,
Volker
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thursday, 20 December 2007, you wrote:
> On Thu, 2007-12-20 at 06:53 +0100, Hemmann, Volker Armin wrote:
> > On Thursday, 20 December 2007, you wrote:
> > > On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote:
> > > > On Monday, 17 December 2007, you wrote:
> > > >
> > > > and another one, this time tainted with the nvidia module:
> > > > 5194.130985] Unable to handle kernel paging request at
> > > > 0300 RIP:
> > >
> > > This really sounds like bad hardware. Either memory or the mobo/riser
> > > card the memory is on. You might try lowering the memory timings of
> > > your memory in BIOS. Try removing 1/2 of your memory. If it still
> > > remove the other 1/2 and put the first 1/2 back and try again.
> >
> > if this is bad hardware why:
> >
> > - didn't this show up earlier?
> >
> > - did a several hour memtest run couple of weeks ago didn't show up
> > anything?
> >
> > - and does stuff like compiling all of kde 3.5.8 or the latest kde4 rc
> > finish without any problems?
>
> Because bad hardware can be highly sensitive to exact load patterns.
> Don't be so skeptical of the suggestion that your hardware may be
> flakey, in the last 30+ years as a hardware guy in the design lab and in
> the field, I've seen very much hardware which passed extensive
> diagnostics, but turn out to be flakey nonetheless.
>
> I would suggest that you rearrange your ram modules, and see if the bit
> pattern changes. Memtest may not show a problem with bitflips... if
> that's happening. I would also suggest that you check your case
> temperature as someone else suggested - lmsensors may say that the CPU
> temperature is fine, but that isn't the whole picture. by a long shot.
>
> -Mike

case temp: 25°C measured near a warm harddisk by a digital thermometer.
mainboard temp: 31°C measured by lmsensors (mobo bios agrees)
cpu temp: 29-50°C (load dependent) measured by lmsensors, bios puts on two additional degrees.

I have 4 'big' fans installed to have a constant air stream in the case. This really does not look like overheating.

And I did have flaky ram in the past. The thing is, apart from the oops the system is completely, perfectly stable. That really does not smell like flaky hardware. At least not in my experience.

Flaky PSU = sudden reboots, boot problems, crashes under load.
Flaky mobo = see flaky psu.
Flaky Ram: crashes, crashes, more crashes, segfaults here and there, especially when updating glibc, qt or kde.

And I don't get this. I only get oopses.

It is just.. It could be the hardware - but I should have seen the same 'problem' with earlier kernels - and the 'almost daily oops' only started with 2.6.23.
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thursday, 20 December 2007, you wrote:
> Hemmann, Volker Armin wrote:
> > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4
> > #4
>
> The subject line is wrong.
> You apparently run Linux, but not Linux 2.6.23.y.

first of all, apart from this oops all other oopses I reported were with a not-tainted kernel. You might want to read the other mails I have sent.

Also, besides of the reiser4 patch there is no other patch added to the kernel. And since people have had successfully reported problems with heavily distro-patched kernels in the past it looks a little bit hypocritical to put my reports aside because of one single patch - don't you think?

Glück Auf,
Volker
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hemmann, Volker Armin wrote:
> [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4

The subject line is wrong.
You apparently run Linux, but not Linux 2.6.23.y.
--
Stefan Richter
-=-=-=== ==-- =-=--
http://arcgraph.de/sr/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
>>> On Monday, 17 December 2007, you wrote:
>>>
>>> and another one, this time tainted with the nvidia module:
>>> 5194.130985] Unable to handle kernel paging request at 0300
>>> RIP:

Numbers like that don't suggest hardware faults. All those zeros: It's far too round. Sounds very like software. In fact, it sounds like the start of significant hardware region. And lo! there's a closed-source, possibly buggy nvidia module. Try another; older or newer are equally good.
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Montag, 17. Dezember 2007, you wrote: and another one, this time tainted with the nvidia module: 5194.130985] Unable to handle kernel paging request at 0300 RIP: Numbers like that don't suggest hardware faults. All those zeros: It's far too round. Sounds very like software. In fact, it sounds like the start of significant hardware region. And lo! there's a closed-source, possibly buggy nvidia module. Try another; older or newer are equally good. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hemmann, Volker Armin wrote: [ 5194.131014] Pid: 22490, comm: sleep Tainted: P2.6.23.11reiser4 #4 The subject line is wrong. You apparently run Linux, but not Linux 2.6.23.y. -- Stefan Richter -=-=-=== ==-- =-=-- http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Donnerstag, 20. Dezember 2007, you wrote: Hemmann, Volker Armin wrote: [ 5194.131014] Pid: 22490, comm: sleep Tainted: P2.6.23.11reiser4 #4 The subject line is wrong. You apparently run Linux, but not Linux 2.6.23.y. first of all, apart from this oops all other oopses I reported were with a not-tainted kernel. You might want to read the other mails I have sent. Also, besides of the reiser4 patch there is no other patch added to the kernel. And since people have had successfully reported problems with heavily distro-patched kernels in the past it looks a little bit hypocritical to put my reports aside because of one single patch - don't you think? Glück Auf, Volker -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Donnerstag, 20. Dezember 2007, you wrote: On Thu, 2007-12-20 at 06:53 +0100, Hemmann, Volker Armin wrote: On Donnerstag, 20. Dezember 2007, you wrote: On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote: On Montag, 17. Dezember 2007, you wrote: and another one, this time tainted with the nvidia module: 5194.130985] Unable to handle kernel paging request at 0300 RIP: This really sounds like bad hardware. Either memory or the mobo/riser card the memory is on. You might try lowering the memory timings of your memory in BIOS. Try removing 1/2 of your memory. If it still remove the other 1/2 and put the first 1/2 back and try again. if this is bad hardware why: - didn't this show up earlier? - did a several hour memtest run couple of weeks ago didn't show up anything? - and does stuff like compiling all of kde 3.5.8 or the latest kde4 rc finish without any problems? Because bad hardware can be highly sensitive to exact load patterns. Don't be so skeptical of the suggestion that your hardware may be flakey, in the last 30+ years as a hardware guy in the design lab and in the field, I've seen very much hardware which passed extensive diagnostics, but turn out to be flakey nonetheless. I would suggest that you rearrange your ram modules, and see if the bit pattern changes. Memtest may not show a problem with bitflips... if that's happening. I would also suggest that you check your case temperature as someone else suggested - lmsensors may say that the CPU temperature is fine, but that isn't the whole picture. by a long shot. -Mike case temp: 25°C measured near a warm harddisk by a digital thermometer. mainboard temp: 31°C measured by lmsensors (mobo bios agrees) cpu temp: 29-50°C (load dependent) measured by lmsensors, bios puts on two additional degrees. I have 4 'big' fans installed to have a constant air stream in the case. This really does not look like overheating. And I did have flaky ram in the past. 
The thing is, apart from the oops the system is completly, perfectly stable. That really does not smell like flaky hardware. At least not in my experience. Flaky PSU = sudden reboots, boot problems, crashes under load. Flaky mobo = see flaky psu. Flaky Ram: crashes, crashes, more crashes, segfaults here and there, especially when updating glibc, qt or kde. And I don't get this. I only get oopses. It is just.. I could be the hardware - but I should have seen the same 'problem' with earlier kernels - and the 'almost daily oops' only started with 2.6.23. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Donnerstag, 20. Dezember 2007, David Newall wrote: On Montag, 17. Dezember 2007, you wrote: and another one, this time tainted with the nvidia module: 5194.130985] Unable to handle kernel paging request at 0300 RIP: Numbers like that don't suggest hardware faults. All those zeros: It's far too round. Sounds very like software. In fact, it sounds like the start of significant hardware region. And lo! there's a closed-source, possibly buggy nvidia module. Try another; older or newer are equally good. and this one was without the nvidia module: http://marc.info/?l=linux-kernelm=119790371708690w=2 and the first one I reported, was without nvidia and not-tainted too: http://marc.info/?l=linux-kernelm=119776365425514w=2 I am not a complete idiot. If I have a problem, I try to reproduce without nvidia first (after a clean shutdown and boot, with the module not even on harddisk). And I reproduced it without the module. The last oops with the module was just an example that it does not matter if the module is loaded or not and to (maybe) give some additional information. Glück Auf, Volker -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
* Hemmann, Volker Armin [EMAIL PROTECTED] wrote:
> On Thursday, 20 December 2007, you wrote:
> > Hemmann, Volker Armin wrote:
> > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4
> >
> > The subject line is wrong. You apparently run Linux, but not Linux
> > 2.6.23.y.
>
> first of all, apart from this oops all other oopses I reported were with a
> not-tainted kernel. You might want to read the other mails I have sent.
> Also, besides the reiser4 patch there is no other patch added to the
> kernel. And since people have successfully reported problems with heavily
> distro-patched kernels in the past it looks a little bit hypocritical to
> put my reports aside because of one single patch - don't you think?

reiser4 isn't just a single random patch, it's a huge patch with lots of interactions with file and memory management. Would it be hard for you to reproduce the crash without reiser4? (or is all your stuff on reiser4?)

Ingo
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hemmann, Volker Armin wrote:
> On Thursday, 20 December 2007, you wrote:
> > The subject line is wrong. You apparently run Linux, but not Linux
> > 2.6.23.y.
>
> first of all, apart from this oops all other oopses I reported were with a
> not-tainted kernel. You might want to read the other mails I have sent.
> Also, besides the reiser4 patch there is no other patch added to the
> kernel. And since people have successfully reported problems with heavily
> distro-patched kernels in the past it looks a little bit hypocritical to
> put my reports aside because of one single patch - don't you think?

I didn't say anything about putting your report aside.

For successful reports (as in 'leading to a fix'), it is, among other things, necessary that the issue can be narrowed down enough. Sometimes this is a quick process; e.g. user X finds a very specific driver bug while using a patched and old kernel, driver developer Y takes the time to confirm this bug in a recent mainline kernel because he already had a good idea where to look and how to recreate the respective conditions, and fixes the bug.

Sometimes it takes much, much more work to identify the circumstances of the bug. It is then necessary that the reporter knows exactly what he is running, simplifies his system to eliminate as many potential causes of problems as possible, and always clearly states under what circumstances the bug happens.

If you already found the bug in an untainted (but patched?) kernel, then what information does another report against a tainted kernel add? The tainted kernel has more unknowns than the untainted one. Progress can only be made if the number of unknowns is successively reduced.

Regarding other people's reports and hypocrisy and whatnot: I myself am monitoring a few distro bug trackers more or less frequently for bug reports concerning the kernel subsystem I'm interested in. With varying success though. In order to make use of a report against a distro kernel, I need to have a good picture of what stuff is in that kernel. Looking at distro bug trackers only works for me because my field of interest is a driver subsystem which is somewhat decoupled from other kernel parts; so if there is trouble concerning hardware covered by this subsystem, it is usually not too hard to figure out whether the problem is in this subsystem or somewhere else. If it weren't that easy most of the time, I might for example depend on the reporters to test specific mainline kernels or specific development kernels. (Though the latter becomes necessary after all in cases when more targeted debug output is needed from the reporter, or in order to test proposed fixes without having to wait for the distributor to build a test package for the reporter.)
--
Stefan Richter
-=-=-=== ==-- =-=--
http://arcgraph.de/sr/
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thursday, 20 December 2007, Ingo Molnar wrote:
> * Hemmann, Volker Armin [EMAIL PROTECTED] wrote:
> > On Thursday, 20 December 2007, you wrote:
> > > Hemmann, Volker Armin wrote:
> > > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4
> > >
> > > The subject line is wrong. You apparently run Linux, but not Linux
> > > 2.6.23.y.
> >
> > first of all, apart from this oops all other oopses I reported were with
> > a not-tainted kernel. You might want to read the other mails I have sent.
> > Also, besides the reiser4 patch there is no other patch added to the
> > kernel. And since people have successfully reported problems with heavily
> > distro-patched kernels in the past it looks a little bit hypocritical to
> > put my reports aside because of one single patch - don't you think?
>
> reiser4 isn't just a single random patch, it's a huge patch with lots of
> interactions with file and memory management. Would it be hard for you to
> reproduce the crash without reiser4? (or is all your stuff on reiser4?)

/home (and /var, /tmp) is on reiser4 and is my biggest partition. And since it takes up to 3 days to reproduce this - yes, hard to do without r4.

Glück Auf,
Volker
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hi,

On Dec 20, 2007 4:38 PM, David Newall [EMAIL PROTECTED] wrote:
> > and another one, this time tainted with the nvidia module:
> > [ 5194.130985] Unable to handle kernel paging request at 0300 RIP:
>
> Numbers like that don't suggest hardware faults. All those zeros: It's far
> too round. Sounds very like software. In fact, it sounds like the start of
> significant hardware region.

Nah, it's just that vma->anon_vma is probably supposed to be NULL here. And if you look at all the oopses, they do suggest one particular byte lane is dodgy (the corruption is in bits 41-43 and 45). The whole thing reminds me of another bug where memtest86 didn't find anything because it's doing cached memory accesses:

http://lkml.org/lkml/2007/10/3/259

Pekka
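The byte-lane remark can be checked with a little arithmetic. The following is only a sketch: the function names are made up for illustration, and the example word with bit 42 set is hypothetical, since the addresses in the oopses quoted in this thread are truncated. It assumes the usual numbering where bit 0 is the least significant bit and byte lane n carries bits 8n to 8n+7 of a 64-bit word.

```python
def flipped_bits(expected, observed):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = expected ^ observed
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def byte_lanes(bits):
    """Map bit positions within a 64-bit word to byte lanes 0..7."""
    return {b // 8 for b in bits}


# Bit positions named in the message above: 41-43 and 45.
reported = [41, 42, 43, 45]
print(byte_lanes(reported))  # {5}

# Hypothetical case: a pointer that should be NULL comes back with bit 42 set.
print(flipped_bits(0, 1 << 42))  # [42]
```

All four reported bits fall into byte lane 5, which is what makes a single faulty data path more plausible than random software corruption.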
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
* Pekka Enberg [EMAIL PROTECTED] wrote:
> Nah, it's just that vma->anon_vma is probably supposed to be NULL here.
> And if you look at all the oopses, they do suggest one particular byte
> lane is dodgy (the corruption is in bits 41-43 and 45). The whole thing
> reminds me of another bug where memtest86 didn't find anything because
> it's doing cached memory accesses:
>
> http://lkml.org/lkml/2007/10/3/259

memtest86+ has an uncached test:

const struct tseq tseq[] = {
    {1,  5,  3,   0, 0, "[Address test, walking ones]"},
    {1,  6,  3,   2, 0, "[Address test, own address]"},
    {1,  0,  3,  14, 0, "[Moving inversions, ones & zeros]"},
    {1,  1,  2,  80, 0, "[Moving inversions, 8 bit pattern]"},
    {1, 10, 60, 300, 0, "[Moving inversions, random pattern]"},
    {1,  7, 64,  66, 0, "[Block move, 64 moves]"},
    {1,  2,  2, 320, 0, "[Moving inversions, 32 bit pattern]"},
    {1,  9, 40, 120, 0, "[Random number sequence]"},
    {1,  3,  4, 240, 0, "[Modulo 20, ones & zeros]"},
    {1,  8,  1,   2, 0, "[Bit fade test, 90 min, 2 patterns]"},
    {0,  4,  3,   2, 0, "[Moving inversions, 0 & 1, uncached]"},
    {0,  0,  0,   0, 0, NULL}
};

find that "Moving inversions, 0 & 1" test and run that one alone, overnight.

Ingo
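For readers unfamiliar with the test name, the idea behind a moving-inversions pass can be sketched as below. This is not memtest86+ code: it is a simplified illustration on an ordinary Python bytearray, the function name and the byte-wide pattern are assumptions, and a user-space buffer like this goes through the CPU caches, so it cannot stand in for the uncached test recommended above.

```python
def moving_inversions(mem, pattern=0x00):
    """Run one moving-inversions pass over mem; return the indices that failed."""
    inverse = pattern ^ 0xFF
    errors = []
    # Step 1: fill the whole buffer with the pattern.
    for i in range(len(mem)):
        mem[i] = pattern
    # Step 2: ascending, check the pattern and then write its complement.
    for i in range(len(mem)):
        if mem[i] != pattern:
            errors.append(i)
        mem[i] = inverse
    # Step 3: descending, check the complement and then write the pattern back.
    for i in reversed(range(len(mem))):
        if mem[i] != inverse:
            errors.append(i)
        mem[i] = pattern
    return errors


buf = bytearray(4096)
print(moving_inversions(buf))  # [] on a healthy buffer
```

Checking each cell in both its original and its inverted state, in both directions, is what lets the pattern catch cells that are stuck or that are disturbed by writes to their neighbours.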
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Ok, so after the holidays I will do the following:

- let memtest86+ run several hours.
- do a full backup & restore to switch to r3 and build an unpatched kernel.
- see if I can reproduce the oops with .21 and .22 (because AFAIR no oops with 21.. but I might be wrong).

Not exactly in that order.

Glück Auf
Volker

ps: please cc me. I am not subscribed to lkml.
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thu, 2007-12-20 at 19:14 +0100, Hemmann, Volker Armin wrote:
> It is just.. I could be the hardware - but I should have seen the same
> 'problem' with earlier kernels - and the 'almost daily oops' only started
> with 2.6.23.

Nonetheless, the oopsen _suggest_ hardware. If it were my box, I'd move ram modules as a first step. It costs about two minutes to eliminate that possibility, but you seem reluctant to take that step. Heck, I'd _hope_ it's something as simple as bad ram, because otherwise the quest for stability could become a time consuming and/or expensive undertaking...

If that didn't change anything, I'd go back and stress test a previously stable configuration to gain confidence in my hardware. If 'uhoh, not as stable as I thought' happened, and nothing is getting obviously hot [1], I'd pray that it's an electrically noisy power supply, because that's also easy and cheap. In any case, once I was very very confident that my hardware was indeed sound, I'd move on to an agonizingly tedious bisection, with no out of tree modules ever loaded, to narrow down when this memory corruption that nobody else appears to be hitting appeared.

-Mike

1. Crappy heatsink compound can dry out and fracture, leaving a hot chip under a relatively cool heatsink. This is exactly what I found when I disassembled my suddenly unstable under heavy load P4 box a while back.
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thursday, 20 December 2007, you wrote:
> On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote:
> > On Monday, 17 December 2007, you wrote:
> >
> > and another one, this time tainted with the nvidia module:
> > [ 5194.130985] Unable to handle kernel paging request at 0300
> > RIP:
>
> This really sounds like bad hardware. Either memory or the mobo/riser
> card the memory is on. You might try lowering the memory timings of your
> memory in BIOS. Try removing 1/2 of your memory. If it still fails, remove
> the other 1/2 and put the first 1/2 back and try again.

if this is bad hardware why:
- didn't this show up earlier?
- did a several-hour memtest run a couple of weeks ago not show up anything?
- and does stuff like compiling all of kde 3.5.8 or the latest kde4 rc finish without any problems?

If it were bad hardware, I should see segfaults left and right, right? But I don't see them. In fact, apart from the oopses the system works fine - even with the oopses the system works fine, apart from the occasional stuck 'ps aux'.

And these messages:
[41160.823959] kio_http_cache_[25229] general protection rip:32621f1fe9 rsp:7fff59a3d270 error:0
show up on closing konqueror tabs/Konqueror. There are no surprising exits, no apps vanishing.

But I will run memtest86+ (or should I use memtest86?).

Glück Auf,
Volker
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote:
> On Monday, 17 December 2007, you wrote:
>
> and another one, this time tainted with the nvidia module:
> [ 5194.130985] Unable to handle kernel paging request at 0300 RIP:

This really sounds like bad hardware. Either memory or the mobo/riser card the memory is on. You might try lowering the memory timings of your memory in BIOS. Try removing 1/2 of your memory. If it still fails, remove the other 1/2 and put the first 1/2 back and try again.

--
Scott <[EMAIL PROTECTED]>
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Monday, 17 December 2007, you wrote:

and another one, this time tainted with the nvidia module:

[ 5194.130985] Unable to handle kernel paging request at 0300 RIP:
[ 5194.130988] [804449fa] _spin_lock+0x0/0xf
[ 5194.130993] PGD 0
[ 5194.130994] Oops: 0002 [1] SMP
[ 5194.130996] CPU 1
[ 5194.130997] Modules linked in: rfcomm l2cap hci_usb bluetooth snd_usb_audio ohci1394 snd_usb_lib ieee1394 aic7xxx i2c_nforce2 nvidia(P) k8temp w83627ehf hwmon_vid hwmon i2c_core snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd r8169
[ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4
[ 5194.131015] RIP: 0010:[804449fa] [804449fa] _spin_lock+0x0/0xf
[ 5194.131018] RSP: 0018:81009278be70 EFLAGS: 00010206
[ 5194.131020] RAX: 2ab90bfb5000 RBX: 810117d44db0 RCX: 2ab90bdb5000
[ 5194.131021] RDX: 81011519f810 RSI: 00388aa08fff RDI: 0300
[ 5194.131023] RBP: 0300 R08: 81012f190ea0 R09:
[ 5194.131024] R10: 0008 R11: 0246 R12: 810117d44db0
[ 5194.131026] R13: 2ab90bdb R14: R15:
[ 5194.131028] FS: 2ab90bde3070() GS:81012fc6cec0() knlGS:f7f756c0
[ 5194.131030] CS: 0010 DS: ES: CR0: 8005003b
[ 5194.131031] CR2: 0300 CR3: 93605000 CR4: 06e0
[ 5194.131033] DR0: DR1: DR2:
[ 5194.131034] DR3: DR6: 0ff0 DR7: 0400
[ 5194.131036] Process sleep (pid: 22490, threadinfo 81009278a000, task 8100960630c0)
[ 5194.131037] Stack: 8026afbc 810117d44db0 810115e2cbb8 810117d44db0
[ 5194.131040] 80265ec3 81009278bee0 81009278bee0 810115e2c3d8
[ 5194.131043] 8100a076cb80 0002 7fff9ecf7808 7fff9ecf7810
[ 5194.131045] Call Trace:
[ 5194.131048] [8026afbc] anon_vma_unlink+0x1a/0x64
[ 5194.131051] [80265ec3] free_pgtables+0x64/0xc4
[ 5194.131054] [80267174] exit_mmap+0x91/0xeb
[ 5194.131057] [80230191] mmput+0x28/0xa0
[ 5194.131060] [802353db] do_exit+0x211/0x786
[ 5194.131063] [802359cf] sys_exit_group+0x0/0xe
[ 5194.131065] [8020b66e] system_call+0x7e/0x83
[ 5194.131069]
[ 5194.131070]
[ 5194.131070] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 f0 81 2f 00 00
[ 5194.131076] RIP [804449fa] _spin_lock+0x0/0xf
[ 5194.131078] RSP 81009278be70
[ 5194.131079] CR2: 0300
[ 5194.131101] Fixing recursive fault but reboot is needed!
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Monday, 17 December 2007, you wrote:
> On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
> > I got another crash, now with 2.6.23.11 on logout from KDE (two
> > differences, new kernel, 4gb ram instead of 2gb):
> >
> > also I got some strange message yesterday before increasing ramsize:
> > [19546.639528] swap_free: Bad swap offset entry 0400
> > [27999.370777] swap_free: Bad swap offset entry 0400
> > [27999.434282] swap_free: Bad swap offset entry 0400
> > [27999.466035] swap_free: Bad swap offset entry 0400
> > [27999.521132] swap_free: Bad swap offset entry 0400
> > [27999.561621] VM: killing process ld-linux-x86-64
> > [27999.561719] swap_free: Bad swap offset entry 0400
>
> You're seeing a single bit set where it shouldn't be: please give
> memtest86+ a good try; if it's not actually your memory that's bad,
> then I'd guess it's something like overheating (please correct me,
> ye who know better).
>
> Hugh

first of all, the 2gb with which I was seeing that had a memtest run for some hours some weeks ago, without problems. I can compile stuff - like the latest kde4 rc - without segfaults or problems (except when the oops is happening), and this mess only started recently.

To be more correct: the swap-mess only started with 2.6.23.11. With 2.6.23.9 I get the kio_http... rip's, but no swap related messages.

Overheating is very unlikely. I made sure that my computer is very well cooled. Even under high load I get something like 50°C from lmsensors and bios - and the errors are completely unrelated to load. Or temperature. Without load my cpu idles at ~30°C. Again, lmsensors and bios are very close about that.

Glück Auf,
Volker
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
>
> I got another crash, now with 2.6.23.11 on logout from KDE (two differences,
> new kernel, 4gb ram instead of 2gb):
>
> [ 1771.063731] Unable to handle kernel paging request at 0400 RIP:
> also I got some strange message yesterday before increasing ramsize:
> [19546.639528] swap_free: Bad swap offset entry 0400
> [27999.370777] swap_free: Bad swap offset entry 0400
> [27999.434282] swap_free: Bad swap offset entry 0400
> [27999.466035] swap_free: Bad swap offset entry 0400
> [27999.521132] swap_free: Bad swap offset entry 0400
> [27999.561621] VM: killing process ld-linux-x86-64
> [27999.561719] swap_free: Bad swap offset entry 0400

You're seeing a single bit set where it shouldn't be: please give memtest86+ a good try; if it's not actually your memory that's bad, then I'd guess it's something like overheating (please correct me, ye who know better).

Hugh
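The "single bit set" observation is easy to test for any suspicious value. The snippet below is a minimal sketch; the helper name and the sample value are hypothetical, because the values printed in the logs above are truncated and are not filled in here.

```python
def is_single_bit(value):
    """True if exactly one bit is set, i.e. value is a power of two."""
    return value > 0 and (value & (value - 1)) == 0


# Hypothetical 64-bit value with only bit 42 set.
suspect = 1 << 42
print(is_single_bit(suspect))    # True
print(suspect.bit_length() - 1)  # 42, the index of the set bit
print(is_single_bit(0x0300))     # False, two bits are set
```

A value that should be zero but has exactly one bit set is the classic signature of a flipped bit, which is why it points towards memory or the data path rather than towards a stray software write.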
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
Hi.

I got another crash, now with 2.6.23.11 on logout from KDE (two differences, new kernel, 4gb ram instead of 2gb):

[ 1771.063731] Unable to handle kernel paging request at 0400 RIP:
[ 1771.063735] [8044256a] _spin_lock+0x0/0xf
[ 1771.063740] PGD 0
[ 1771.063741] Oops: 0002 [1] SMP
[ 1771.063743] CPU 0
[ 1771.063744] Modules linked in: k8temp w83627ehf hwmon_vid hwmon i2c_core snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd r8169
[ 1771.063756] Pid: 4418, comm: kdm Not tainted 2.6.23.11reiser4 #1
[ 1771.063758] RIP: 0010:[8044256a] [8044256a] _spin_lock+0x0/0xf
[ 1771.063760] RSP: 0018:81012937de10 EFLAGS: 00010206
[ 1771.063762] RAX: 81012bd78870 RBX: 0400 RCX:
[ 1771.063764] RDX: RSI: 81012c549e58 RDI: 0400
[ 1771.063765] RBP: 81012bd78870 R08: 80012c52c045 R09: 0005
[ 1771.063767] R10: 8100050df9f8 R11: 0002 R12: 8101280c3760
[ 1771.063768] R13: 81012f05fac0 R14: 81012c549db0 R15: 81012f05fac0
[ 1771.063770] FS: 2b438009bb40() GS:80533000() knlGS:
[ 1771.063772] CS: 0010 DS: ES: CR0: 8005003b
[ 1771.063773] CR2: 0400 CR3: 00012d689000 CR4: 06e0
[ 1771.063775] DR0: DR1: DR2:
[ 1771.063776] DR3: DR6: 0ff0 DR7: 0400
[ 1771.063778] Process kdm (pid: 4418, threadinfo 81012937c000, task 81012fca7860)
[ 1771.063779] Stack: 8026b084 81012ba17138 81012bd78870
[ 1771.063782] 80230d0c 2b438009bbd0 81012937df58
[ 1771.063785] 7fff2aa3d9d0 01200011 8101280c3760
[ 1771.063787] Call Trace:
[ 1771.063790] [8026b084] anon_vma_link+0x1a/0x40
[ 1771.063793] [80230d0c] copy_process+0xb03/0x1301
[ 1771.063798] [80231670] do_fork+0xb1/0x1fc
[ 1771.063802] [8023aa56] recalc_sigpending+0xe/0x25
[ 1771.063804] [8020b66e] system_call+0x7e/0x83
[ 1771.063806] [8020b987] ptregscall_common+0x67/0xb0
[ 1771.063810]
[ 1771.063811]
[ 1771.063811] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 f0 81 2f 00 00
[ 1771.063816] RIP [8044256a] _spin_lock+0x0/0xf
[ 1771.063819] RSP 81012937de10
[ 1771.063820] CR2: 0400

also I got some strange message yesterday before increasing ramsize:

[19546.639528] swap_free: Bad swap offset entry 0400
[19733.026587] kio_http_cache_[9814] general protection rip:3919ff1fe9 rsp:7fff7e1b59f0 error:0

I did swapoff -a, mkswap /dev/sda, swapon -a:

[20105.297668] Adding 1951888k swap on /dev/sda2. Priority:-2 extents:1 across:1951888k
[21013.797335] kio_http_cache_[10921] general protection rip:3919ff1fe9 rsp:7fff39a6d2a0 error:0
[22381.409172] kio_http_cache_[11459] general protection rip:3919ff1fe9 rsp:7fffd84c4d00 error:0
[23877.759927] kio_http_cache_[11959] general protection rip:3919ff1fe9 rsp:7fff9895c190 error:0
[25080.581142] kio_http_cache_[13146] general protection rip:3919ff1fe9 rsp:7fff790e0920 error:0
[26483.315522] kio_http_cache_[13746] general protection rip:3919ff1fe9 rsp:7fff51933170 error:0
[27696.301584] kio_http_cache_[14417] general protection rip:3919ff1fe9 rsp:7fff8f38abc0 error:0
[27999.370777] swap_free: Bad swap offset entry 0400
[27999.434282] swap_free: Bad swap offset entry 0400
[27999.466035] swap_free: Bad swap offset entry 0400
[27999.521132] swap_free: Bad swap offset entry 0400
[27999.561621] VM: killing process ld-linux-x86-64
[27999.561719] swap_free: Bad swap offset entry 0400

complete dmesg:

[0.00] Linux version 2.6.23.11reiser4 ([EMAIL PROTECTED]) (gcc version 4.2.2 (Gentoo 4.2.2 p1.0)) #1 SMP Sun Dec 16 05:14:21 CET 2007
[0.00] Command line: root=/dev/sda3 nmi_watchdog=0
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: - 0009fc00 (usable)
[0.00] BIOS-e820: 0009fc00 - 000a (reserved)
[0.00] BIOS-e820: 000e6000 - 0010 (reserved)
[0.00] BIOS-e820: 0010 - cffb (usable)
[0.00] BIOS-e820: cffb - cffc (ACPI data)
[0.00] BIOS-e820: cffc - cfff (ACPI NVS)
[0.00] BIOS-e820: cfff - d000 (reserved)
[0.00] BIOS-e820: fec0 - fec01000 (reserved)
[0.00] BIOS-e820: fee0 - fef0 (reserved)
[0.00] BIOS-e820: ff38 - 0001 (reserved)
[0.00]
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
> I got another crash, now with 2.6.23.11 on logout from KDE (two differences, new kernel, 4gb ram instead of 2gb):
>
> [ 1771.063731] Unable to handle kernel paging request at 0400 RIP:
>
> also I got some strange message yesterday before increasing ramsize:
>
> [19546.639528] swap_free: Bad swap offset entry 0400
> [27999.370777] swap_free: Bad swap offset entry 0400
> [27999.434282] swap_free: Bad swap offset entry 0400
> [27999.466035] swap_free: Bad swap offset entry 0400
> [27999.521132] swap_free: Bad swap offset entry 0400
> [27999.561621] VM: killing process ld-linux-x86-64
> [27999.561719] swap_free: Bad swap offset entry 0400

You're seeing a single bit set where it shouldn't be: please give memtest86+ a good try; if it's not actually your memory that's bad, then I'd guess it's something like overheating (please correct me, ye who know better).

Hugh
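
Hugh's "single bit" remark can be checked by hand. The snippet below is only a minimal sketch, assuming the "0400" printed in the oops (fault address, CR2, RBX, RDI) and in the swap_free lines is hexadecimal; the helper names set_bits and is_single_bit are made up for illustration and come from neither the kernel nor the thread.

```python
def set_bits(value: int) -> list:
    """Return the positions of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def is_single_bit(value: int) -> bool:
    """True when exactly one bit is set, i.e. value is a power of two."""
    return value > 0 and value & (value - 1) == 0


# Assumption: the "0400" shown in the logs is a hexadecimal value.
suspect = int("0400", 16)

# Report the value, whether it is a lone bit, and which bit position it is.
print(hex(suspect), is_single_bit(suspect), set_bits(suspect))
```

Under that assumption the value is bit 10 on its own, which matches the pattern of one flipped bit in a word that should have been zero. It does not say where the flip happened; that still needs memtest86+ or swapping modules.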
Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well
On Monday, 17 December 2007, you wrote:
> On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
> > I got another crash, now with 2.6.23.11 on logout from KDE (two differences, new kernel, 4gb ram instead of 2gb):
> >
> > also I got some strange message yesterday before increasing ramsize:
> >
> > [19546.639528] swap_free: Bad swap offset entry 0400
> > [27999.370777] swap_free: Bad swap offset entry 0400
> > [27999.434282] swap_free: Bad swap offset entry 0400
> > [27999.466035] swap_free: Bad swap offset entry 0400
> > [27999.521132] swap_free: Bad swap offset entry 0400
> > [27999.561621] VM: killing process ld-linux-x86-64
> > [27999.561719] swap_free: Bad swap offset entry 0400
>
> You're seeing a single bit set where it shouldn't be: please give memtest86+ a good try; if it's not actually your memory that's bad, then I'd guess it's something like overheating (please correct me, ye who know better).
>
> Hugh

First of all, the 2gb with which I was seeing that had memtest run on them for some hours a few weeks ago, without problems. I can compile stuff, like the latest kde4 rc, without segfaults or problems (except when the oops is happening), and this mess only started recently.

To be more precise: the swap mess only started with 2.6.23.11. With 2.6.23.9 I get the kio_http... rip's, but no swap-related messages.

Overheating is very unlikely. I made sure that my computer is very well cooled. Even under high load I get something like 50°C from lmsensors and the BIOS, and the errors are completely unrelated to load. Or temperature. Without load my cpu idles at ~30°C. Again, lmsensors and the BIOS agree closely on that.

Glück Auf,
Volker