subject: "almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well"

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-29 Thread Hemmann, Volker Armin

Hi,

you guys were right, I was wrong.

It is the hardware. 

I increased ram voltage by 0.15V on the 22nd and haven't had any oopses since then. 
And I did torture the system.

I am deeply sorry that I wasted your time (but I am still puzzled that the oopses 
started after the kernel update - maybe I should buy a new psu...).

So it is neither reiser4 nor the kernel; the ram just needs a little more 'juice' 
than the board delivers on 'auto' settings.

Glück Auf
Volker
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-21 Thread Hemmann, Volker Armin

On Friday, 21 December 2007, Mike Galbraith wrote:
> On Thu, 2007-12-20 at 19:14 +0100, Hemmann, Volker Armin wrote:
> > It's just... it could be the hardware - but I should have seen the
> > same 'problem' with earlier kernels - and the 'almost daily oops' only
> > started with 2.6.23.
>
> Nonetheless, the oopsen _suggest_ hardware.  If it were my box, I'd move
> ram modules as a first step.  It costs about two minutes to eliminate
> that possibility, but you seem reluctant to take that step.  Heck, I'd
> _hope_ it's something as simple as bad ram, because otherwise, the quest for
> stability could become a time-consuming and/or expensive undertaking...
>

It costs a little bit more than that, but it will be part of the 'post-holiday special'. 
As an intermediate step I increased the ram voltages - looks good so 
far.

> If that didn't change anything, I'd go back and stress test a previously
> stable configuration to gain confidence in my hardware. 

you mean like playing ut2004 with reduced fans or several instances of 
cpuburn, or compiling something big like kdepim with kdeenablefinal? Done all 
that ...

> If 'uhoh, not 
> as stable as I thought' happened, and nothing is getting obviously hot
> [1], I'd pray that it's an electrically noisy power supply, because
> that's also easy and cheap. 

yeah, it would be the least annoying variant. After one PSU ate two computers, 
this one is just two and a half months old - I've had my share of bad PSUs. 
That is why I increased the voltages. Maybe it helps.

> In any case, once I was very very confident 
> that my hardware was indeed sound, I'd move on to an agonizingly tedious
> bisection, with no out of tree modules ever loaded, to narrow down when
> this memory corruption that nobody else appears to be hitting appeared.
>
>   -Mike
>
> 1.  Crappy heatsink compound can dry out and fracture, leaving a hot chip
> under a relatively cool heatsink.  This is exactly what I found a while
> back when I disassembled my P4 box after it suddenly became unstable under heavy load.

still, this would show up in the temps. And the temps are ok.

Glück Auf,
Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-21 Thread Ingo Molnar

* Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:

> Ok, so after the holidays I will do the following:
> 
> let memtest86+ run for several hours. do a full backup and restore to switch 
> to r3 and build an unpatched kernel. see if I can reproduce the oops 
> with .21 and .22 (because AFAIR no oops with 21.. but I might be 
> wrong).
> 
> Not exactly in that order.

yeah, that would help. But generally it's a big PITA to figure out such 
bugs where there's no specific 'smoking gun' in the oopses themselves. 
What usually happens is that people try to figure out a faster way of 
triggering the bug - and then bisection can be done. But the hardware 
must first be eliminated as the cause of the bug. Taking out half of the 
RAM (I know it's painful ...) can also help in isolating RAM problems.

Ingo

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Mike Galbraith

On Thu, 2007-12-20 at 19:14 +0100, Hemmann, Volker Armin wrote:

> It's just... it could be the hardware - but I should have seen the 
> same 'problem' with earlier kernels - and the 'almost daily oops' only 
> started with 2.6.23.

Nonetheless, the oopsen _suggest_ hardware.  If it were my box, I'd move
ram modules as a first step.  It costs about two minutes to eliminate
that possibility, but you seem reluctant to take that step.  Heck, I'd
_hope_ it's something as simple as bad ram, because otherwise, the quest for
stability could become a time-consuming and/or expensive undertaking...

If that didn't change anything, I'd go back and stress test a previously
stable configuration to gain confidence in my hardware.  If 'uhoh, not
as stable as I thought' happened, and nothing is getting obviously hot
[1], I'd pray that it's an electrically noisy power supply, because
that's also easy and cheap.  In any case, once I was very very confident
that my hardware was indeed sound, I'd move on to an agonizingly tedious
bisection, with no out-of-tree modules ever loaded, to narrow down when
this memory corruption that nobody else appears to be hitting appeared.

-Mike

1.  Crappy heatsink compound can dry out and fracture, leaving a hot chip
under a relatively cool heatsink.  This is exactly what I found a while
back when I disassembled my P4 box after it suddenly became unstable under heavy load.


Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

Ok, so after the holidays I will do the following:

let memtest86+ run for several hours.
do a full backup and restore to switch to r3 and build an unpatched kernel.
see if I can reproduce the oops with .21 and .22 (AFAIR there was no oops with 
.21, but I might be wrong).

Not exactly in that order.

Glück Auf
Volker


ps: please cc me. I am not subscribed to lkml.

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Ingo Molnar


* Pekka Enberg <[EMAIL PROTECTED]> wrote:

> Nah, it's just that vma->anon_vma is probably supposed to be NULL 
> here. And if you look at all the oopses, they do suggest one 
> particular byte lane is dodgy (the corruption is in bits 41-43 and 
> 45).
> 
> The whole thing reminds me of another bug where memtest86 didn't find 
> anything because it's doing cached memory accesses: 
> http://lkml.org/lkml/2007/10/3/259

memtest86+ has an uncached test:

const struct tseq tseq[] = {
{1,  5,  3,   0, 0, "[Address test, walking ones]  "},
{1,  6,  3,   2, 0, "[Address test, own address]   "},
{1,  0,  3,  14, 0, "[Moving inversions, ones & zeros] "},
{1,  1,  2,  80, 0, "[Moving inversions, 8 bit pattern]"},
{1, 10, 60, 300, 0, "[Moving inversions, random pattern]   "},
{1,  7, 64,  66, 0, "[Block move, 64 moves]"},
{1,  2,  2, 320, 0, "[Moving inversions, 32 bit pattern]   "},
{1,  9, 40, 120, 0, "[Random number sequence]  "},
{1,  3,  4, 240, 0, "[Modulo 20, ones & zeros] "},
{1,  8,  1,   2, 0, "[Bit fade test, 90 min, 2 patterns]   "},
{0,  4,  3,   2, 0, "[[Moving inversions, 0 & 1, uncached] "},
{0,  0,  0,   0, 0, NULL}
};

find that "Moving inversions, 0 & 1" test and run that one alone, 
overnight.

Ingo

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Pekka Enberg

Hi,

On Dec 20, 2007 4:38 PM, David Newall <[EMAIL PROTECTED]> wrote:
> >>> and another one, this time tainted with the nvidia module:
> >>> 5194.130985] Unable to handle kernel paging request at 0300
> >>> RIP:
>
> Numbers like that don't suggest hardware faults.  All those zeros: It's
> far too round.  Sounds very like software.  In fact, it sounds like the
> start of significant hardware region.

Nah, it's just that vma->anon_vma is probably supposed to be NULL here. And if
you look at all the oopses, they do suggest one particular byte lane is dodgy
(the corruption is in bits 41-43 and 45).

The whole thing reminds me of another bug where memtest86 didn't find anything
because it's doing cached memory accesses: http://lkml.org/lkml/2007/10/3/259

Pekka

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

On Thursday, 20 December 2007, Ingo Molnar wrote:
> * Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:
> > On Thursday, 20 December 2007, you wrote:
> > > Hemmann, Volker Armin wrote:
> > > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P   
> > > > 2.6.23.11reiser4 #4
> > >
> > > The subject line is wrong.
> > > You apparently run Linux, but not Linux 2.6.23.y.
> >
> > first of all, apart from this oops all other oopses I reported were
> > with a not-tainted kernel. You might want to read the other mails I
> > have sent.
> >
> > Also, besides of the reiser4 patch there is no other patch added to
> > the kernel. And since people have had successfully reported problems
> > with heavily distro-patched kernels in the past it looks a little bit
> > hypocritical to put my reports aside because of one single patch -
> > don't you think?
>
> reiser4 isnt just a single random patch, it's a huge patch with lots of
> interactions with file and memory management. Would it be hard for you
> to reproduce the crash without reiser4? (or is all your stuff on
> reiser4?)

/home (and /var, /tmp) is on reiser4, and it is my biggest partition. And since it 
takes up to 3 days to reproduce this - yes, hard to do without r4.

Glück Auf,
 Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Stefan Richter

Hemmann, Volker Armin wrote:
> On Donnerstag, 20. Dezember 2007, you wrote:
>> The subject line is wrong.
>> You apparently run Linux, but not Linux 2.6.23.y.
> 
> first of all, apart from this oops all other oopses I reported were with a 
> not-tainted kernel. You might want to read the other mails I have sent.
> 
> Also, besides of the reiser4 patch there is no other patch added to the 
> kernel. And since people have had  successfully reported problems with 
> heavily distro-patched kernels in the past it looks a little bit hypocritical 
> to put my reports aside because of one single patch - don't you think?

I didn't say anything about putting your report aside.

For successful reports (as in 'leading to a fix'), it is among other things
necessary that the issue can be narrowed down enough.  Sometimes this is
a quick process; e.g. user X finds a very specific driver bug while
using a patched and old kernel, driver developer Y takes the time to
confirm this bug in a recent mainline kernel because he already had a
good idea where to look and how to recreate the respective conditions,
and fixes the bug.  Sometimes it takes much much more work to identify
the circumstances of the bug.  It is then necessary that the reporter
knows exactly what he is running, simplifies his system to eliminate as
many potential causes for problems as possible, and always clearly
states under what circumstances the bug happens.

If you already found the bug in an untainted (but patched?) kernel, then
what information does another report against a tainted kernel add?  The
tainted kernel has more unknowns than the untainted one.  Progress can
only be made if the number of unknowns is successively reduced.

Regarding other people's reports and hypocrisy and whatnot:  I myself am
monitoring a few distro bug trackers more or less frequently for bug
reports concerning the kernel subsystem I'm interested in.  With varying
success though.  In order to make use of a report against a distro kernel,
I need to have a good picture of what stuff is in that kernel.  Looking
at distro bug trackers only works for me because my field of
interest is a driver subsystem which is somewhat decoupled from other
kernel parts; so if there is trouble concerning hardware covered by this
subsystem, it is usually not too hard to figure out whether the problem
is in this subsystem or somewhere else.  If it weren't that easy most of
the time, I might for example depend on the reporters to test specific
mainline kernels or specific development kernels.  (Though the latter
becomes necessary after all in cases when more targeted debug output is
needed from the reporter, or in order to test proposed fixes without
having to wait for the distributor to build a test package for the
reporter.)
-- 
Stefan Richter
-=-=-=== ==-- =-=--
http://arcgraph.de/sr/

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Ingo Molnar


* Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:

> On Thursday, 20 December 2007, you wrote:
> > Hemmann, Volker Armin wrote:
> > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4
> > > #4
> >
> > The subject line is wrong.
> > You apparently run Linux, but not Linux 2.6.23.y.
> 
> first of all, apart from this oops all other oopses I reported were 
> with a not-tainted kernel. You might want to read the other mails I 
> have sent.
> 
> Also, besides of the reiser4 patch there is no other patch added to 
> the kernel. And since people have had successfully reported problems 
> with heavily distro-patched kernels in the past it looks a little bit 
> hypocritical to put my reports aside because of one single patch - 
> don't you think?

reiser4 isn't just a single random patch, it's a huge patch with lots of 
interactions with file and memory management. Would it be hard for you 
to reproduce the crash without reiser4? (or is all your stuff on 
reiser4?)

Ingo

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

On Thursday, 20 December 2007, David Newall wrote:
> >>> On Monday, 17 December 2007, you wrote:
> >>>
> >>> and another one, this time tainted with the nvidia module:
> >>> 5194.130985] Unable to handle kernel paging request at 0300
> >>> RIP:
>
> Numbers like that don't suggest hardware faults.  All those zeros: It's
> far too round.  Sounds very like software.  In fact, it sounds like the
> start of significant hardware region.   And lo! there's a closed-source,
> possibly buggy nvidia module.  Try another; older or newer are equally
> good.

and this one was without the nvidia module:
http://marc.info/?l=linux-kernel&m=119790371708690&w=2

and the first one I reported was without nvidia and not tainted either:
http://marc.info/?l=linux-kernel&m=119776365425514&w=2

I am not a complete idiot. If I have a problem, I try to reproduce it without 
nvidia first (after a clean shutdown and boot, with the module not even on the 
hard disk). And I did reproduce it without the module. The last oops with the 
module was just meant to show that it does not matter whether the module is 
loaded or not, and (maybe) to give some additional information.

Glück Auf,
Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

On Thursday, 20 December 2007, you wrote:
> On Thu, 2007-12-20 at 06:53 +0100, Hemmann, Volker Armin wrote:
> > On Thursday, 20 December 2007, you wrote:
> > > On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote:
> > > > On Monday, 17 December 2007, you wrote:
> > > >
> > > > and another one, this time tainted with the nvidia module:
> > > > 5194.130985] Unable to handle kernel paging request at
> > > > 0300 RIP:
> > >
> > > This really sounds like bad hardware. Either memory or the mobo/riser
> > > card the memory is on. You might try lowering the memory timings of
> > > your memory in BIOS. Try removing 1/2 of your memory. If it still
> > > remove the other 1/2 and put the first 1/2 back and try again.
> >
> > if this is bad hardware why:
> >
> > - didn't this show up earlier?
> >
> > - did a several hour memtest run couple of weeks ago didn't show up
> > anything?
> >
> > - and does stuff like compiling all of kde 3.5.8 or the latest kde4 rc
> > finish without any problems?
>
> Because bad hardware can be highly sensitive to exact load patterns.
> Don't be so skeptical of the suggestion that your hardware may be
> flakey, in the last 30+ years as a hardware guy in the design lab and in
> the field, I've seen very much hardware which passed extensive
> diagnostics, but turn out to be flakey nonetheless.
>
> I would suggest that you rearrange your ram modules, and see if the bit
> pattern changes.  Memtest may not show a problem with bitflips... if
> that's happening.  I would also suggest that you check your case
> temperature as someone else suggested - lmsensors may say that the CPU
> temperature is fine, but that isn't the whole picture. by a long shot.
>
>   -Mike

case temp: 25°C measured near a warm harddisk by a digital thermometer.
mainboard temp: 31°C measured by lmsensors (mobo bios agrees)
cpu temp: 29-50°C (load dependent) measured by lmsensors; the bios reports 
two degrees more.

I have 4 'big' fans installed to have a constant air stream in the case.

This really does not look like overheating. And I did have flaky ram in the 
past. The thing is, apart from the oopses the system is completely, perfectly 
stable. That really does not smell like flaky hardware. At least not in my 
experience. Flaky PSU = sudden reboots, boot problems, crashes under load. 
Flaky mobo = see flaky psu. Flaky Ram: crashes, crashes, more crashes, 
segfaults here and there, especially when updating glibc, qt or kde. 

And I don't get this. I only get oopses.

It's just... it could be the hardware - but I should have seen the 
same 'problem' with earlier kernels - and the 'almost daily oops' only 
started with 2.6.23.

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

On Thursday, 20 December 2007, you wrote:
> Hemmann, Volker Armin wrote:
> > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4
> > #4
>
> The subject line is wrong.
> You apparently run Linux, but not Linux 2.6.23.y.

first of all, apart from this oops all other oopses I reported were with a 
not-tainted kernel. You might want to read the other mails I have sent.

Also, apart from the reiser4 patch there is no other patch added to the 
kernel. And since people have successfully reported problems with 
heavily distro-patched kernels in the past it looks a little bit hypocritical 
to put my reports aside because of one single patch - don't you think?

Glück Auf,
Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Stefan Richter

Hemmann, Volker Armin wrote:
> [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4

The subject line is wrong.
You apparently run Linux, but not Linux 2.6.23.y.
-- 
Stefan Richter
-=-=-=== ==-- =-=--
http://arcgraph.de/sr/

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread David Newall

>>> On Monday, 17 December 2007, you wrote:
>>>
>>> and another one, this time tainted with the nvidia module:
>>> 5194.130985] Unable to handle kernel paging request at 0300
>>> RIP:

Numbers like that don't suggest hardware faults.  All those zeros: It's 
far too round.  It sounds very much like software.  In fact, it sounds like the 
start of a significant hardware region.   And lo! there's a closed-source, 
possibly buggy nvidia module.  Try another; older or newer are equally good.


Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

On Donnerstag, 20. Dezember 2007, you wrote:
 On Thu, 2007-12-20 at 06:53 +0100, Hemmann, Volker Armin wrote:
  On Donnerstag, 20. Dezember 2007, you wrote:
   On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote:
On Montag, 17. Dezember 2007, you wrote:
   
and another one, this time tainted with the nvidia module:
5194.130985] Unable to handle kernel paging request at
0300 RIP:
  
   This really sounds like bad hardware. Either memory or the mobo/riser
   card the memory is on. You might try lowering the memory timings of
   your memory in BIOS. Try removing 1/2 of your memory. If it still
   remove the other 1/2 and put the first 1/2 back and try again.
 
  if this is bad hardware why:
 
  - didn't this show up earlier?
 
  - did a several hour memtest run couple of weeks ago didn't show up
  anything?
 
  - and does stuff like compiling all of kde 3.5.8 or the latest kde4 rc
  finish without any problems?

 Because bad hardware can be highly sensitive to exact load patterns.
 Don't be so skeptical of the suggestion that your hardware may be
 flakey, in the last 30+ years as a hardware guy in the design lab and in
 the field, I've seen very much hardware which passed extensive
 diagnostics, but turn out to be flakey nonetheless.

 I would suggest that you rearrange your ram modules, and see if the bit
 pattern changes.  Memtest may not show a problem with bitflips... if
 that's happening.  I would also suggest that you check your case
 temperature as someone else suggested - lmsensors may say that the CPU
 temperature is fine, but that isn't the whole picture. by a long shot.

   -Mike

case temp: 25°C, measured near a warm hard disk with a digital thermometer.
mainboard temp: 31°C, measured by lmsensors (the mobo BIOS agrees)
cpu temp: 29-50°C (load dependent), measured by lmsensors; the BIOS reports
two degrees more.

I have 4 'big' fans installed to keep a constant air stream through the case.

This really does not look like overheating. And I have had flaky RAM in the
past. The thing is, apart from the oopses the system is completely, perfectly
stable. That really does not smell like flaky hardware, at least not in my
experience. Flaky PSU = sudden reboots, boot problems, crashes under load.
Flaky mobo = see flaky PSU. Flaky RAM = crashes, crashes, more crashes,
segfaults here and there, especially when updating glibc, qt or kde.

And I don't get any of that. I only get oopses.

It is just.. it could be the hardware - but I should have seen the
same 'problem' with earlier kernels - and the 'almost daily oops' only
started with 2.6.23.

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

On Thursday, 20 December 2007, David Newall wrote:
> > On Monday, 17 December 2007, you wrote:
> >
> > and another one, this time tainted with the nvidia module:
> > [ 5194.130985] Unable to handle kernel paging request at 0300
> > RIP:
>
> Numbers like that don't suggest hardware faults.  All those zeros: It's
> far too round.  Sounds very like software.  In fact, it sounds like the
> start of a significant hardware region.   And lo! there's a closed-source,
> possibly buggy nvidia module.  Try another; older or newer are equally
> good.

and this one was without the nvidia module:
http://marc.info/?l=linux-kernel&m=119790371708690&w=2

and the first one I reported was without nvidia and untainted too:
http://marc.info/?l=linux-kernel&m=119776365425514&w=2

I am not a complete idiot. If I have a problem, I try to reproduce it without
nvidia first (after a clean shutdown and boot, with the module not even on
the hard disk). And I reproduced it without the module. The last oops with the
module was posted just to show that it does not matter whether the module is
loaded or not, and (maybe) to give some additional information.

Glück Auf,
Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Ingo Molnar


* Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:

> On Thursday, 20 December 2007, you wrote:
> > Hemmann, Volker Armin wrote:
> > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4
> >
> > The subject line is wrong.
> > You apparently run Linux, but not Linux 2.6.23.y.
>
> first of all, apart from this oops all other oopses I reported were
> with an untainted kernel. You might want to read the other mails I
> have sent.
>
> Also, besides the reiser4 patch there is no other patch added to
> the kernel. And since people have successfully reported problems
> with heavily distro-patched kernels in the past, it looks a little bit
> hypocritical to put my reports aside because of one single patch -
> don't you think?

reiser4 isn't just a single random patch, it's a huge patch with lots of
interactions with file and memory management. Would it be hard for you
to reproduce the crash without reiser4? (or is all your stuff on
reiser4?)

Ingo

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Stefan Richter

Hemmann, Volker Armin wrote:
> On Thursday, 20 December 2007, you wrote:
> > The subject line is wrong.
> > You apparently run Linux, but not Linux 2.6.23.y.
>
> first of all, apart from this oops all other oopses I reported were with an
> untainted kernel. You might want to read the other mails I have sent.
>
> Also, besides the reiser4 patch there is no other patch added to the
> kernel. And since people have successfully reported problems with
> heavily distro-patched kernels in the past, it looks a little bit hypocritical
> to put my reports aside because of one single patch - don't you think?

I didn't say anything about putting your report aside.

For successful reports (as in 'leading to a fix'), it is, among other things,
necessary that the issue can be narrowed down enough.  Sometimes this is
a quick process; e.g. user X finds a very specific driver bug while
using a patched and old kernel, driver developer Y takes the time to
confirm this bug in a recent mainline kernel because he already had a
good idea where to look and how to recreate the respective conditions,
and fixes the bug.  Sometimes it takes much, much more work to identify
the circumstances of the bug.  It is then necessary that the reporter
knows exactly what he is running, simplifies his system to eliminate as
many potential causes of problems as possible, and always clearly
states under what circumstances the bug happens.

If you already found the bug in an untainted (but patched?) kernel, then
what information does another report against a tainted kernel add?  The
tainted kernel has more unknowns than the untainted one.  Progress can
only be made if the number of unknowns is successively reduced.

Regarding other people's reports and hypocrisy and whatnot:  I myself
monitor a few distro bug trackers more or less frequently for bug
reports concerning the kernel subsystem I'm interested in, with varying
success though.  In order to make use of a report against a distro kernel,
I need to have a good picture of what stuff is in that kernel.  Looking
at distro bug trackers only works for me because my field of
interest is a driver subsystem which is somewhat decoupled from other
kernel parts; so if there is trouble concerning hardware covered by this
subsystem, it is usually not too hard to figure out whether the problem
is in this subsystem or somewhere else.  If it weren't that easy most of
the time, I might for example depend on the reporters to test specific
mainline kernels or specific development kernels.  (Though the latter
becomes necessary after all in cases when more targeted debug output is
needed from the reporter, or in order to test proposed fixes without
having to wait for the distributor to build a test package for the
reporter.)
-- 
Stefan Richter
-=-=-=== ==-- =-=--
http://arcgraph.de/sr/

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

On Thursday, 20 December 2007, Ingo Molnar wrote:
> * Hemmann, Volker Armin <[EMAIL PROTECTED]> wrote:
> > On Thursday, 20 December 2007, you wrote:
> > > Hemmann, Volker Armin wrote:
> > > > [ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4
> > >
> > > The subject line is wrong.
> > > You apparently run Linux, but not Linux 2.6.23.y.
> >
> > first of all, apart from this oops all other oopses I reported were
> > with an untainted kernel. You might want to read the other mails I
> > have sent.
> >
> > Also, besides the reiser4 patch there is no other patch added to
> > the kernel. And since people have successfully reported problems
> > with heavily distro-patched kernels in the past, it looks a little bit
> > hypocritical to put my reports aside because of one single patch -
> > don't you think?
>
> reiser4 isn't just a single random patch, it's a huge patch with lots of
> interactions with file and memory management. Would it be hard for you
> to reproduce the crash without reiser4? (or is all your stuff on
> reiser4?)

/home (and /var, /tmp) is on reiser4, and it is my biggest partition. And
since it takes up to 3 days to reproduce this - yes, hard to do without r4.

Glück Auf,
Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Pekka Enberg

Hi,

On Dec 20, 2007 4:38 PM, David Newall <[EMAIL PROTECTED]> wrote:
> > and another one, this time tainted with the nvidia module:
> > [ 5194.130985] Unable to handle kernel paging request at 0300
> > RIP:
>
> Numbers like that don't suggest hardware faults.  All those zeros: It's
> far too round.  Sounds very like software.  In fact, it sounds like the
> start of a significant hardware region.

Nah, it's just that vma->anon_vma is probably supposed to be NULL here. And if
you look at all the oopses, they do suggest one particular byte lane is dodgy
(the corruption is in bits 41-43 and 45).

The whole thing reminds me of another bug where memtest86 didn't find anything
because it's doing cached memory accesses: http://lkml.org/lkml/2007/10/3/259

Pekka

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Ingo Molnar


* Pekka Enberg <[EMAIL PROTECTED]> wrote:

> Nah, it's just that vma->anon_vma is probably supposed to be NULL
> here. And if you look at all the oopses, they do suggest one
> particular byte lane is dodgy (the corruption is in bits 41-43 and
> 45).
>
> The whole thing reminds me of another bug where memtest86 didn't find
> anything because it's doing cached memory accesses:
> http://lkml.org/lkml/2007/10/3/259

memtest86+ has an uncached test:

const struct tseq tseq[] = {
{1,  5,  3,   0, 0, "[Address test, walking ones]"          },
{1,  6,  3,   2, 0, "[Address test, own address]"           },
{1,  0,  3,  14, 0, "[Moving inversions, ones & zeros]"     },
{1,  1,  2,  80, 0, "[Moving inversions, 8 bit pattern]"    },
{1, 10, 60, 300, 0, "[Moving inversions, random pattern]"   },
{1,  7, 64,  66, 0, "[Block move, 64 moves]"                },
{1,  2,  2, 320, 0, "[Moving inversions, 32 bit pattern]"   },
{1,  9, 40, 120, 0, "[Random number sequence]"              },
{1,  3,  4, 240, 0, "[Modulo 20, ones & zeros]"             },
{1,  8,  1,   2, 0, "[Bit fade test, 90 min, 2 patterns]"   },
{0,  4,  3,   2, 0, "[Moving inversions, 0 & 1, uncached]"  },
{0,  0,  0,   0, 0, NULL}
};

find that "Moving inversions, 0 & 1" test and run that one alone,
overnight.

Ingo

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Hemmann, Volker Armin

Ok, so after the holidays I will do the following:

let memtest86+ run several hours.
do a full backup & restore to switch to r3 and build an unpatched kernel.
see if I can reproduce the oops with .21 and .22 (because AFAIR there was no
oops with .21.. but I might be wrong).

Not exactly in that order.

Glück Auf
Volker


ps: please cc me. I am not subscribed to lkml.

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-20 Thread Mike Galbraith


On Thu, 2007-12-20 at 19:14 +0100, Hemmann, Volker Armin wrote:

> It is just.. it could be the hardware - but I should have seen the
> same 'problem' with earlier kernels - and the 'almost daily oops' only
> started with 2.6.23.

Nonetheless, the oopsen _suggest_ hardware.  If it were my box, I'd move
RAM modules as a first step.  It costs about two minutes to eliminate
that possibility, but you seem reluctant to take that step.  Heck, I'd
_hope_ it's something as simple as bad RAM, because otherwise the quest for
stability could become a time-consuming and/or expensive undertaking...

If that didn't change anything, I'd go back and stress test a previously
stable configuration to gain confidence in my hardware.  If 'uhoh, not
as stable as I thought' happened, and nothing is getting obviously hot
[1], I'd pray that it's an electrically noisy power supply, because
that's also easy and cheap.  In any case, once I was very very confident
that my hardware was indeed sound, I'd move on to an agonizingly tedious
bisection, with no out-of-tree modules ever loaded, to narrow down when
this memory corruption, which nobody else appears to be hitting, first appeared.

-Mike

1.  Crappy heatsink compound can dry out and fracture, leaving a hot chip
under a relatively cool heatsink.  This is exactly what I found when I
disassembled my P4 box a while back, after it had suddenly become unstable
under heavy load.


Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-19 Thread Hemmann, Volker Armin

On Thursday, 20 December 2007, you wrote:
> On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote:
> > On Monday, 17 December 2007, you wrote:
> >
> > and another one, this time tainted with the nvidia module:
> > [ 5194.130985] Unable to handle kernel paging request at 0300
> > RIP:
>
> This really sounds like bad hardware. Either memory or the mobo/riser
> card the memory is on. You might try lowering the memory timings of your
> memory in BIOS. Try removing 1/2 of your memory. If it still crashes, remove
> the other 1/2 and put the first 1/2 back and try again.

if this is bad hardware, why:

- didn't this show up earlier?

- did a several-hour memtest run a couple of weeks ago not show anything?

- does stuff like compiling all of kde 3.5.8 or the latest kde4 rc finish
without any problems?

If it were bad hardware, I should see segfaults left and right, right? But
I don't see them.  In fact, apart from the oopses the system works fine -
even with the oopses the system works fine, apart from the occasional stuck
ps aux.

And these messages:
[41160.823959] kio_http_cache_[25229] general protection rip:32621f1fe9
rsp:7fff59a3d270 error:0
show up on closing Konqueror tabs/Konqueror. There are no surprising exits, no
apps vanishing.

But I will run memtest86+ (or should I use memtest86?).

Glück Auf,
Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-19 Thread Scott

On Thu, 2007-12-20 at 03:13 +0100, Hemmann, Volker Armin wrote:
> On Monday, 17 December 2007, you wrote:
> 
> and another one, this time tainted with the nvidia module:
> [ 5194.130985] Unable to handle kernel paging request at 0300 RIP:

This really sounds like bad hardware. Either memory or the mobo/riser
card the memory is on. You might try lowering the memory timings of your
memory in BIOS. Try removing 1/2 of your memory. If it still crashes, remove
the other 1/2 and put the first 1/2 back and try again.

-- 
Scott <[EMAIL PROTECTED]>


Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-19 Thread Hemmann, Volker Armin

On Monday, 17 December 2007, you wrote:

and another one, this time tainted with the nvidia module:
[ 5194.130985] Unable to handle kernel paging request at 0300 RIP:
[ 5194.130988]  [804449fa] _spin_lock+0x0/0xf
[ 5194.130993] PGD 0
[ 5194.130994] Oops: 0002 [1] SMP
[ 5194.130996] CPU 1
[ 5194.130997] Modules linked in: rfcomm l2cap hci_usb bluetooth snd_usb_audio 
ohci1394 snd_usb_lib ieee1394 aic7xxx i2c_nforce2 nvidia(P) k8temp w83627ehf 
hwmon_vid hwmon i2c_core snd_seq_midi snd_emu10k1_synth snd_emux_synth 
snd_seq_virmidi snd_seq_midi_emul snd_pcm_oss snd_mixer_oss snd_seq_oss 
snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus 
snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd 
r8169
[ 5194.131014] Pid: 22490, comm: sleep Tainted: P 2.6.23.11reiser4 #4
[ 5194.131015] RIP: 0010:[804449fa]  [804449fa] 
_spin_lock+0x0/0xf
[ 5194.131018] RSP: 0018:81009278be70  EFLAGS: 00010206
[ 5194.131020] RAX: 2ab90bfb5000 RBX: 810117d44db0 RCX: 
2ab90bdb5000
[ 5194.131021] RDX: 81011519f810 RSI: 00388aa08fff RDI: 
0300
[ 5194.131023] RBP: 0300 R08: 81012f190ea0 R09: 

[ 5194.131024] R10: 0008 R11: 0246 R12: 
810117d44db0
[ 5194.131026] R13: 2ab90bdb R14:  R15: 

[ 5194.131028] FS:  2ab90bde3070() GS:81012fc6cec0() 
knlGS:f7f756c0
[ 5194.131030] CS:  0010 DS:  ES:  CR0: 8005003b
[ 5194.131031] CR2: 0300 CR3: 93605000 CR4: 
06e0
[ 5194.131033] DR0:  DR1:  DR2: 

[ 5194.131034] DR3:  DR6: 0ff0 DR7: 
0400
[ 5194.131036] Process sleep (pid: 22490, threadinfo 81009278a000, task 
8100960630c0)
[ 5194.131037] Stack:  8026afbc 810117d44db0 810115e2cbb8 
810117d44db0
[ 5194.131040]  80265ec3 81009278bee0 81009278bee0 
810115e2c3d8
[ 5194.131043]  8100a076cb80 0002 7fff9ecf7808 
7fff9ecf7810
[ 5194.131045] Call Trace:
[ 5194.131048]  [8026afbc] anon_vma_unlink+0x1a/0x64
[ 5194.131051]  [80265ec3] free_pgtables+0x64/0xc4
[ 5194.131054]  [80267174] exit_mmap+0x91/0xeb
[ 5194.131057]  [80230191] mmput+0x28/0xa0
[ 5194.131060]  [802353db] do_exit+0x211/0x786
[ 5194.131063]  [802359cf] sys_exit_group+0x0/0xe
[ 5194.131065]  [8020b66e] system_call+0x7e/0x83
[ 5194.131069]
[ 5194.131070]
[ 5194.131070] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 f0 81 2f 00 
00
[ 5194.131076] RIP  [804449fa] _spin_lock+0x0/0xf
[ 5194.131078]  RSP 81009278be70
[ 5194.131079] CR2: 0300
[ 5194.131101] Fixing recursive fault but reboot is needed!

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-17 Thread Hemmann, Volker Armin

On Monday, 17 December 2007, you wrote:
> On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
> > I got another crash, now with 2.6.23.11 on logout from KDE (two
> > differences, new kernel, 4gb ram instead of 2gb):

> > also I got some strange message yesterday before increasing ramsize:
> > [19546.639528] swap_free: Bad swap offset entry 0400
> > [27999.370777] swap_free: Bad swap offset entry 0400
> > [27999.434282] swap_free: Bad swap offset entry 0400
> > [27999.466035] swap_free: Bad swap offset entry 0400
> > [27999.521132] swap_free: Bad swap offset entry 0400
> > [27999.561621] VM: killing process ld-linux-x86-64
> > [27999.561719] swap_free: Bad swap offset entry 0400
>
> You're seeing a single bit set where it shouldn't be: please give
> memtest86+ a good try; if it's not actually your memory that's bad,
> then I'd guess it's something like overheating (please correct me,
> ye who know better).
>
> Hugh

first of all, the 2 GB I was seeing this with passed a memtest run of several
hours some weeks ago, without problems. I can compile stuff - like the latest
kde4 rc - without segfaults or problems (except when the oops is happening),
and this mess only started recently. To be more precise: the swap mess only
started with 2.6.23.11. With 2.6.23.9 I get the kio_http... rip's, but no
swap-related messages.

Overheating is very unlikely. I made sure that my computer is very well
cooled. Even under high load I get something like 50°C from lmsensors and the
BIOS - and the errors are completely unrelated to load, or temperature.
Without load my CPU idles at ~30°C. Again, lmsensors and the BIOS agree
closely on that.

Glück Auf,
Volker

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-17 Thread Hugh Dickins

On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
> 
> I got another crash, now with 2.6.23.11 on logout from KDE (two differences, 
> new kernel, 4gb ram instead of 2gb):
> 
> [ 1771.063731] Unable to handle kernel paging request at 0400 RIP:
> also I got some strange message yesterday before increasing ramsize:
> [19546.639528] swap_free: Bad swap offset entry 0400
> [27999.370777] swap_free: Bad swap offset entry 0400
> [27999.434282] swap_free: Bad swap offset entry 0400
> [27999.466035] swap_free: Bad swap offset entry 0400
> [27999.521132] swap_free: Bad swap offset entry 0400
> [27999.561621] VM: killing process ld-linux-x86-64
> [27999.561719] swap_free: Bad swap offset entry 0400

You're seeing a single bit set where it shouldn't be: please give
memtest86+ a good try; if it's not actually your memory that's bad,
then I'd guess it's something like overheating (please correct me,
ye who know better).

Hugh

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-17 Thread Hemmann, Volker Armin

Hi.

I got another crash, now with 2.6.23.11 on logout from KDE (two differences, 
new kernel, 4gb ram instead of 2gb):

[ 1771.063731] Unable to handle kernel paging request at 0400 RIP:
[ 1771.063735]  [8044256a] _spin_lock+0x0/0xf
[ 1771.063740] PGD 0
[ 1771.063741] Oops: 0002 [1] SMP
[ 1771.063743] CPU 0
[ 1771.063744] Modules linked in: k8temp w83627ehf hwmon_vid hwmon i2c_core 
snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi 
snd_seq_midi_emul snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event 
snd_seq snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm 
snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd r8169
[ 1771.063756] Pid: 4418, comm: kdm Not tainted 2.6.23.11reiser4 #1
[ 1771.063758] RIP: 0010:[8044256a]  [8044256a] _spin_lock+0x0/0xf
[ 1771.063760] RSP: 0018:81012937de10  EFLAGS: 00010206
[ 1771.063762] RAX: 81012bd78870 RBX: 0400 RCX:
[ 1771.063764] RDX:  RSI: 81012c549e58 RDI: 0400
[ 1771.063765] RBP: 81012bd78870 R08: 80012c52c045 R09: 0005
[ 1771.063767] R10: 8100050df9f8 R11: 0002 R12: 8101280c3760
[ 1771.063768] R13: 81012f05fac0 R14: 81012c549db0 R15: 81012f05fac0
[ 1771.063770] FS:  2b438009bb40() GS:80533000() knlGS:
[ 1771.063772] CS:  0010 DS:  ES:  CR0: 8005003b
[ 1771.063773] CR2: 0400 CR3: 00012d689000 CR4: 06e0
[ 1771.063775] DR0:  DR1:  DR2:
[ 1771.063776] DR3:  DR6: 0ff0 DR7: 0400
[ 1771.063778] Process kdm (pid: 4418, threadinfo 81012937c000, task 81012fca7860)
[ 1771.063779] Stack:  8026b084 81012ba17138 81012bd78870
[ 1771.063782]  80230d0c 2b438009bbd0  81012937df58
[ 1771.063785]  7fff2aa3d9d0 01200011  8101280c3760
[ 1771.063787] Call Trace:
[ 1771.063790]  [8026b084] anon_vma_link+0x1a/0x40
[ 1771.063793]  [80230d0c] copy_process+0xb03/0x1301
[ 1771.063798]  [80231670] do_fork+0xb1/0x1fc
[ 1771.063802]  [8023aa56] recalc_sigpending+0xe/0x25
[ 1771.063804]  [8020b66e] system_call+0x7e/0x83
[ 1771.063806]  [8020b987] ptregscall_common+0x67/0xb0
[ 1771.063810]
[ 1771.063811]
[ 1771.063811] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 f0 81 2f 00 00
[ 1771.063816] RIP  [8044256a] _spin_lock+0x0/0xf
[ 1771.063819]  RSP 81012937de10
[ 1771.063820] CR2: 0400

Also, I got some strange messages yesterday, before increasing the RAM size:
[19546.639528] swap_free: Bad swap offset entry 0400
[19733.026587] kio_http_cache_[9814] general protection rip:3919ff1fe9 rsp:7fff7e1b59f0 error:0

I did swapoff -a, mkswap /dev/sda, swapon -a:
[20105.297668] Adding 1951888k swap on /dev/sda2.  Priority:-2 extents:1 across:1951888k
[21013.797335] kio_http_cache_[10921] general protection rip:3919ff1fe9 rsp:7fff39a6d2a0 error:0
[22381.409172] kio_http_cache_[11459] general protection rip:3919ff1fe9 rsp:7fffd84c4d00 error:0
[23877.759927] kio_http_cache_[11959] general protection rip:3919ff1fe9 rsp:7fff9895c190 error:0
[25080.581142] kio_http_cache_[13146] general protection rip:3919ff1fe9 rsp:7fff790e0920 error:0
[26483.315522] kio_http_cache_[13746] general protection rip:3919ff1fe9 rsp:7fff51933170 error:0
[27696.301584] kio_http_cache_[14417] general protection rip:3919ff1fe9 rsp:7fff8f38abc0 error:0
[27999.370777] swap_free: Bad swap offset entry 0400
[27999.434282] swap_free: Bad swap offset entry 0400
[27999.466035] swap_free: Bad swap offset entry 0400
[27999.521132] swap_free: Bad swap offset entry 0400
[27999.561621] VM: killing process ld-linux-x86-64
[27999.561719] swap_free: Bad swap offset entry 0400

complete dmesg:
[0.00] Linux version 2.6.23.11reiser4 ([EMAIL PROTECTED]) (gcc version 4.2.2 (Gentoo 4.2.2 p1.0)) #1 SMP Sun Dec 16 05:14:21 CET 2007
[0.00] Command line: root=/dev/sda3 nmi_watchdog=0
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e6000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - cffb (usable)
[0.00]  BIOS-e820: cffb - cffc (ACPI data)
[0.00]  BIOS-e820: cffc - cfff (ACPI NVS)
[0.00]  BIOS-e820: cfff - d000 (reserved)
[0.00]  BIOS-e820: fec0 - fec01000 (reserved)
[0.00]  BIOS-e820: fee0 - fef0 (reserved)
[0.00]  BIOS-e820: ff38 - 0001 (reserved)
[0.00]

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-17 Thread Hugh Dickins

On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
 
 I got another crash, now with 2.6.23.11 on logout from KDE (two differences, 
 new kernel, 4gb ram instead of 2gb):
 
 [ 1771.063731] Unable to handle kernel paging request at 0400 RIP:
 also I got some strange message yesterday before increasing ramsize:
 [19546.639528] swap_free: Bad swap offset entry 0400
 [27999.370777] swap_free: Bad swap offset entry 0400
 [27999.434282] swap_free: Bad swap offset entry 0400
 [27999.466035] swap_free: Bad swap offset entry 0400
 [27999.521132] swap_free: Bad swap offset entry 0400
 [27999.561621] VM: killing process ld-linux-x86-64
 [27999.561719] swap_free: Bad swap offset entry 0400

You're seeing a single bit set where it shouldn't be: please give
memtest86+ a good try; if it's not actually your memory that's bad,
then I'd guess it's something like overheating (please correct me,
ye who know better).

Hugh

Re: almost daily Kernel oops with 2.6.23.9 - and now 2.6.23.11 as well

2007-12-17 Thread Hemmann, Volker Armin

On Monday, 17 December 2007, you wrote:
 On Mon, 17 Dec 2007, Hemmann, Volker Armin wrote:
  I got another crash, now with 2.6.23.11 on logout from KDE (two
  differences, new kernel, 4gb ram instead of 2gb):

  also I got some strange message yesterday before increasing ramsize:
  [19546.639528] swap_free: Bad swap offset entry 0400
  [27999.370777] swap_free: Bad swap offset entry 0400
  [27999.434282] swap_free: Bad swap offset entry 0400
  [27999.466035] swap_free: Bad swap offset entry 0400
  [27999.521132] swap_free: Bad swap offset entry 0400
  [27999.561621] VM: killing process ld-linux-x86-64
  [27999.561719] swap_free: Bad swap offset entry 0400

 You're seeing a single bit set where it shouldn't be: please give
 memtest86+ a good try; if it's not actually your memory that's bad,
 then I'd guess it's something like overheating (please correct me,
 ye who know better).

 Hugh

First of all, the 2GB of RAM with which I was seeing this had memtest run on 
it for several hours a few weeks ago, without problems. I can compile large 
projects, like the latest KDE4 RC, without segfaults or other problems (except 
when the oops happens), and this mess only started recently. To be more 
precise: the swap mess only started with 2.6.23.11. With 2.6.23.9 I get the 
kio_http... general protection faults, but no swap-related messages.

Overheating is very unlikely. I made sure that my computer is very well 
cooled. Even under high load, lm_sensors and the BIOS report about 50°C, and 
the errors are completely unrelated to load or temperature. At idle my CPU 
sits at ~30°C; again, lm_sensors and the BIOS agree closely on that.

Glück Auf,
Volker